samvera / questioning_authority

Question your authorities

simplify Getty queries #84

Open VladimirAlexiev opened 9 years ago

VladimirAlexiev commented 9 years ago

Hi! We developed & maintain the Getty endpoint, and luc:term should only include terms (which includes pref & altLabels, minus any " (qualifier)"). You can see the props that are navigated to collect FTS text here: http://vocab.getty.edu/doc/#FTS_Insert_Queries.

You write "The full text index matches on fields besides the term, so we filter to ensure the match is in the term" and do a REGEX on pref|altLabel, and then DISTINCT since there are multiple altLabels. This query is quite complex and a bit more expensive than it needs to be.

If you provide some testing examples, we'll fix the problem "matches on fields besides the term".

For AAT, you seem to want prefLabel only. I wrote in the support forum: "I think that if we make an index by prefLabels only, that would resolve most problems. But is this what you need? E.g. it won't find 'frostbiting' aka 'frostbite boating'." If you want an extra index by prefLabel only, let me know (but it'll also have more languages than EN).

VladimirAlexiev commented 9 years ago

BTW excellent project, I'll add it to "Getty usage stories"

VladimirAlexiev commented 9 years ago

If you need to filter by regex, it would be faster to return 1 row per concept and use GROUP_CONCAT to put all altLabels in that row. This way you'll avoid multiple regex() checks per concept, and DISTINCT. E.g.:

SELECT ?s ?name ?bio {
  {SELECT ?s ?name ?bio (CONCAT(?name, ' ', GROUP_CONCAT(?alt)) AS ?labels) {
     ?s a skos:Concept; luc:term "#{search}";
        skos:inScheme <http://vocab.getty.edu/ulan/> ;
        gvp:prefLabelGVP [skosxl:literalForm ?name] ;
        foaf:focus/gvp:biographyPreferred [schema:description ?bio] ;
        skos:altLabel ?alt .
  } GROUP BY ?s ?name ?bio}
  FILTER(regex(?labels, "#{search}", "i"))}
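
Since the query above is a Ruby string template (`#{search}`), here is a minimal sketch of how it might be assembled with the search text quoted as a SPARQL literal. The helper names `sparql_literal` and `grouped_ulan_query` are hypothetical and not part of questioning_authority, and the sketch assumes the endpoint predefines the prefixes used.

```ruby
# Hypothetical helper (not part of questioning_authority): quote a string as a
# SPARQL literal by escaping backslashes and double quotes.
def sparql_literal(text)
  escaped = text.gsub('\\') { '\\\\' }.gsub('"') { '\\"' }
  "\"#{escaped}\""
end

# Hypothetical builder for the grouped ULAN query suggested above.
# PREFIX declarations are omitted on the assumption that the endpoint
# predefines skos:, skosxl:, gvp:, foaf:, schema: and luc:.
def grouped_ulan_query(search)
  term = sparql_literal(search)
  <<~SPARQL
    SELECT ?s ?name ?bio {
      { SELECT ?s ?name ?bio (CONCAT(?name, ' ', GROUP_CONCAT(?alt)) AS ?labels) {
          ?s a skos:Concept; luc:term #{term};
             skos:inScheme <http://vocab.getty.edu/ulan/> ;
             gvp:prefLabelGVP [skosxl:literalForm ?name] ;
             foaf:focus/gvp:biographyPreferred [schema:description ?bio] ;
             skos:altLabel ?alt .
        } GROUP BY ?s ?name ?bio }
      FILTER(regex(?labels, #{term}, "i"))
    }
  SPARQL
end

puts grouped_ulan_query('leonardo')
```

The search text is still used as the regex() pattern, so any regex metacharacters in it are interpreted as such; this sketch only handles literal quoting.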

jcoyne commented 9 years ago

@VladimirAlexiev Thanks so much for the feedback. I'm not currently working on questioning_authority, but I'm hoping that another of our consortium members will be able to incorporate your suggestions.

elrayle commented 5 years ago

@geekscruff Can you take a look at this issue and comment on whether or not it still applies? I know there have been changes to the Getty processing since this issue was opened.

VladimirAlexiev commented 5 years ago

Your query is similar to two others given in the documentation, so it's not too complex:

http://vocab.getty.edu/doc/queries/#Combination_Full-Text_and_Exact_String_Match

http://vocab.getty.edu/doc/queries/#Exact-Match_Full_Text_Search_Query

Still, you may want to evaluate those (especially the latter), as they may well give better results.

ghost commented 5 years ago

Thanks for the feedback @VladimirAlexiev

I think the following regex-free query returns the same results, but is much simpler. Would you mind having a look and seeing if you agree? The following example uses vinchi from the alt label.

SELECT DISTINCT ?s ?name ?bio {
  ?s a skos:Concept; 
      luc:term "leonardo AND da AND vinchi"; 
      skos:inScheme ulan: ;
      gvp:prefLabelGVP [xl:literalForm ?name];
      foaf:focus/gvp:biographyPreferred [schema:description ?bio] ;
      skos:altLabel ?alt .
} order by asc(lcase(str(?name)))
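
The `luc:term` value in this query could be derived from a multi-word search string roughly as follows. This is a minimal sketch: the helper name `and_query` is hypothetical, and it assumes the words contain no characters that are special in the full-text query syntax.

```ruby
# Hypothetical helper (not part of questioning_authority): join the
# whitespace-separated words of a search string with AND for luc:term.
def and_query(search)
  search.split.join(' AND ')
end

puts and_query('leonardo da vinchi')
# prints: leonardo AND da AND vinchi
```

`String#split` without arguments also discards leading, trailing and repeated whitespace, so the result never contains empty terms.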

VladimirAlexiev commented 5 years ago

I like that it doesn't have regex but AND gives too much freedom imho. I'd use the FTS query from http://vocab.getty.edu/doc/queries/#Exact-Match_Full_Text_Search_Query