VladimirAlexiev opened 9 years ago
BTW, excellent project! I'll add it to "Getty usage stories".
If you need to filter by regex, it would be faster to return one row per concept and use GROUP_CONCAT to put all its altLabels in that row. This way you avoid multiple regex() checks per concept, as well as the DISTINCT. E.g.:
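```sparql
# The "#{search}" placeholders are filled in by Ruby string interpolation.
SELECT ?s ?name ?bio {
  # Inner query: one row per concept, with all altLabels concatenated into ?labels
  {select ?s ?name ?bio (CONCAT(?name, ' ', GROUP_CONCAT(?alt)) as ?labels) {
    ?s a skos:Concept; luc:term "#{search}";
       skos:inScheme <http://vocab.getty.edu/ulan/> ;
       gvp:prefLabelGVP [skosxl:literalForm ?name] ;
       foaf:focus/gvp:biographyPreferred [schema:description ?bio] ;
       skos:altLabel ?alt .
  } GROUP BY ?s ?name ?bio}
  # A single regex test per concept; no DISTINCT needed
  filter(regex(?labels, "#{search}", "i"))
}
```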
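To make the cost argument concrete, here is a minimal Ruby sketch that emulates the two query shapes client-side over a few made-up result rows. It is an illustration only, not questioning_authority code, and `Row`, `per_row_matches` and `grouped_matches` are hypothetical names. The ungrouped shape runs one regex test per (concept, altLabel) row and then needs `uniq`, the analogue of DISTINCT; the grouped shape concatenates each concept's labels first, as GROUP_CONCAT does, and runs one test per concept.

```ruby
# Hypothetical result rows: one per (concept, altLabel) pair, the shape an
# ungrouped query with `skos:altLabel ?alt` returns. The data is made up.
Row = Struct.new(:s, :name, :alt)

ROWS = [
  Row.new("c1", "Leonardo da Vinci", "Vinci, Leonardo da"),
  Row.new("c1", "Leonardo da Vinci", "Leonardo da Vinchi"),
  Row.new("c2", "Vinci, Anna", "Anna Vinci")
].freeze

# Ungrouped: one regex test per row, then uniq (the analogue of DISTINCT),
# because a concept can match through several of its rows.
def per_row_matches(rows, pattern)
  tests = 0
  hits = rows.select do |r|
    tests += 1
    pattern.match?("#{r.name} #{r.alt}")
  end
  [hits.map(&:s).uniq, tests]
end

# Grouped: build one string per concept first, like
# CONCAT(?name, ' ', GROUP_CONCAT(?alt)), then one regex test per concept.
def grouped_matches(rows, pattern)
  labels = rows.group_by(&:s).transform_values do |rs|
    ([rs.first.name] + rs.map(&:alt)).join(" ")
  end
  hits = labels.select { |_s, text| pattern.match?(text) }.keys
  [hits, labels.size]
end

p per_row_matches(ROWS, /vinci/i)  # => [["c1", "c2"], 3]
p grouped_matches(ROWS, /vinci/i)  # => [["c1", "c2"], 2]
```

Both shapes return the same concepts; only the number of regex evaluations differs, and the gap grows with the number of altLabels per concept.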
@VladimirAlexiev Thanks so much for the feedback. I'm not currently working on questioning_authority, but I'm hoping that another of our consortium members will be able to incorporate your suggestions.
@geekscruff Can you take a look at this issue and comment on whether or not it still applies? I know there have been changes to the Getty processing since this issue was opened.
Your query is similar to two others given in the documentation, so it's not too complex:
http://vocab.getty.edu/doc/queries/#Combination_Full-Text_and_Exact_String_Match
http://vocab.getty.edu/doc/queries/#Exact-Match_Full_Text_Search_Query
Still, you may want to evaluate those (especially the latter), as they may well give better results.
(Sent by email in reply to E. Lynette Rayle's comment of Mon, Mar 4, 2019, 16:28.)
Thanks for the feedback @VladimirAlexiev
I think the following regex-free query returns the same results but is much simpler. Would you mind having a look to see if you agree? The example searches for "vinchi", a spelling taken from the alt labels.
```sparql
SELECT DISTINCT ?s ?name ?bio {
  ?s a skos:Concept;
     luc:term "leonardo AND da AND vinchi";
     skos:inScheme ulan: ;
     gvp:prefLabelGVP [xl:literalForm ?name];
     foaf:focus/gvp:biographyPreferred [schema:description ?bio] ;
     skos:altLabel ?alt .
} order by asc(lcase(str(?name)))
```
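The `luc:term` string in this query joins every search word with `AND`. A minimal Ruby sketch of building it from raw user input follows; `lucene_and_query` and `luc_term_pattern` are hypothetical helpers, and the cleanup rules are our assumption (the thread does not discuss escaping): it assumes whitespace-separated words, drops double quotes and backslashes so the result can sit inside a SPARQL string literal, and lower-cases words so a typed `OR` or `NOT` is not read as a Lucene operator.

```ruby
# Hypothetical helper: turn free text into a Lucene query that requires
# every word, e.g. "leonardo AND da AND vinchi".
def lucene_and_query(search)
  # Drop double quotes and backslashes, which would break the SPARQL string literal.
  cleaned = search.gsub(/["\\]/, "")
  # Lower-case the words so a typed "OR" or "NOT" is not read as a Lucene operator.
  cleaned.downcase.split.join(" AND ")
end

# Hypothetical helper: the luc:term triple-pattern fragment used in the query.
def luc_term_pattern(search)
  "luc:term \"#{lucene_and_query(search)}\""
end

puts lucene_and_query("Leonardo da Vinchi")     # prints leonardo AND da AND vinchi
puts luc_term_pattern('Leonardo "da" Vinchi')   # prints luc:term "leonardo AND da AND vinchi"
```

Other Lucene special characters (such as `*`, `?` or `:`) pass through unchanged here; whether to escape or allow them is a separate design decision.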
I like that it doesn't use regex, but AND gives too much freedom, IMHO: each word only has to occur somewhere in the concept's terms, in any order. I'd use the FTS query from http://vocab.getty.edu/doc/queries/#Exact-Match_Full_Text_Search_Query instead.
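A minimal Ruby sketch of why plain AND can be too permissive, under a simplified model that is our assumption rather than how the Getty Lucene index actually works: a concept's indexed text is the union of the words in all its labels, and AND only requires each query word to occur somewhere in it. The labels are made up, and `and_match?` / `single_label_match?` are hypothetical helpers.

```ruby
# Simplified model (an assumption, not the real Lucene analyzer):
# split a label into lower-case alphanumeric words.
def tokens(text)
  text.downcase.scan(/[[:alnum:]]+/)
end

# AND semantics over the whole concept: each query word must occur
# somewhere in any of its labels, in any order.
def and_match?(labels, words)
  bag = labels.flat_map { |l| tokens(l) }
  words.all? { |w| bag.include?(w.downcase) }
end

# Stricter alternative: all query words must occur within one label.
def single_label_match?(labels, words)
  wanted = words.map(&:downcase)
  labels.any? { |l| (wanted - tokens(l)).empty? }
end

labels = ["Rossi, Mario", "Mario the Painter"]    # made-up labels of one concept
p and_match?(labels, %w[rossi painter])           # => true (words from different labels)
p single_label_match?(labels, %w[rossi painter])  # => false
```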
Hi! We developed & maintain the Getty endpoint, and luc:term should only include terms (i.e. pref and alt labels, minus any " (qualifier)" part). You can see the properties that are navigated to collect the FTS text here: http://vocab.getty.edu/doc/#FTS_Insert_Queries
You write "The full text index matches on fields besides the term, so we filter to ensure the match is in the term", so you apply a REGEX to pref|altLabel and then need DISTINCT because there are multiple altLabels. This query is quite complex and a bit more expensive than it needs to be.
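The "minus any (qualifier)" rule can be pictured with a small Ruby sketch. It assumes the qualifier is a parenthesized suffix at the end of a label; the Getty FTS insert queries linked here are SPARQL and may handle this differently, and `strip_qualifier` / `fts_text` are hypothetical names with made-up sample labels.

```ruby
# Hypothetical sketch: drop a trailing parenthesized qualifier, e.g.
# "Mercury (planet)" -> "Mercury", before a label is added to the FTS text.
def strip_qualifier(label)
  label.sub(/\s*\([^()]*\)\s*\z/, "")
end

# Collect the text for one concept from its pref and alt labels,
# without qualifiers and without repeating identical terms.
def fts_text(pref_labels, alt_labels)
  (pref_labels + alt_labels).map { |l| strip_qualifier(l) }.uniq.join(" ")
end

p strip_qualifier("Mercury (planet)")                                # => "Mercury"
p fts_text(["Mercury (planet)"], ["Mercury (planet)", "Mercurius"])  # => "Mercury Mercurius"
```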
If you provide some test examples, we'll fix the "matches on fields besides the term" problem.
For AAT, you seem to want prefLabel only. I wrote in the support forum: "I think that if we make an index by prefLabels only, that would resolve most problems. But is this what you need? E.g. it won't find 'frostbiting' aka 'frostbite boating'." If you want an extra index by prefLabel only, let me know (but it'll also have more languages than EN).
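To show what a prefLabel-only index would trade away, here is a minimal Ruby sketch of two toy inverted indexes, one over pref labels only and one over pref and alt labels. The concepts and labels are made up (they are not AAT data), `build_index` / `lookup` are hypothetical helpers, and a real Lucene index would also analyze and stem text, which this ignores.

```ruby
require "set"

# Made-up concept records; not Getty data.
Concept = Struct.new(:id, :pref, :alts)

CONCEPTS = [
  Concept.new("c1", "ice sailing", ["iceboating"]),
  Concept.new("c2", "sailing", [])
].freeze

# Toy inverted index: word -> set of concept ids. With include_alts: false
# only the pref label is indexed (the "index by prefLabels only" idea).
def build_index(concepts, include_alts:)
  index = Hash.new { |h, k| h[k] = Set.new }
  concepts.each do |c|
    labels = include_alts ? [c.pref, *c.alts] : [c.pref]
    labels.each do |label|
      label.downcase.scan(/[[:alnum:]]+/).each { |w| index[w] << c.id }
    end
  end
  index
end

# Look up one word; fetch avoids creating empty entries via the default block.
def lookup(index, word)
  index.fetch(word.downcase, Set.new).to_a
end

term_index = build_index(CONCEPTS, include_alts: true)
pref_index = build_index(CONCEPTS, include_alts: false)
p lookup(term_index, "iceboating")  # => ["c1"]
p lookup(pref_index, "iceboating")  # => []
p lookup(pref_index, "sailing")     # => ["c1", "c2"]
```

The prefLabel-only index gives fewer spurious hits but, as noted, misses concepts that are only reachable through an alt label.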