hbz / lobid-gnd

UI and API to the Integrated Authority File (Gemeinsame Normdatei, GND)
http://lobid.org/gnd
Eclipse Public License 2.0

Include ASCII name variants in UI search #308

Closed acka47 closed 2 years ago

acka47 commented 2 years ago

Originated today in this Twitter thread with @frederik-elwert: https://twitter.com/felwert/status/1504079048045125636

For example, when searching for "gandhari" via UI search box, results should include https://lobid.org/gnd/4669633-7.

fsteeg commented 2 years ago

We actually support that via the .ascii subfield (see https://github.com/hbz/lobid-gnd/issues/263 and 'ASCII' sample at http://lobid.org/gnd/api), e.g.: https://lobid.org/gnd/search?q=preferredName.ascii:gandhari
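To illustrate what such an ASCII variant looks like, here is a minimal Python sketch that folds a name with diacritics to plain ASCII and builds a field-scoped query URL like the one above. The helper names `ascii_fold` and `ascii_query_url` are hypothetical, the spelling "Gāndhārī" is only an illustrative input, and the folding is a rough stand-in for what the index does, not the actual analyzer.

```python
import unicodedata
from urllib.parse import urlencode


def ascii_fold(text: str) -> str:
    """Strip combining marks after Unicode decomposition and lowercase.

    Rough approximation only: characters without a decomposition
    (for example "ß" or "ø") are left unchanged.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()


def ascii_query_url(term: str, field: str = "preferredName.ascii") -> str:
    """Build a lobid-gnd search URL that queries one .ascii subfield."""
    base = "https://lobid.org/gnd/search"
    # Keep ":" unescaped so the URL reads like the example in this thread.
    query = urlencode({"q": f"{field}:{ascii_fold(term)}"}, safe=":")
    return f"{base}?{query}"


print(ascii_fold("Gāndhārī"))
print(ascii_query_url("Gāndhārī"))
```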

frederik-elwert commented 2 years ago

Yes, I just learned about that. The question was rather whether it would make sense to support this as the default search mode when entering search terms in the search box on the web interface. People who are not familiar with the data model and query syntax may not be aware that this does not work (since it does actually work on the DNB site when searching the GND).

acka47 commented 2 years ago

> We actually support that via the .ascii subfield (see https://github.com/hbz/lobid-gnd/issues/263 and 'ASCII' sample at http://lobid.org/gnd/api), e.g.: https://lobid.org/gnd/search?q=preferredName.ascii:gandhari

@fsteeg Please check out the Twitter thread at https://twitter.com/felwert/status/1504079048045125636 where we have already discussed this.

We should probably not change the behaviour of the q query as the issue title suggests because this would significantly change the API's behaviour (API break). Perhaps it might make sense to add a parameter (ascii=true or so) that includes ascii name variants in the q search which we set in the lobid-gnd UI as default. What do you think, @fsteeg?
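A rough sketch of how such an opt-in parameter could work, assuming the q value is passed on as a Lucene-style query string: with the flag off the query is untouched, with the flag on it is OR-ed with field-scoped copies for the .ascii subfields. The function name, the flag name `include_ascii`, and the field list (in particular `variantName.ascii`) are assumptions for illustration, not the actual lobid-gnd implementation.

```python
# Hypothetical list of .ascii subfields to include; only preferredName.ascii
# is mentioned in this thread, variantName.ascii is an assumption.
ASCII_FIELDS = ("preferredName.ascii", "variantName.ascii")


def expand_query(q: str, include_ascii: bool = False,
                 ascii_fields: tuple = ASCII_FIELDS) -> str:
    """Return q unchanged, or q OR-ed with copies scoped to .ascii subfields."""
    if not include_ascii:
        # Default API behaviour stays as it is (no API break).
        return q
    clauses = [f"({q})"] + [f"{field}:({q})" for field in ascii_fields]
    return " OR ".join(clauses)


print(expand_query("gandhari"))
print(expand_query("gandhari", include_ascii=True))
```

The UI would then send the flag by default, while plain API clients keep the current results.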

fsteeg commented 2 years ago

> Perhaps it might make sense to add a parameter (ascii=true or so) that includes ascii name variants in the q search which we set in the lobid-gnd UI as default.

Yes, that sounds good.

acka47 commented 2 years ago

Ok, I updated the issue title and assigned you, @fsteeg. We'll sort out in our fortnightly planning on Monday when to implement this.

acka47 commented 2 years ago

> We'll sort out in our fortnightly planning on Monday when to implement this.

We plan to implement in April.

fsteeg commented 2 years ago

Hm, looking into implementing this, I see that we currently first run the search and then, depending on the requested format, return a specific representation of that search result, one of them being HTML for the UI. And that's how it should be, right? To implement this, I'd have to change the way we search depending on the response format. That seems wrong.

Perhaps we should reconsider and include the ascii subfields for all requests as default?
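The separation described above can be sketched roughly like this in Python. Everything here (the in-memory `RECORDS`, `search`, `render`, `handle`) is a hypothetical stand-in, not the actual code: the point is only that the search step never sees the response format, so an HTML-only search mode would have to break that boundary.

```python
import json
from html import escape

# Hypothetical stand-in data; not taken from the real index.
RECORDS = [{"id": "example-1", "preferredName": "Gandhari"}]


def search(q: str) -> list:
    """Search step: knows nothing about the response format."""
    needle = q.lower()
    return [r for r in RECORDS if needle in r["preferredName"].lower()]


def render(hits: list, fmt: str) -> str:
    """Rendering step: turns the same hits into the requested format."""
    if fmt == "json":
        return json.dumps(hits)
    if fmt == "html":
        items = "".join(f"<li>{escape(h['preferredName'])}</li>" for h in hits)
        return f"<ul>{items}</ul>"
    raise ValueError(f"unsupported format: {fmt}")


def handle(q: str, fmt: str) -> str:
    """Request handling: search first, then pick the representation."""
    return render(search(q), fmt)


print(handle("gandhari", "html"))
```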

fsteeg commented 2 years ago

> Perhaps we should reconsider and include the ascii subfields for all requests as default?

I've deployed that for review on test, e.g. this would then contain the additional hits:

https://test.lobid.org/gnd/search?q=gandhari

On the other hand we'd get lots of false results (here about Münster) for a query like this:

https://test.lobid.org/gnd/search?q=munster

@acka47 What do you think?

acka47 commented 2 years ago

> On the other hand we'd get lots of false results (here about Münster) for a query like this:
>
> https://test.lobid.org/gnd/search?q=munster

This gets the same results as on production (https://lobid.org/gnd/search?q=munster) because we have already implemented german_normalization (see index config). So, by supporting ASCII folding for other languages, we basically add more consistency to the search behaviour. Thus, +1.
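To make the consistency argument concrete, here is a simplified Python approximation of the two normalizations. It is not the actual Elasticsearch filters: `german_normalize` only maps umlauts and "ß", and `ascii_fold` only strips combining marks. Both function names are hypothetical, and "Gāndhārī" is just an illustrative input.

```python
import unicodedata

# Simplified subset of what a German normalization does (assumption:
# only umlauts and "ß" are handled here).
GERMAN_MAP = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})


def german_normalize(token: str) -> str:
    """Lowercase and fold German umlauts and "ß" only."""
    return token.lower().translate(GERMAN_MAP)


def ascii_fold(token: str) -> str:
    """Lowercase and strip combining marks from any decomposable character."""
    decomposed = unicodedata.normalize("NFKD", token.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


for name in ("Münster", "Gāndhārī"):
    print(name, german_normalize(name), ascii_fold(name))
```

In this sketch "Münster" already matches "munster" under the German normalization alone, while "Gāndhārī" only matches "gandhari" once ASCII folding is applied as well.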