hbz / lobid

Linking Open Bibliographic Data
https://lobid.org/
Eclipse Public License 2.0

by-name queries yield too many results #28

Closed fsteeg closed 10 years ago

fsteeg commented 10 years ago

E.g. http://api.lobid.org/subject?name=Heinsberg&format=full (but this holds for all by-name queries)

The first hit is a person from Heinsberg, not a person named Heinsberg. The cause is that the same field name is used in different JSON objects of a record: all the resolved labels (which we added by user request) end up in the record under the same field name as the entity's own name. I don't think we can solve this on the Elasticsearch query level.
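To illustrate the effect, here is a minimal sketch. The record layout (`primaryTopic`, `@graph`), the identifiers, and the person's name are assumptions made up for this example and may not match the actual lobid data; only the field names `preferredName` and `placeOfBirth` come from this discussion.

```python
# Hypothetical, simplified record: several JSON objects in one record reuse
# the field name "preferredName". The real lobid layout may differ.
record = {
    "primaryTopic": "gnd:person-1",
    "@graph": [
        {
            "@id": "gnd:person-1",
            "preferredName": "Mustermann, Erika",
            "placeOfBirth": "gnd:place-1",
        },
        # Resolved label of the linked place, stored in the same record.
        {"@id": "gnd:place-1", "preferredName": "Heinsberg"},
    ],
}


def matches_any_object(record, name):
    """Match on the field name alone: any object in the record can produce a hit."""
    return any(name in obj.get("preferredName", "") for obj in record["@graph"])


def matches_primary_topic(record, name):
    """Match only the object that describes the record's primary topic."""
    return any(
        name in obj.get("preferredName", "")
        for obj in record["@graph"]
        if obj["@id"] == record["primaryTopic"]
    )
```

With this record, `matches_any_object(record, "Heinsberg")` is true because of the resolved place label, while `matches_primary_topic(record, "Heinsberg")` is false, which is the behaviour a by-name query should have.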

We often discuss this topic (no resolution in the data we serve, only in the index, etc.). Also, @jschnasse recently requested additional literals for works. We probably need an in-depth discussion of this and of how to approach it, @acka47 @dr0i.

The fast fix, namely removing the resolved values, is not an option, as it would break current API usage. I think the proper solution would be the nested JSON-LD we've been discussing for a while. With that we could use specific queries like creator.preferredName. That's a bit of work, though; it's essentially issue #1.
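As a rough sketch of why nesting helps: once the labels live under the property that links to them, a dotted path such as creator.preferredName picks out exactly one kind of label. The document shape and the helper `values_at` below are hypothetical and only illustrate the idea; they are not the lobid index mapping or an Elasticsearch API.

```python
def values_at(doc, path):
    """Collect all values found at a dotted path such as 'creator.preferredName'."""
    nodes = [doc]
    for key in path.split("."):
        next_nodes = []
        for node in nodes:
            if isinstance(node, dict) and key in node:
                value = node[key]
                # A field may hold a single value or a list of values.
                next_nodes.extend(value if isinstance(value, list) else [value])
        nodes = next_nodes
    return nodes


# Hypothetical nested resource: each label sits under the property it belongs to.
nested = {
    "title": "Example title",
    "creator": [{"preferredName": "Mustermann, Erika"}],
    "subject": [{"preferredName": "Heinsberg"}],
}

creator_names = values_at(nested, "creator.preferredName")
subject_names = values_at(nested, "subject.preferredName")
```

Here a search for "Heinsberg" restricted to `creator.preferredName` finds nothing, while the same search on `subject.preferredName` matches, so the two uses of the same field name no longer collide.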

Or am I missing something and there's an easy way to fix or avoid the issue?

acka47 commented 10 years ago

Though I think we should find a solution for the whole GND API ASAP (as it is currently useless for autosuggest), I also want to point out a solution that is focused on NWBib. We also have the problem that all GND resources are auto-suggested when typing in a subject in NWBib's advanced search (see https://github.com/hbz/nwbib/issues/51). We could fix this, and the problem with too many results for the by-name queries, by building a separate index of GND resources in NWBib.

fsteeg commented 10 years ago

Building a special GND index for NWBib would be a lot of very specialized work. I think the best solution for https://github.com/hbz/nwbib/issues/51 would be the nested JSON-LD for resources again: if the resources contained all subject, author etc. labels in a structured way, we could build general queries and restrict them on the NWBib set.

fsteeg commented 10 years ago

Another thought about this issue: we had always been resolving the creator, but the additional fields were a more recent addition, based on user feedback. See https://github.com/lobid/lodmill/issues/318 for details. I believe it was requested for bonnus, which in the meantime stopped using the API. So that would be another option: remove the additional resolutions, keep only creator. It would be a breaking API change, but on the other hand, that API addition caused a regression that we only discovered now.

We could describe the situation on the mailing list and ask if anyone is using these labels...

fsteeg commented 10 years ago

After discussion with @jschnasse, it seems we originally implemented this for @edoweb. The thing for bonnus was adding literals to resources, not GND entities. @literarymachine: @jschnasse mentioned that you might actually not be using the literals in the lobid API response any more, but doing the lookup against the GND yourself. The UI suggests this as well. Is this correct? Do you only search by literals for the entity itself (not linked literals like placeOfBirth, placeOfDeath, professionOrOccupation, placeOfActivity)?

fsteeg commented 10 years ago

@literarymachine: after talking to @jschnasse it is my understanding that you only use the literals in the primary topic object (like its preferredName, variantNames, placeOfDeath, etc.), and not the resolved properties in the other objects (like the profession, which you fetch from the DNB yourself). Given this, can we remove the literals for placeOfBirth, placeOfDeath, professionOrOccupation, placeOfActivity?
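For illustration, the proposed change could look roughly like the sketch below: the links from the primary topic stay, but the resolved objects they point to are dropped from the record. The record layout, the identifiers, and the function `strip_resolved_literals` are assumptions for this example, not the actual lobid transformation.

```python
# Properties whose resolved literals would be removed (named in this thread).
RESOLVED_PROPERTIES = (
    "placeOfBirth",
    "placeOfDeath",
    "professionOrOccupation",
    "placeOfActivity",
)

# Hypothetical, simplified record; the real lobid layout may differ.
record = {
    "primaryTopic": "gnd:person-1",
    "@graph": [
        {
            "@id": "gnd:person-1",
            "preferredName": "Mustermann, Erika",
            "placeOfBirth": "gnd:place-1",
        },
        {"@id": "gnd:place-1", "preferredName": "Heinsberg"},
    ],
}


def strip_resolved_literals(record):
    """Return a copy of the record without the objects that only resolve links."""
    primary = next(
        obj for obj in record["@graph"] if obj["@id"] == record["primaryTopic"]
    )
    linked_ids = set()
    for prop in RESOLVED_PROPERTIES:
        value = primary.get(prop, [])
        # A property may link to one entity or to a list of entities.
        linked_ids.update(value if isinstance(value, list) else [value])
    kept = [obj for obj in record["@graph"] if obj["@id"] not in linked_ids]
    return {**record, "@graph": kept}


stripped = strip_resolved_literals(record)
```

After this step the record still says where the person was born (the `placeOfBirth` link), but the label "Heinsberg" no longer appears in a `preferredName` field of the person's record, so it cannot cause a by-name hit.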

literarymachine commented 10 years ago

> Given this, can we remove the literals for placeOfBirth, placeOfDeath, professionOrOccupation, placeOfActivity?

Correct!

acka47 commented 10 years ago

Bug was fixed more than a month ago. Closing.