hbz / lobid-gnd

UI and API to the Integrated Authority File (Gemeinsame Normdatei, GND)
http://lobid.org/gnd
Eclipse Public License 2.0

Add autocomplete support #6

Closed: acka47 closed this issue 6 years ago

acka47 commented 7 years ago

Copied from https://github.com/lobid/lodmill/issues/468, originally opened by @nichtich. We should consider this for the new implementation of GND lookup. (I think, @jschnasse also mentioned this.)

For simple lookup it would be nice to add format=suggest, returning the OpenSearch Suggestions format. The existing format=short returns a plain array of strings, e.g. http://api.lobid.org/person?name=Marx&format=short

[
    "Marx, Antônio Augusto (1919-)",
    "Marx, J. A.",
    "Marxsen, Peter Christian (1806-1869)",
    ...
]

OpenSearch Suggestions would be:

[
    "Marx",
    [
        "Marx, Antônio Augusto",
        "Marx, J. A.",
        "Marxsen, Peter Christian",
        ...
    ],
    [
        "Architekt und Künstler (1919-)",
        "Organist und Musiker in Yselstein",
        "Subrektor (1806-1869)",
        ...
    ],
    [
        "http://d-nb.info/gnd/1030092001",
        "http://d-nb.info/gnd/1012540626",
        "http://d-nb.info/gnd/1016221088",
        ...
    ]
]

fsteeg commented 7 years ago

In resources and organisations we have a customizable suggestions format that works well with the jQuery autocomplete widget. For consistency, I think we should use the same format in authorities too.

Actual suggestions should include useful GND-specific labels, see https://github.com/hbz/lobid-gnd/issues/7#issuecomment-329107362: search?q=preferredName:Goethe+AND+variantName:Goethe&format=json:preferredName,dateOfBirth,dateOfDeath,professionOrOccupation.preferredName
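
A minimal sketch of how a client might assemble such a request, assuming the query shape quoted above. The base URL and the helper name `suggest_url` are assumptions for illustration, not part of the lobid API documentation, and the search term is not escaped for the query syntax.

```python
from urllib.parse import urlencode

# Assumed endpoint for illustration; the thread links http://lobid.org/gnd
BASE_URL = "https://lobid.org/gnd/search"


def suggest_url(term, fields, base_url=BASE_URL):
    """Build a search URL that restricts the JSON response to `fields`.

    Hypothetical helper: searches preferredName and variantName for `term`
    and requests the custom `json:<field>,<field>,...` format.
    """
    query = {
        "q": f"preferredName:{term} AND variantName:{term}",
        "format": "json:" + ",".join(fields),
    }
    # Keep ':' and ',' readable; spaces are encoded as '+'.
    return base_url + "?" + urlencode(query, safe=":,")


print(suggest_url("Goethe", ["preferredName", "dateOfBirth", "dateOfDeath"]))
```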

The suggestions should be used in the search UI and be documented in the API page.

acka47 commented 7 years ago

+1

acka47 commented 7 years ago

In https://github.com/lobid/lodmill/issues/801#issuecomment-327780679, @jschnasse writes how he uses the 1.x API for auto suggestion.

nichtich commented 6 years ago

See http://ws.gbv.de/suggest/gnd/ for another GND autocomplete API. We should agree on a common API (request and response format).

Current request format

GBV

lobid (resources and organisations)

Current response format

GBV uses the OpenSearch Suggestions format; lobid uses JSON arrays of objects with three keys: label, id, category. For responses whose label is a plain string (this depends on the query), the two formats can be converted into each other losslessly. We could add a query parameter to select the response format for compatibility.
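
To illustrate the conversion, here is a rough sketch. It assumes that label maps to the completion, category to the description, and id to the URL; that mapping and both function names are assumptions for illustration, not something either API specifies. The sample values are taken from the example at the top of this issue.

```python
def to_opensearch(query, suggestions):
    """Convert a lobid-style list of {label, id, category} objects
    into an OpenSearch Suggestions array (assumed field mapping)."""
    return [
        query,
        [s["label"] for s in suggestions],     # completions
        [s["category"] for s in suggestions],  # descriptions
        [s["id"] for s in suggestions],        # URLs
    ]


def from_opensearch(data):
    """Convert an OpenSearch Suggestions array back into lobid-style objects."""
    _query, labels, descriptions, urls = data
    return [
        {"label": label, "id": url, "category": description}
        for label, description, url in zip(labels, descriptions, urls)
    ]


lobid_style = [
    {
        "label": "Marx, J. A.",
        "id": "http://d-nb.info/gnd/1012540626",
        "category": "Organist und Musiker in Yselstein",
    },
]
opensearch = to_opensearch("Marx", lobid_style)
# The round trip returns the original objects, provided label is a plain string.
print(from_opensearch(opensearch) == lobid_style)
```

The query string itself is only carried in the OpenSearch array, so a client converting in the other direction has to supply it.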

fsteeg commented 6 years ago

See http://ws.gbv.de/suggest/gnd/ for another GND autocomplete API. We should agree on a common API (request and response format).

@nichtich So the idea here would be that API users could use both APIs without code changes or specific code? Like setting up the other API as a fallback if the main API is down?

fsteeg commented 6 years ago

@acka47 Deployed to test.

API sample: https://test.lobid.org/gnd/search?q=Twain&format=json:preferredName,dateOfBirth,dateOfDeath,professionOrOccupation

Enabled for search UI: https://test.lobid.org/gnd/search

API documentation: https://test.lobid.org/gnd/api#auto-complete

acka47 commented 6 years ago

I tried it out [and edited this comment after a longer talk with @fsteeg].

  1. We need to make this an ngram search on preferredName and variantName so that I get feedback with every key I hit.
  2. We will probably have to improve the ranking and disable stemming. (We'll need separate ticket(s) for this.)
  3. The behaviour when choosing an entry from the drop-down list has to be improved as well. Currently, the search box is filled with an id query but the query isn't sent. The best thing would be to go directly to the entry page, skipping the result list with one entry.
  4. Instead of grouping the suggestions by type, we will serve them in ranked order, but with the broad type symbols we are using elsewhere.
  5. As for the API, we decided not to make the returned fields configurable but to serve one specific string for each type. This should be very similar to what we serve in the result list (as the goal is the same: identifying the resource I am looking for) and we might reuse the result list configuration here.

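
As a rough illustration of point 1, the sketch below shows prefix ("edge") n-grams, one common way to get a match on every keystroke. It is independent of any particular search index; the function names and the in-memory matching are assumptions for illustration and say nothing about how lobid-gnd actually implements this.

```python
import re


def edge_ngrams(text, min_len=1, max_len=20):
    """Return all word prefixes of `text` between min_len and max_len characters."""
    tokens = re.findall(r"\w+", text.lower())
    return {
        token[:n]
        for token in tokens
        for n in range(min_len, min(len(token), max_len) + 1)
    }


def matches(typed, names):
    """Return the names that have a word starting with what was typed so far."""
    prefix = typed.lower()
    return [name for name in names if prefix in edge_ngrams(name)]


names = ["Twain, Mark", "Marx, J. A."]
print(matches("Tw", names))
print(matches("Mar", names))
```

Because the prefixes are matched as typed, no stemming is involved, which fits point 2.
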
fsteeg commented 6 years ago

Deployed to test, with some changes to https://github.com/hbz/lobid-gnd/issues/6#issuecomment-389775128, some of which we discussed offline:

API sample: https://test.lobid.org/gnd/search?q=Twain&format=json:suggest

Enabled for search UI: https://test.lobid.org/gnd

API documentation: https://test.lobid.org/gnd/api#auto-complete

acka47 commented 6 years ago

This is much better. Two minor things:

  1. I noticed that only the birth date is shown for persons. I think it would be good to also show the death date, like "1889-1951". (If we only show the birth date we should add a * in the beginning.)
  2. It is a bit confusing that all types are shown. I think we should only show the most general or the most specific type.

fsteeg commented 6 years ago

Deployed to test: https://test.lobid.org/gnd/search?q=Twain&format=json:suggest

I think it would be good to also show the death date, like "1889-1951"

Implemented like in the result list, with 1889- if we only have the birth date.

I think we should only show the most general or the most specific type.

Using sub types only, keeping top level types only if we have no sub types.
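
The two rules above can be sketched as follows. This is an illustration under assumptions, not the deployed code: the function names are hypothetical, the set of top-level types is passed in by the caller, and the type names in the usage lines are only examples. The case of a known death date without a birth date is not covered in this thread, so the sketch returns an empty string for it.

```python
def lifespan(birth=None, death=None):
    """Format dates as in the result list: "1889-1951", or "1889-" without a death date."""
    if birth and death:
        return f"{birth}-{death}"
    if birth:
        return f"{birth}-"
    return ""  # assumption: nothing is shown when the birth date is missing


def display_types(types, top_level):
    """Keep sub types only; fall back to top-level types if there are no sub types."""
    sub_types = [t for t in types if t not in top_level]
    return sub_types or [t for t in types if t in top_level]


top_level = {"Person"}  # illustrative; the real set of broad types is larger
print(lifespan("1889", "1951"))
print(lifespan("1889"))
print(display_types(["Person", "DifferentiatedPerson"], top_level))
print(display_types(["Person"], top_level))
```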

jschnasse commented 6 years ago

+1 - already looks very good. I will give it a try. Maybe add a ; charset=utf-8 to your content-type header.

acka47 commented 6 years ago

+1 This is really very nice now! And the ranking has already improved a lot due to the recent adjustments so that I think we don't have to change that much. The suggestions could come quicker, though, but this will hopefully be the case when we finally get the SSDs...

fsteeg commented 6 years ago

Maybe add a ; charset=utf-8 to your content-type header

Oh, interesting. I'm actually setting it, but Play 2.6 removes it, because by spec, application/json takes no charset, see note at the bottom of https://www.iana.org/assignments/media-types/application/json (clients are expected to check the content to determine which Unicode encoding is used, see section 3 in https://tools.ietf.org/html/rfc4627).

See also: https://groups.google.com/d/msg/play-framework/9FESgLrycAQ/XRCv82euBQAJ
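
For clients that hit this, a small sketch of the difference between letting the JSON parser work on the raw bytes and decoding them with a "Western" (Windows-1252) default first. The function names are hypothetical; the sample string is taken from the example at the top of this issue.

```python
import json


def parse_body(body: bytes):
    """Parse a JSON response body from raw bytes.

    json.loads accepts bytes and detects UTF-8, UTF-16 or UTF-32 itself,
    so no charset parameter is needed.
    """
    return json.loads(body)


def parse_body_as_western(body: bytes):
    """What a client does when it wrongly assumes a Western encoding."""
    return json.loads(body.decode("cp1252"))


body = json.dumps(["Architekt und Künstler"], ensure_ascii=False).encode("utf-8")
print(parse_body(body))             # umlaut intact
print(parse_body_as_western(body))  # umlaut garbled
```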

jschnasse commented 6 years ago

Wow - I'm only mentioning it because my Firefox insists on using "western" as the encoding, which results in an "umlaut problem". I don't think it is a serious problem; it only occurs when calling format=json:suggest directly from the browser.

fsteeg commented 6 years ago

my Firefox insists on using "western" as the encoding, which results in an "umlaut problem".

Yes I also noticed broken umlauts yesterday when looking at the JSON in mobile Safari. In Firefox (60.0) it works fine for me. Maybe it's an issue with some JSON plugin you're using? So obviously there are clients that don't handle the UTF-8 default for JSON correctly, but I also think it's OK to leave it like this.

acka47 commented 6 years ago

@nichtich, we now have deployed this for lobid-gnd similarly to how we did it with the other lobid services. Re. additionally offering a standard way for auto-suggest, we can continue the discussion in #106.