hbz / lobid-resources-web

MOVED TO: https://github.com/hbz/lobid-resources/tree/master/web

ISBNs with hyphens aren't found #33

Closed ChristophEwertowski closed 7 years ago

ChristophEwertowski commented 7 years ago

As acka47 mentioned, ISBN-10 and ISBN-13 numbers are only found when entered without "-". Maybe this is because of the search option "ausschließen" ("exclude"), @fsteeg? Example: Search

ChristophEwertowski commented 7 years ago

The ISSN field probably has the same problem.

fsteeg commented 7 years ago

Looking at the data at http://lobid.org/resources/HT013480902?format=json, I see we don't store the ISBN with dashes. Is this what we want, @acka47? @dr0i: the removal of dashes seems to happen at the Metafacture level. To support searching both with and without dashes, I think it has to happen at the Elasticsearch level, with an analyzer.

acka47 commented 7 years ago

Yes, I think that is what we do in API 1.0 as well: normalize the data with Metafacture and then remove dashes and whitespace from search terms on the Elasticsearch side. Right, @dr0i?
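A minimal sketch of the two-step approach described here, assuming the same stripping of hyphens and whitespace is applied both to the record data and to incoming search terms. This is plain Python for illustration, not Metafacture or lobid code; the helper name `normalize_identifier` is hypothetical.

```python
import re

# Hypothetical helper: matches any run of hyphens and/or whitespace.
_SEPARATORS = re.compile(r"[\s-]+")


def normalize_identifier(value: str) -> str:
    """Strip hyphens and whitespace from an ISBN/ISSN; upper-case an 'x' check character."""
    return _SEPARATORS.sub("", value).upper()


# Data side: the form that would be stored in the index.
stored = normalize_identifier("978-3-16-148410-0")
# Query side: the form that would be sent to Elasticsearch.
query = normalize_identifier("978 3 16 148410 0")
print(stored, query, stored == query)
```

The catch, as the following comments point out, is that the query-side call has to be made explicitly by the application before every search.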

dr0i commented 7 years ago

Right. And deliberately so. Hyphens (and spaces, etc.) in ISBNs (ISSNs, etc.) are just semantic sugar and should always be removed. Don't put sugar into your computer.

fsteeg commented 7 years ago

Alright, but doing it on the Metafacture side or in Elasticsearch are two different options. I think we should not do it on the Metafacture side, but use analyzers instead (if we set up a single analyzer, it's automatically used for both indexing and search). If we do it on the Metafacture side, we'll have to manually remove hyphens from search queries before we send them to Elasticsearch (or am I missing something?).
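A sketch of the single-analyzer setup, written as the Python dict one could send as the index creation body. It assumes a recent Elasticsearch version (mappings without types). The names `strip_separators`, `isbn_analyzer`, `isbn` and `issn` are hypothetical, not lobid's actual mapping. The `pattern_replace` char filter, the `keyword` tokenizer and the `analyzer` mapping parameter are standard Elasticsearch features. Because only `analyzer` is set (no `search_analyzer`), Elasticsearch applies the same analyzer at index time and at query time.

```python
import json

# Hypothetical index body: analyzer, char filter and field names are illustrative only.
INDEX_BODY = {
    "settings": {
        "analysis": {
            "char_filter": {
                # Remove hyphens and whitespace before tokenizing.
                "strip_separators": {
                    "type": "pattern_replace",
                    "pattern": "[\\s-]",
                    "replacement": "",
                }
            },
            "analyzer": {
                "isbn_analyzer": {
                    "type": "custom",
                    "char_filter": ["strip_separators"],
                    # Keep the whole cleaned identifier as a single token.
                    "tokenizer": "keyword",
                    # Make an 'X' check character case-insensitive.
                    "filter": ["lowercase"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            # Only "analyzer" is set, so it is used for indexing and for searching.
            "isbn": {"type": "text", "analyzer": "isbn_analyzer"},
            "issn": {"type": "text", "analyzer": "isbn_analyzer"},
        }
    },
}

print(json.dumps(INDEX_BODY, indent=2))
```

The printed JSON would be sent when creating the index (e.g. `PUT /my-index`, where `my-index` is a placeholder). With this setup, no query-side preprocessing is needed in the web application.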

dr0i commented 7 years ago

Imagine an ISBN in the source data without hyphens (or spaces), or with "wrong" hyphens (figuring out how to correctly separate a given ISBN number is actually complicated). Relying on an ES analyzer, you wouldn't be able to find it when searching for this ISB"Number" (sic!) using hyphens.

fsteeg commented 7 years ago

What I mean is to use an analyzer that removes the hyphens (or spaces), both when indexing, and when querying the field. This should allow queries with and without hyphens. Or am I missing your point?
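To illustrate why applying one analyzer on both sides also covers the case of source data without hyphens, here is a small pure-Python simulation. It is not Elasticsearch itself. The function `analyze` only mimics a custom analyzer made of a `pattern_replace` char filter (dropping hyphens and whitespace), a `keyword` tokenizer and a `lowercase` filter. The helper names and the sample ISBN are illustrative assumptions.

```python
from __future__ import annotations

import re


def analyze(text: str) -> list[str]:
    """Mimic the analyzer: strip separators, emit one token, lowercase it."""
    cleaned = re.sub(r"[\s-]", "", text)  # char filter: drop hyphens and whitespace
    return [cleaned.lower()] if cleaned else []  # keyword tokenizer + lowercase filter


def matches(indexed_value: str, query: str) -> bool:
    """True if all query tokens occur among the tokens produced for the indexed value."""
    query_tokens = analyze(query)
    return bool(query_tokens) and set(query_tokens) <= set(analyze(indexed_value))


records = ["9783161484100", "978-3-16-148410-0"]  # stored without / with hyphens
queries = ["9783161484100", "978-3-16-148410-0", "978 3 16 148410 0"]
for record in records:
    for q in queries:
        print(f"{record!r} <- {q!r}: {matches(record, q)}")
```

Every combination matches, because both the stored value and the query are reduced to the same token before comparison. This holds regardless of whether the data itself was normalized beforehand.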

dr0i commented 7 years ago

You are right, that would also work! But I wouldn't want to remove the normalization of the ISBN at the data level.

fsteeg commented 7 years ago

Alright, let's try it that way. I'll open a new issue in lobid-resources. Closing this.