ChristophEwertowski closed this issue 7 years ago.
The ISSN field probably has the same problem.
Looking at the data at http://lobid.org/resources/HT013480902?format=json, I see we don't store the ISBN with dashes. Is this what we want, @acka47? @dr0i: the removal of dashes seems to happen at the Metafacture level. To support searching both with and without dashes, I think it has to happen at the Elasticsearch level, with an analyzer.
Yes, I think that is what we do in API 1.0 as well: normalize the data with Metafacture and then remove dashes and whitespace from search terms on the Elasticsearch side. Right, @dr0i?
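A minimal sketch of that normalization in Python, assuming it does nothing more than strip hyphens and whitespace (the helper name `normalize_identifier` is hypothetical and not taken from lobid or Metafacture). The point is that the same function has to be applied both to the stored value and to every incoming search term, whichever side ends up doing it:

```python
import re

# Hypothetical helper: removes hyphens and any whitespace from an
# ISBN/ISSN-like string. Not the actual lobid/Metafacture code.
_SEPARATORS = re.compile(r"[\s-]+")


def normalize_identifier(value: str) -> str:
    """Strip hyphens and whitespace so that only the digits (and a possible X) remain."""
    return _SEPARATORS.sub("", value)


# Applied once to the data (the Metafacture step) ...
stored = normalize_identifier("978-3-16-148410-0")

# ... and again to each search term before it is matched against the data.
for query in ["978-3-16-148410-0", "978 3 16 148410 0", "9783161484100", "97-83161-48410-0"]:
    assert normalize_identifier(query) == stored
```

Because hyphens are removed wherever they occur, a "wrongly" hyphenated query still matches the normalized stored value.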
Right. And deliberately so. Hyphens (and spaces and ...) in ISBNs (ISSNs ...) are just semantic sugar and should always be removed. Don't put sugar into your computer.
Alright, but there are two options: doing it on the Metafacture side or in Elasticsearch. I think we should not do it on the Metafacture side but use analyzers instead (if we set up a single analyzer, it is used automatically for both indexing and search). If we do it on the Metafacture side, we'll have to manually remove hyphens from search queries before we send them to Elasticsearch (or am I missing something?).
Imagine an ISBN in the source data without hyphens (or spaces), or with "wrong" hyphens, since figuring out how to correctly hyphenate a given ISBN is actually complicated. If you relied on the ES analyzer alone, you wouldn't be able to find it when searching for this ISB"Number" (sic!) with hyphens.
What I mean is to use an analyzer that removes the hyphens (or spaces) both when indexing and when querying the field. This should allow queries with and without hyphens. Or am I missing your point?
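A rough sketch of such an analyzer, written as a Python dict that could be serialized as the body of an index-creation request. It assumes a `pattern_replace` character filter plus the `keyword` tokenizer; the names `strip_separators`, `identifier_analyzer` and the field `isbn` are hypothetical, and the mapping layout shown (`mappings.properties`, type `text`) is the one used by recent Elasticsearch versions, so older versions need a different mapping syntax. The `emulate_analyzer` function is only a local stand-in to illustrate the behaviour, not an Elasticsearch API:

```python
import json
import re
from typing import List

# Hypothetical index settings: a char filter deletes hyphens and whitespace,
# the keyword tokenizer keeps the whole identifier as one token, and the
# lowercase filter makes a trailing "X" (ISBN-10, ISSN) match "x".
INDEX_BODY = {
    "settings": {
        "analysis": {
            "char_filter": {
                "strip_separators": {
                    "type": "pattern_replace",
                    "pattern": "[\\s-]",
                    "replacement": "",
                }
            },
            "analyzer": {
                "identifier_analyzer": {
                    "type": "custom",
                    "char_filter": ["strip_separators"],
                    "tokenizer": "keyword",
                    "filter": ["lowercase"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            # Only "analyzer" is set, so the same analyzer also applies at search time.
            "isbn": {"type": "text", "analyzer": "identifier_analyzer"}
        }
    },
}


def emulate_analyzer(text: str) -> List[str]:
    """Local approximation of identifier_analyzer: char filter, keyword tokenizer, lowercase."""
    cf = INDEX_BODY["settings"]["analysis"]["char_filter"]["strip_separators"]
    stripped = re.sub(cf["pattern"], cf["replacement"], text)
    return [stripped.lower()]  # keyword tokenizer: the whole input becomes a single token


if __name__ == "__main__":
    print(json.dumps(INDEX_BODY, indent=2))
```

Since no separate `search_analyzer` is configured, a `match` query on the field runs the same analyzer over the query string, so `978-3-16-148410-0` and `9783161484100` end up as the same token on both sides.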
You are right, that would also work! But I wouldn't want to remove the normalization of the ISBN at the data level.
Alright, let's try it that way. I'll open a new issue in lobid-resources. Closing this.
As acka47 mentioned, ISBN-10 and ISBN-13 are only found without "-". Maybe this is because of the search option "ausschließen" ("exclude"), @fsteeg? Example: Search