hbz / lobid-resources

Transformation, web frontend, and API for the hbz catalog as LOD
http://lobid.org/resources
Eclipse Public License 2.0

Wrong "fulltextOnline" in http://lobid.org/resource/HT019170549 #448

Closed jschnasse closed 6 years ago

jschnasse commented 7 years ago

http://lobid.org/resource/HT019170549/about

The combination of these two statements causes problems on my side:

 "http://purl.org/lobid/lv#fulltextOnline" : "dx.doi.org/10.6101/AZQ/000329",
 "fulltextOnline" : "https://repository.publisso.de/resource/frl:6401327",

dr0i commented 7 years ago

Both predicates are identical (http://purl.org/lobid/lv#fulltextOnline), but the objects differ in type: a) is a string, b) is a URI. The ES index profile defines exactly one field property for it (of type string), not two, so that's no problem here. My guess is that the problem resides in the JSON library. Another solution would be to ensure that we always have http URIs, as in API 2.0, see http://lobid.org/resources/HT019170549?format=json. That said, why not use API 2.0?

jschnasse commented 7 years ago

> so that's no problem here.

If you look at the N-Triples representation, it becomes clearer. You will find two statements with the predicate http://purl.org/dc/terms/hasVersion and two statements with the predicate http://purl.org/lobid/lv#fulltextOnline . In both cases, the type of the object differs: in one case it is a string, in the other it is a URI.

Problem: When converted to JSON, this results in two different representations of the same data. You will get something like:

 "hasVersion": "aString",
 "hasVersion": {
     "@id": "aUri",
     "prefLabel": "alabel"
 }

Elasticsearch does not like this kind of type confusion, and neither do I.
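To make the type confusion concrete, here is a minimal Python sketch that flags keys appearing with both string and object values inside one JSON object. It assumes the shape of the snippet above (duplicate `hasVersion` keys) plus a hypothetical array form for `fulltextOnline`; `find_mixed_type_keys` is an illustrative helper, not part of lobid. Since Elasticsearch maps each field to a single type within an index, any key it reports is a likely indexing conflict.

```python
import json
from collections import defaultdict


def value_kind(value):
    """Classify a JSON value as 'object', 'string' or 'other'."""
    if isinstance(value, dict):
        return "object"
    if isinstance(value, str):
        return "string"
    return "other"


def find_mixed_type_keys(json_text):
    """Return keys that appear with both string and object values inside one JSON object.

    object_pairs_hook receives every (key, value) pair, so duplicate keys such as
    the two "hasVersion" entries are all inspected (plain json.loads keeps only the last).
    """
    conflicts = set()

    def hook(pairs):
        kinds = defaultdict(set)
        for key, value in pairs:
            # A JSON array counts as several values for the same key.
            items = value if isinstance(value, list) else [value]
            for item in items:
                kinds[key].add(value_kind(item))
        for key, seen in kinds.items():
            if {"string", "object"} <= seen:
                conflicts.add(key)
        return dict(pairs)

    json.loads(json_text, object_pairs_hook=hook)
    return sorted(conflicts)


# Hypothetical example document modelled on the snippets in this issue.
example = """{
  "hasVersion": "aString",
  "hasVersion": {"@id": "aUri", "prefLabel": "alabel"},
  "fulltextOnline": ["dx.doi.org/10.6101/AZQ/000329",
                     {"@id": "https://repository.publisso.de/resource/frl:6401327"}],
  "title": "unaffected"
}"""

print(find_mixed_type_keys(example))  # ['fulltextOnline', 'hasVersion']
```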

So, I really think this is a serious problem and should be fixed.

Workaround: In the meantime, I disconnected this single resource from lobid updates and fixed it manually.

Edit: It would be interesting to know how many resources are affected by this issue, especially in the FRL set.
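One rough way to estimate the number of affected resources is to scan an N-Triples dump for subjects that use the same predicate with both a literal and an IRI (or blank node) object. The following Python sketch assumes well-formed, one-triple-per-line input and uses a simplified regex rather than a real RDF parser; `affected_subjects` is a hypothetical helper. It does not restrict the result to the FRL set, which would need set membership information from elsewhere.

```python
import re
from collections import defaultdict

# Simplified N-Triples line pattern: subject, predicate, and the object up to the final " .".
# This is a sketch for well-formed dumps, not a complete N-Triples parser.
TRIPLE = re.compile(r"^(\S+)\s+(\S+)\s+(.+?)\s*\.\s*$")


def object_kind(term):
    """'literal' for quoted values, 'resource' for IRIs and blank nodes."""
    if term.startswith('"'):
        return "literal"
    if term.startswith("<") or term.startswith("_:"):
        return "resource"
    return "unknown"


def affected_subjects(lines):
    """Subjects that use one predicate with both literal and resource objects."""
    kinds = defaultdict(set)  # (subject, predicate) -> object kinds seen
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = TRIPLE.match(line)
        if match is None:
            continue  # skip lines this simplified pattern cannot handle
        subject, predicate, obj = match.groups()
        kinds[(subject, predicate)].add(object_kind(obj))
    return sorted({s for (s, _p), seen in kinds.items() if {"literal", "resource"} <= seen})


sample = [
    '<http://lobid.org/resource/HT019170549> <http://purl.org/lobid/lv#fulltextOnline> "dx.doi.org/10.6101/AZQ/000329" .',
    '<http://lobid.org/resource/HT019170549> <http://purl.org/lobid/lv#fulltextOnline> <https://repository.publisso.de/resource/frl:6401327> .',
    # Hypothetical unaffected resource for comparison.
    '<http://example.org/resource/unaffected> <http://purl.org/dc/terms/hasVersion> <http://example.org/version/1> .',
]
affected = affected_subjects(sample)
print(len(affected), affected)
```

For a full dump, an open text file object can be passed instead of the list, since the function only iterates over lines.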

jschnasse commented 7 years ago

I assigned this to @acka47 because of the "it is lobid v1 so we wontfix" statement. I think that needs to be discussed elsewhere.

acka47 commented 7 years ago

I am quite sure the underlying problem is that the source data does not contain a valid URL in 655.u:

<datafield tag="655" ind1="-" ind2="1">
  <subfield code="u">dx.doi.org/10.6101/AZQ/000329</subfield>
  <subfield code="x">Resolving-System</subfield>
</datafield>

As the content is, according to the MAB documentation, supposed to be a URL, the simplest solution would probably be to fix this in the union catalog. @jschnasse, can't you just ask the FRL people to correct this and to use proper URLs in 655.u?
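Checking the source data for such values could look roughly like this Python sketch, which parses MAB XML with `xml.etree.ElementTree` and reports 655 $u values that lack an http(s) scheme or a host. The `<record>` wrapper and the second 655 field holding the publisso URL are made up for the example; only the first field is from this issue, and `invalid_655u` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse


def invalid_655u(record_xml):
    """Return 655 $u values that are not absolute http(s) URLs."""
    root = ET.fromstring(record_xml)
    bad = []
    for field in root.iter("datafield"):
        if field.get("tag") != "655":
            continue
        for sub in field.iter("subfield"):
            if sub.get("code") != "u":
                continue
            value = (sub.text or "").strip()
            parsed = urlparse(value)
            # A proper URL needs a scheme and a host; "dx.doi.org/..." has neither.
            if parsed.scheme not in ("http", "https") or not parsed.netloc:
                bad.append(value)
    return bad


# Hypothetical record: the <record> wrapper and the second 655 field are invented.
record = """<record>
  <datafield tag="655" ind1="-" ind2="1">
    <subfield code="u">dx.doi.org/10.6101/AZQ/000329</subfield>
    <subfield code="x">Resolving-System</subfield>
  </datafield>
  <datafield tag="655" ind1="-" ind2="1">
    <subfield code="u">https://repository.publisso.de/resource/frl:6401327</subfield>
  </datafield>
</record>"""

print(invalid_655u(record))  # ['dx.doi.org/10.6101/AZQ/000329']
```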

acka47 commented 7 years ago

We should probably also fix this in the morph. We'd only have to sanitize the URLs as we do in 2.0, see https://github.com/hbz/lobid-resources/blob/1d025c11f358772748d50fcf9dd18eae43f2835e/src/main/resources/morph-hbz01-to-lobid.xml#L1840-L1845. Assigning @ChristophEwertowski to do so.
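The linked morph rule is not reproduced here; as a hedged illustration, a comparable sanitization step in Python might simply prepend `http://` when a value has no scheme, so that `dx.doi.org/...` becomes an http URI like the other `fulltextOnline` value. `sanitize_url` is hypothetical, and choosing `http://` over `https://` is an assumption.

```python
import re

# Matches an RFC 3986 style scheme followed by "://", e.g. "http://" or "https://".
HAS_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")


def sanitize_url(value):
    """Hypothetical normalization: prepend "http://" to values that lack a scheme."""
    value = value.strip()
    if HAS_SCHEME.match(value):
        return value
    return "http://" + value


print(sanitize_url("dx.doi.org/10.6101/AZQ/000329"))
print(sanitize_url("https://repository.publisso.de/resource/frl:6401327"))
```

Prepending a scheme only helps for values that are otherwise well-formed host/path strings; values that are not URLs at all would still need to be fixed in the union catalog.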

jschnasse commented 7 years ago

HT019367457 is also wrong

seeAlso and fulltextOnline are duplicated with different types

jschnasse commented 7 years ago

HT019367457 is interesting, because both URLs look ok/valid?! We can't blame the cataloguer, right?

dr0i commented 7 years ago

Note that the star-www.giz URL mentioned in HT019367457 appears only once in lobid 2.0, see http://lobid.org/resources/HT019367457.json.

acka47 commented 7 years ago

@jschnasse Isn't everything ok in 2.0 and, thus, can't we close this? Or should we talk about it on Tuesday?

jschnasse commented 7 years ago

> thus, can't we close this?

And not fix the bug? No, that is unthinkable ;-)

acka47 commented 6 years ago

Looking at the examples, everything is ok now. Closing.