hbz / lobid

Linking Open Bibliographic Data
https://lobid.org/
Eclipse Public License 2.0

Please create mapping for Mab 655e #99

Closed jschnasse closed 9 years ago

jschnasse commented 9 years ago

e.g.: http://lobid.org/resource/TT002234459/about

TT002234459 describes an archived web page. The URL of the archived page is stored in MAB 655e, which currently does not exist in the lobid data.

priority in edoweb: high

dr0i commented 9 years ago

In the data there exists "fulltextOnline" : "http://digitool.hbz-nrw.de:1801/webclient/DeliveryManager?pid=1638893&custom_att_2=simple_viewer"

Is this wrong? [edit:] Ah, now I see what you mean: the URL of the original page that was archived, not the URL of the archive. Thus, http://www.rhein-lahn-info.de/jakobsweg/ is missing in the lobid data.

jschnasse commented 9 years ago

Correct! It's not the link to the archived resource that is missing, but the link to the resource that has been archived. :-)

dr0i commented 9 years ago

Until now, we only took the URL in 655eu into account if the same entity also had a note saying it is an archived resource; see http://lobid.org/resource?id=HT014997977&format=source as an example. With the new mapping, URLs in 655eu will be stored as lv#fulltextOnline even if they are not tagged as archives via their 655e subfields, i.e. also when 652a="Archiv.." exists. This may lead to URLs being wrongly qualified as lv#fulltextOnline. Tests so far look good; we will see. (Honestly, dealing with "online resources" and the corresponding fields (655e, 652, 334, ...) is messy.)
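The mapping rule described above can be sketched roughly as follows. This is not the lodmill code: the record shape (a dict keyed by "655e" and "652a"), the note subfield "z", and the example URLs are hypothetical, chosen only to show when a 655eu URL would be emitted as lv#fulltextOnline under the new rule.

```python
from typing import Dict, List

# Hypothetical, simplified record shape (not real MAB-XML):
#   "655e" -> list of occurrences, each a dict of subfield code -> value
#   "652a" -> list of plain string values
Record = Dict[str, list]


def mentions_archive(text: str) -> bool:
    # Matches both "Archiv..." and "archivierte ..." notes.
    return "archiv" in text.lower()


def fulltext_online_urls(record: Record) -> List[str]:
    """Return 655e$u URLs to emit as lv#fulltextOnline under the new rule."""
    # Record-level signal: some 652a value starts with "Archiv".
    archived_by_652a = any(v.startswith("Archiv") for v in record.get("652a", []))
    urls = []
    for occurrence in record.get("655e", []):
        url = occurrence.get("u")
        if url is None:
            continue
        # Occurrence-level signal: another subfield of this 655e notes an archive.
        tagged = any(
            mentions_archive(value)
            for code, value in occurrence.items()
            if code != "u"
        )
        if tagged or archived_by_652a:
            urls.append(url)
    return urls


# Illustrative records (hypothetical values):
tagged_only = {"655e": [{"u": "http://example.org/a", "z": "archivierte Online-Ressource"}]}
via_652a = {"655e": [{"u": "http://example.org/b"}], "652a": ["Archiv"]}
neither = {"655e": [{"u": "http://example.org/c"}]}

print(fulltext_online_urls(tagged_only))  # ['http://example.org/a']
print(fulltext_online_urls(via_652a))     # ['http://example.org/b']
print(fulltext_online_urls(neither))      # []
```

The old behaviour corresponds to the `tagged` check alone; the `archived_by_652a` branch is what widens the rule and why some URLs might be wrongly qualified.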

dr0i commented 9 years ago

After discussion with @jschnasse: we need a proper property from @acka47 to make the statement <A> <archives> <B>. Reusing lv#fulltextOnline is not adequate.

acka47 commented 9 years ago

I will create a class lv:ArchivedWebPage to be used with all edoweb resources as well as a property lv:webPageArchived, ok?
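For illustration, the proposed class and property would yield two N-Triples statements per archived resource, with <A> as the catalogue resource and <B> as the original web page. The namespace `http://purl.org/lobid/lv#` is an assumption for the `lv:` prefix, and the helper name is hypothetical; the two URIs come from the example discussed above.

```python
# Assumed namespace for the lobid vocabulary ("lv:"); verify before use.
LV = "http://purl.org/lobid/lv#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def archived_page_triples(resource_uri: str, archived_url: str) -> list:
    """N-Triples lines stating that resource_uri archives archived_url."""
    return [
        f"<{resource_uri}> <{RDF_TYPE}> <{LV}ArchivedWebPage> .",
        f"<{resource_uri}> <{LV}webPageArchived> <{archived_url}> .",
    ]


for line in archived_page_triples(
    "http://lobid.org/resource/TT002234459",
    "http://www.rhein-lahn-info.de/jakobsweg/",
):
    print(line)
```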

jschnasse commented 9 years ago

sounds reasonable! +1

acka47 commented 9 years ago

@dr0i To Do:

dr0i commented 9 years ago

Deployed to staging. @jschnasse have a look.

dr0i commented 9 years ago

ping @jschnasse

dr0i commented 9 years ago

Deployed to production as well. @jschnasse, have a look and raise a thumbs-up if it does what it should.

acka47 commented 9 years ago

What's missing: Add lv:webPageArchived to the JSON-LD context. Will do.
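A JSON-LD context entry for the new property might look like the one built below. The short term name `webPageArchived` and the `lv` namespace IRI are assumptions, not the actual lobid context. Declaring `"@type": "@id"` makes JSON-LD processors read the value as an IRI (the archived page) rather than as a plain string.

```python
import json

# Assumed namespace and short term name; the real lobid context may differ.
LV = "http://purl.org/lobid/lv#"

context = {
    "@context": {
        "webPageArchived": {
            "@id": LV + "webPageArchived",
            # Type coercion: compacted string values are interpreted as IRIs.
            "@type": "@id",
        }
    }
}

print(json.dumps(context, indent=2))
```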

jschnasse commented 9 years ago

test import is running. looks good so far! Many thx.

jschnasse commented 9 years ago

+1

acka47 commented 9 years ago

We obviously overshot the mark here by typing all Edoweb resources as lv:ArchivedWebPage. As reported by @jschnasse, only resources with a URL in 655e should be typed as such. That's why I re-opened this issue.
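The narrower typing rule (add lv:ArchivedWebPage only when 655e carries a URL) can be sketched like this. The record shape, the subfield codes "u" and "z", the `lv` namespace, and the use of `bibo:Document` as a base type are all illustrative assumptions.

```python
from typing import Dict, List, Set

LV = "http://purl.org/lobid/lv#"  # assumed namespace

# Hypothetical record shape: "655e" -> list of {subfield code: value} dicts.
Record = Dict[str, List[Dict[str, str]]]


def has_655e_url(record: Record) -> bool:
    # True only if some 655e occurrence carries a non-empty URL in subfield "u".
    return any(occ.get("u") for occ in record.get("655e", []))


def rdf_types(record: Record, base_types: Set[str]) -> Set[str]:
    """Add lv:ArchivedWebPage to the base types only for records with a 655e URL."""
    types = set(base_types)
    if has_655e_url(record):
        types.add(LV + "ArchivedWebPage")
    return types


DOC = "http://purl.org/ontology/bibo/Document"
with_url = {"655e": [{"u": "http://example.org/page/"}]}
without_url = {"655e": [{"z": "archivierte Online-Ressource"}]}

print(sorted(rdf_types(with_url, {DOC})))
print(sorted(rdf_types(without_url, {DOC})))
```

A record that only carries an archive note in 655e, without a URL, keeps its base types untouched.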

dr0i commented 9 years ago

I need an example resource here. HT014997977 and HT018433961 are in the smallest test set, and they are proper ArchivedWebPages.

acka47 commented 9 years ago

Examples: HT018585406, HT018585452, HT018585477.

See also this JIRA issue: https://jira.hbz-nrw.de/browse/EDOZWO-480.

dr0i commented 9 years ago

So even if 652 states that it's an "archivierte online resource" (archived online resource), it's not necessarily one. (A librarian's rose can be anything.)

dr0i commented 9 years ago

@acka47, it may suffice to look at the N-Triples output of the new transformation for the smallest test set (this spares us the time of deploying to staging). Look at HT018585406: this resource is no longer of type lv:ArchivedWebPage.
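Checking the transformation output for the type can be done with a naive line-based scan like the sketch below. It only handles triples whose subject, predicate and object are IRIs; it is not a full N-Triples parser. The `lv` IRI is assumed, and the sample lines are invented for illustration, not actual test-set output.

```python
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
ARCHIVED = "<http://purl.org/lobid/lv#ArchivedWebPage>"  # assumed lv namespace


def typed_as_archived(ntriples: str, subject_iri: str) -> bool:
    """Naive line-based check; fine for IRI-only triples, not a full N-Triples parser."""
    subject = f"<{subject_iri}>"
    for line in ntriples.splitlines():
        parts = line.split()
        # Expect: <subject> <predicate> <object> .
        if len(parts) >= 3 and parts[:3] == [subject, RDF_TYPE, ARCHIVED]:
            return True
    return False


# Invented sample output for illustration only.
sample = """\
<http://lobid.org/resource/HT018585406> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/ontology/bibo/Document> .
<http://lobid.org/resource/HT018585406> <http://purl.org/dc/terms/title> "Example title" .
"""
print(typed_as_archived(sample, "http://lobid.org/resource/HT018585406"))
```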

acka47 commented 9 years ago

+1

acka47 commented 9 years ago

@jschnasse just told me that we will have to inform @literarymachine as soon as this is deployed to production.

literarymachine commented 9 years ago

Thanks! Team lunch communication. Better than any ticketing system.

acka47 commented 9 years ago

In this case, @jschnasse used the other ticketing system (JIRA) to communicate this. Lunch was good and very sunny, though.

acka47 commented 9 years ago

@dr0i The latest fix (i.e. only typing resources that have a URL in 655e as lv:ArchivedWebPage) hasn't been deployed to production yet. See http://lobid.org/resource/HT018585406, which should NOT be of type lv:ArchivedWebPage.

dr0i commented 9 years ago

In the meantime (as of 2015-04-15), the metadata of the resource has changed: http://lobid.org/resource/HT018585406 now has a URL in 655e and thus is (correctly) of type lv:ArchivedWebPage. The underlying mechanism to exclude the type lv:ArchivedWebPage if there is no 655e, coded in https://github.com/lobid/lodmill/commit/ed809ffe4ef9f6d7f500525a5f93d0953b313b93, is also working: the unit tests don't produce this type when run on the old metadata of HT018585406 (where 655e wasn't set).
The metadata of other resources has changed as well; at least that's true for HT018585477.

dr0i commented 9 years ago

Deployed to staging and production (since yesterday with lobid/lodmill#669).

acka47 commented 9 years ago

@jschnasse Can you point me to a current example of an edoweb resource without a URL in 655e?

acka47 commented 9 years ago

Just talked to @jschnasse. This issue has gotten a bit out of hand. We decided to completely revert the addition of the type lv:ArchivedWebPage (i.e. no resource at all shall be typed as such) and stick to the issue title, which we have already achieved (i.e. adding the URL of the web page that has been archived to the RDF).

We may get the information whether something is an archived web page from MAB field 051, element 1, see e.g. http://lobid.org/resource?id=TT002234459&format=source where there's a w in element 1: <controlfield tag="051">mw||||||</controlfield>.
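Reading element 1 of MAB field 051 could look like the sketch below. Positions are counted from 0, which matches the example `mw||||||` having the `w` at element 1. The wrapping `<record>` element is invented, and real MAB-XML may use a namespace that this simple `iter("controlfield")` lookup would not match. The meaning of `w` itself is not documented in this thread.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def mab051_element1(record_xml: str) -> Optional[str]:
    """Return the character at position 1 of controlfield 051, or None."""
    root = ET.fromstring(record_xml)
    # Assumes un-namespaced <controlfield> elements.
    for field in root.iter("controlfield"):
        if field.get("tag") == "051":
            value = field.text or ""
            # Positions count from 0, so "mw||||||" has "w" at element 1.
            return value[1] if len(value) > 1 else None
    return None


snippet = '<record><controlfield tag="051">mw||||||</controlfield></record>'
print(mab051_element1(snippet) == "w")  # True
```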

jschnasse commented 9 years ago

Hi, just to clarify: there are archived web pages in the catalogue, and it makes sense to type them as such. IMO it's a bad idea to override explicit types if no actual use case is in sight. I will keep an eye on this in future issues. If the 'w' indicates a type "archivedWebpage", it would add valuable information to the dataset. Unfortunately, this type seems to be completely undocumented. In any case it would be better to open a new ticket for the type thing.

acka47 commented 9 years ago

> In any case it would be better to open a new ticket for the type thing.

I did so with #152.

jschnasse commented 9 years ago

Just for the record: the current index is not sufficient for edoweb, because everything with the entry "archivierte Langzeitresource" in MAB 655e is now typed as "archivedWebpage", which is wrong and also not in accordance with our title import routine. Our frontend uses the types to filter search results from lobid; e.g. if a user wants to import a title for a monograph, only certain types from lobid are displayed to the user. For more information please ask @literarymachine.
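To see why a wrong type matters for the frontend, here is a minimal sketch of type-based filtering. The `type` key, the type names `Book` and `ArchivedWebPage`, and the hit structure are hypothetical, not the actual lobid JSON or the edoweb frontend code. A record typed only as an archived web page simply drops out of a monograph search.

```python
from typing import Iterable, List, Set


def filter_by_types(hits: Iterable[dict], allowed: Set[str]) -> List[dict]:
    """Keep only hits that have at least one of the allowed types."""
    return [hit for hit in hits if allowed & set(hit.get("type", []))]


hits = [
    {"id": "A", "type": ["Book"]},
    {"id": "B", "type": ["ArchivedWebPage"]},  # mistyped record
]
monograph_types = {"Book"}
print([h["id"] for h in filter_by_types(hits, monograph_types)])  # ['A']
```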

dr0i commented 9 years ago

Deployed to staging. lv:ArchivedWebPage is no more. Please test.

acka47 commented 9 years ago

+1

dr0i commented 9 years ago

@jschnasse Please acknowledge if/when this should be made productive.

dr0i commented 9 years ago

With the last commit, the presumption "archived => cannot be of type book" is removed, which is good e.g. for http://lobid.org/resource/HT018585406 but bad for http://lobid.org/resource/TT002234459. The latter will be handled by #152.

dr0i commented 9 years ago

deployed to staging

jschnasse commented 9 years ago

looks fine!

dr0i commented 9 years ago

Deployed to production, closing.