hbz / lobid

Linking Open Bibliographic Data
https://lobid.org/
Eclipse Public License 2.0

Missing HT018925962 and HT018925945 #299

Closed by jschnasse 8 years ago

jschnasse commented 8 years ago

Not ok: http://lobid.org/resource/HT018925962/about http://lobid.org/resource/HT018925945/about

ok: http://lobid.org/resources/HT018925962 http://lobid.org/resources/HT018925945

fsteeg commented 8 years ago

API 1.x is not up to date due to the quaoar cluster issues (see https://github.com/hbz/nwbib/issues/302).

Data 2.0 runs on a different machine and seems to be up to date.

fsteeg commented 8 years ago

I've indexed the updates that were missing due to the cluster issues:

http://lobid.org/resource?id=HT018925962&format=full http://lobid.org/resource?id=HT018925945&format=full

Same for sources:

http://lobid.org/resource?id=HT018925962&format=source http://lobid.org/resource?id=HT018925945&format=source

Other missing resources should be present too. Assigning to @jschnasse for review.
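For spot-checking further IDs, a minimal sketch that prints the "full" and "source" URLs for a record, following the URL pattern of the links above. The function name resource_urls is hypothetical and not part of any lobid tooling.

```shell
# Hypothetical helper: print the API 1.x "full" and "source" URLs for one record ID.
# The URL pattern is copied from the links in this comment.
resource_urls() {
  id="$1"
  for format in full source; do
    printf 'http://lobid.org/resource?id=%s&format=%s\n' "$id" "$format"
  done
}

# Print both URLs for the two records this issue was opened for.
for id in HT018925962 HT018925945; do
  resource_urls "$id"
done
```

Each printed URL could then be opened in a browser or passed to an HTTP client to confirm that the record is served.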

Some notes on what I did:

To restore these, I've indexed all updates since 2016-03-26 (when the first Nagios warnings appeared) from: http://index.hbz-nrw.de/alephxml/export/update/

1) in lodmill, locally checked out into /home/fsteeg/git/lodmill:

cd into /home/fsteeg/git/lodmill/lodmill-rd/doc/scripts/hbz01/ and download the update files there.

To process a single update:

bash -x startHbz01ToLobidResources.sh master /home/fsteeg/git/lodmill/lodmill-rd/doc/scripts/hbz01/DE-605-aleph-update-marcxchange-20160329-20160330.tar.gz lobid-resources NOALIAS quaoar2.hbz-nrw.de quaoar exact

To process multiple files and redirect output to log file:

bash -x startHbz01ToLobidResources.sh master dummy_ignore lobid-resources NOALIAS quaoar2.hbz-nrw.de quaoar exact doc/scripts/hbz01/updates.txt > 20160405-140410-master.log.startHbz01ToLobidResources.sh 2>&1

The updates.txt file contains full paths to the files, as in the single file sample above.
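One way to build such an updates.txt is sketched below. It assumes that all downloaded update archives follow the naming of the single-file sample (DE-605-aleph-update-marcxchange-*.tar.gz); the function name list_updates is hypothetical.

```shell
# Hypothetical helper: list the update archives in a directory, one path per line.
# Assumes archive names match the single-file sample above.
# Pass an absolute directory so the resulting paths are absolute too.
list_updates() {
  dir="$1"
  find "$dir" -maxdepth 1 -type f \
    -name 'DE-605-aleph-update-marcxchange-*.tar.gz' | sort
}

# Usage, with the directory from step 1:
# list_updates /home/fsteeg/git/lodmill/lodmill-rd/doc/scripts/hbz01 > updates.txt
```

Sorting keeps the archives in date order, since the date range is part of the file name.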

2) in mabxml-elasticsearch, locally checked out into /home/fsteeg/git/mabxml-elasticsearch:

cd into /home/fsteeg/git/mabxml-elasticsearch/src/main/resources/input and download the update files there.

In Transform.java, set DIR = "/home/fsteeg/git/mabxml-elasticsearch/src/main/resources/input" and, as a temporary measure, set esIndexer.setIndexname("hbz01-staging") (see https://github.com/hbz/mabxml-elasticsearch/issues/21). Then run Transform.java.

acka47 commented 8 years ago

I came across entry HT001401787 where the JSON and the source don't describe the same title:

See http://lobid.org/resource?id=HT001401787&format=full vs. http://lobid.org/resource?id=HT001401787&format=source.

I don't know whether this has anything to do with this issue. If not, we need to open a new one.

fsteeg commented 8 years ago

Completely different titles? They (now) are both "Westfälische Bibliographie zur Geschichte, Landeskunde und Volkskunde". Could that have been a temporary issue? Or am I missing some detail?

acka47 commented 8 years ago

Strange. I swear these were different titles yesterday. Obviously, this was a temporary issue then.