hbz / lobid

Linking Open Bibliographic Data
https://lobid.org/
Eclipse Public License 2.0

HT019032560 missing in API 1.0 #310

Closed acka47 closed 7 years ago

acka47 commented 7 years ago

Reported by Publisso customers. Resource was created in Aleph on 2016-07-14. See http://lobid.org/resource/HT019032560.

It is there in API 2.0, see http://lobid.org/resources/HT019032560

acka47 commented 7 years ago

Another Publisso cataloguer provided a list of hbz IDs that were catalogued on Wednesday (2016-07-12) but don't show up in lobid 1.0:

HT019031738 HT019031912 HT019031928 HT019031942 HT019031972 HT019031989 HT019032008 HT019032032 HT019032044 HT019032054 HT019032086 HT019032117 HT019032125 HT019032143 HT019032163 HT019032171 HT019032191 HT019032210 HT019032216 HT019032237 HT019032241
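A list like this can be checked against both API versions in one pass. The following is a minimal sketch, not part of the lobid tooling: the two URL patterns are the ones quoted in this issue, while `find_missing` and `http_exists` are hypothetical helper names, and the assumption that a missing resource answers with an HTTP error status is not confirmed here.

```python
from typing import Callable, Iterable, List
import urllib.error
import urllib.request

# URL patterns as quoted in this issue (API 1.0 vs. API 2.0).
API1_TEMPLATE = "http://lobid.org/resource/{id}"
API2_TEMPLATE = "http://lobid.org/resources/{id}"


def api1_url(hbz_id: str) -> str:
    return API1_TEMPLATE.format(id=hbz_id)


def api2_url(hbz_id: str) -> str:
    return API2_TEMPLATE.format(id=hbz_id)


def find_missing(ids: Iterable[str], exists: Callable[[str], bool]) -> List[str]:
    """Return the IDs that `exists` finds in API 2.0 but not in API 1.0."""
    return [i for i in ids if exists(api2_url(i)) and not exists(api1_url(i))]


def http_exists(url: str) -> bool:
    """Hypothetical presence check: any non-error HTTP response counts as present."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status < 400
    except (urllib.error.URLError, OSError):
        return False
```

Passing the presence check in as a function keeps the comparison testable without network access; `find_missing(ids, http_exists)` would run it against the live services.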

dr0i commented 7 years ago

The update file was missing at the time the indexing for API 1.0 ran. Most likely this was caused by hbz/mabxml-elasticsearch#29 and the rewiring of the crontab. This is what mabxml-elasticsearch does:

  1. at 06:30, get the update file, index it, then copy it to the location where other index routines will grab it
  2. at 07:30, do the same again (now for staging)

Because indexing into mabxml-elasticsearch takes about 30 minutes, the update file is only copied at around 07:00. That is too late, because the indexing for API 1.0 starts at 06:40.
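The timing problem can be made explicit with a little arithmetic. This is only a sketch under simplifying assumptions: fetching and copying are treated as instantaneous, the date is arbitrary, and the function names are hypothetical. The times themselves (06:30 start, 30 minutes of indexing, 06:40 downstream start) are the ones given above.

```python
from datetime import datetime, timedelta

DAY = datetime(2016, 7, 14)  # arbitrary date, only the time of day matters


def at(hhmm: str) -> datetime:
    """Turn 'HH:MM' into a datetime on the reference day."""
    hours, minutes = map(int, hhmm.split(":"))
    return DAY.replace(hour=hours, minute=minutes)


def file_available_at(job_start: datetime,
                      index_duration: timedelta,
                      copy_first: bool) -> datetime:
    """When the update file reaches the shared location.

    Assumes fetch and copy take negligible time; only indexing is counted.
    """
    return job_start if copy_first else job_start + index_duration


def downstream_sees_update(available: datetime, downstream_start: datetime) -> bool:
    """True if the file is in place when the downstream indexing begins."""
    return available <= downstream_start


indexing = timedelta(minutes=30)
api1_start = at("06:40")

# Index first, then copy: file arrives at 07:00, after the 06:40 run.
late = file_available_at(at("06:30"), indexing, copy_first=False)
# Copy first, then index: file is in place at 06:30.
early = file_available_at(at("06:30"), indexing, copy_first=True)

print(late.strftime("%H:%M"), downstream_sees_update(late, api1_start))
print(early.strftime("%H:%M"), downstream_sees_update(early, api1_start))
```

Under these assumptions, the order of the copy and index steps decides whether the 06:40 run sees the update, which is why the 10-minute gap can still be sufficient.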

The staging run took 37 minutes, so the file was copied again at 08:07. This is redundant (the file is fetched and copied twice) and error-prone: while the file is being recopied, other index routines that read it will break.
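One general way to keep readers from seeing a half-copied file is to copy into a temporary file and rename it into place. The sketch below shows that technique; it is not what cron.sh does, `atomic_copy` is a hypothetical helper, and the rename is only atomic when the temporary file and the destination are on the same filesystem.

```python
import os
import shutil
import tempfile


def atomic_copy(src: str, dst: str) -> None:
    """Copy src to dst so that readers see either the old or the new file.

    The data is written to a temporary file in the destination directory
    and then moved over dst with os.replace.
    """
    dst_dir = os.path.dirname(os.path.abspath(dst))
    fd, tmp_path = tempfile.mkstemp(dir=dst_dir, prefix=".tmp-update-")
    os.close(fd)
    try:
        shutil.copyfile(src, tmp_path)
        os.replace(tmp_path, dst)  # swaps the complete file into place
    except BaseException:
        # Clean up the partial temporary file if anything went wrong.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

This does not remove the redundancy of fetching and copying twice; it only narrows the window in which a second copy could disturb a routine that is reading the file.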

Solution:

fsteeg commented 7 years ago

Sorry, the second copy step was previously disabled for staging; I must have accidentally enabled it yesterday while merging the cron.sh script. I have disabled it again.

I was also surprised by the close timing of 6:30 and 6:40, but I believe it was set up like this and has worked for a while now. But maybe I'm mistaken.

dr0i commented 7 years ago

> was previously disabled for staging

:)

> close timing of 6:30 and 6:40

That is enough time for getting and copying the data. Changed cron.sh accordingly.

acka47 commented 7 years ago

+1

dr0i commented 7 years ago

Deployed to production, using staging with extra modifications, closing.