mloesch / sickle

Sickle: OAI-PMH for Humans

Recovering from BadResumptionToken? #19

Closed: ghost closed this issue 7 years ago

ghost commented 7 years ago

Hey, thanks for Sickle! It's a great tool and it's saved me a lot of time in a current project.

I've been encountering issues, though, with a repository that keeps timing out or going offline mid-harvest (I'm not sure which). I had to make small changes to catch the exception Requests raises when this happens.

Catching that exception reveals a new problem; I get sickle.oaiexceptions.BadResumptionToken: The value of the resumptionToken argument is invalid or expired mid-harvest, and the harvest aborts.

I don't have much deep knowledge of OAI-PMH, so I don't know what I can safely assume about the protocol. For example, I don't know if it's safe to do something like:

  1. Count previously harvested items
  2. Repeat harvest under same configuration, with offset equal to (1)

What is the idiomatic way to resolve this issue, if any? Thanks!

mloesch commented 7 years ago

> I get sickle.oaiexceptions.BadResumptionToken: The value of the resumptionToken argument is invalid or expired mid-harvest, and the harvest aborts.

This seems to be a problem on the OAI server side: the resumption token you got from the last response does not seem to be valid (anymore).

> For example, I don't know if it's safe to do something like:
>
>   1. Count previously harvested items
>   2. Repeat harvest under same configuration, with offset equal to (1)

Unfortunately, OAI-PMH does not specify getting records using an offset. Its only mechanism for paging is the resumption token, which is controlled entirely by the server.

Some OAI servers include the offset in the resumption token though (but it might be encoded, e.g. using Base64).
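To check whether a particular server's tokens happen to carry an offset, something like the following could be tried. It is only a sketch: the helper name `try_decode_token` and the example token are made up for illustration, and many servers use opaque tokens that will not decode to anything readable.

```python
import base64
import binascii


def try_decode_token(token):
    """Return the Base64-decoded text of a token, or None if it is not
    valid Base64 or does not decode to UTF-8 text."""
    # Some servers strip the trailing '=' padding, so restore it first.
    padded = token + "=" * (-len(token) % 4)
    try:
        raw = base64.b64decode(padded, validate=True)
        return raw.decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return None


# Hypothetical token, built here only to demonstrate the helper.
example = base64.b64encode(b"offset=200&metadataPrefix=oai_dc").decode("ascii")
print(try_decode_token(example))
```

Even if a token decodes, its format is specific to that server, so editing it by hand is a guess rather than something the protocol supports.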

I'm afraid this seems to be a server issue. I'm not sure you can do much about it other than reporting the invalid token to the person in charge of the OAI server.
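On the client side, one possible workaround is to restart the harvest from the beginning when the token is rejected and skip records already seen, keyed by identifier rather than by a counted offset. The sketch below is an assumption-laden illustration, not part of Sickle: `harvest_with_restart` and its parameters are hypothetical names, and it keeps every seen identifier in memory.

```python
def harvest_with_restart(start_harvest, get_id, retry_on, max_restarts=3):
    """Yield each record once, restarting the harvest when it fails.

    start_harvest: zero-argument callable returning a fresh record iterator
    get_id:        callable mapping a record to its unique identifier
    retry_on:      exception class (or tuple of classes) that triggers a restart
    """
    seen = set()
    restarts = 0
    while True:
        try:
            for record in start_harvest():
                record_id = get_id(record)
                if record_id in seen:
                    continue  # already delivered before the restart
                seen.add(record_id)
                yield record
            return  # harvest completed without an error
        except retry_on:
            restarts += 1
            if restarts > max_restarts:
                raise


# Possible wiring with Sickle (hypothetical endpoint URL, not run here):
#   from sickle import Sickle
#   from sickle.oaiexceptions import BadResumptionToken
#   client = Sickle("https://example.org/oai")
#   records = harvest_with_restart(
#       lambda: client.ListRecords(metadataPrefix="oai_dc"),
#       lambda record: record.header.identifier,
#       BadResumptionToken,
#   )
```

Each restart re-downloads the pages already fetched, so this only helps when the token fails occasionally rather than at the same point every time.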

ghost commented 7 years ago

Thanks for your help! I'll explore their implementation a little further then, and look into ways of doing smaller harvests at a time.
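For the smaller harvests, OAI-PMH's selective harvesting by datestamp (the `from` and `until` arguments) looks like the natural fit. A rough sketch follows, assuming the server supports day granularity; `date_windows`, `harvest_in_windows`, and the 30-day default are names and values chosen here for illustration.

```python
from datetime import date, timedelta


def date_windows(start, end, days=30):
    """Yield inclusive (from, until) ISO date strings covering start..end."""
    cursor = start
    while cursor <= end:
        # Both bounds are inclusive, so a window of N days ends N - 1 days later.
        window_end = min(cursor + timedelta(days=days - 1), end)
        yield cursor.isoformat(), window_end.isoformat()
        cursor = window_end + timedelta(days=1)


def harvest_in_windows(list_records, start, end, days=30, metadata_prefix="oai_dc"):
    """Call list_records once per date window and yield all records.

    list_records is any callable accepting OAI-PMH arguments as keywords,
    for example the ListRecords method of a Sickle client.
    """
    for window_from, window_until in date_windows(start, end, days):
        # 'from' is a Python keyword, so the arguments are passed as a dict.
        params = {
            "metadataPrefix": metadata_prefix,
            "from": window_from,
            "until": window_until,
        }
        for record in list_records(**params):
            yield record
```

With Sickle this could be called as `harvest_in_windows(client.ListRecords, date(2017, 1, 1), date(2017, 9, 1))`, where `client` is a `Sickle` instance. A window with no records will probably surface as an exception from the client (I believe Sickle raises `NoRecordsMatch`), so that case would need handling too. A failed token then costs one window instead of the whole harvest.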

September 3, 2017 6:30 PM, "Mathias Loesch" wrote: [...]