Without `huge_tree=True`, lxml apparently fails to parse some responses that are only moderately large (apparently anything over about 9.5 MB).
Because the parser is also created with `recover=True`, this happens silently from Sickle's point of view. I only noticed because the failure also drops the resumption token, which ends the crawl early. That made me wonder why I had far fewer records than I should have had.
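A minimal sketch of how such a silent recovery could be made visible, assuming lxml is installed. The helper name `parse_checked` is hypothetical and not part of Sickle; it only uses `etree.XMLParser`, `etree.XML` and the parser's `error_log`.

```python
from lxml import etree


def parse_checked(data: bytes):
    """Parse with recover=True and huge_tree=True, and report recovered errors.

    Hypothetical helper for illustration only; not Sickle's code.
    """
    # A fresh parser per call, so error_log only reflects this parse run.
    parser = etree.XMLParser(recover=True, huge_tree=True)
    root = etree.XML(data, parser=parser)
    # With recover=True, problems do not raise; they end up in error_log.
    errors = [entry.message for entry in parser.error_log]
    return root, errors


if __name__ == "__main__":
    root, errors = parse_checked(b"<a><b>text</a>")
    if errors:
        print("parser recovered from errors:", errors)
```

Checking the error log after each response would turn a quietly shortened harvest into something that can be logged or raised.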
Alternatively, to get fancy, the XMLParser could be made an optional parameter that is passed to Sickle and from there down to the OAIResponse. That would let people choose for themselves what kind of XML parsing behaviour they want. For this PR, however, I opted for the simplest fix.
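Roughly, the optional-parameter idea could look like the sketch below. `ClientSketch`, `ResponseSketch` and `default_parser` are hypothetical stand-ins, not Sickle's actual classes or signatures, and the default parser settings are only the two flags discussed here.

```python
from lxml import etree


def default_parser():
    # Assumed default for this sketch: the two flags discussed in this PR.
    return etree.XMLParser(recover=True, huge_tree=True)


class ResponseSketch:
    """Hypothetical stand-in for a response wrapper that owns the parsing."""

    def __init__(self, raw: bytes, parser=None):
        self.raw = raw
        # Fall back to the default only when the caller supplied nothing.
        self.parser = parser if parser is not None else default_parser()

    @property
    def xml(self):
        return etree.XML(self.raw, parser=self.parser)


class ClientSketch:
    """Hypothetical stand-in for a client that forwards the parser."""

    def __init__(self, endpoint: str, parser=None):
        self.endpoint = endpoint
        self.parser = parser

    def wrap(self, raw: bytes) -> ResponseSketch:
        # The parser is handed down unchanged to every response.
        return ResponseSketch(raw, parser=self.parser)
```

A caller who prefers hard failures over silent recovery could then pass `etree.XMLParser(recover=False, huge_tree=True)` and get an `XMLSyntaxError` on malformed input.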