Open sdm7g opened 5 years ago
I'm the author and maintainer of oai-harvest and would love to see this PR merged, to avoid having to maintain a fork of the Client class in that project 🙏
Looks good to me! Mergeable in my opinion, @jascoul. I would love to bring down the use of forks with patches for people like @bloomonkey.
Adds an option in Client and BaseClient to use the recover=True option on etree.XMLParser, so that OAI harvesting won't fail on a bad metadata payload. The default is recover=False, so that it doesn't change any existing behavior. (Making the conservative choice here, just in case anyone is relying on catching failures to validate feeds.)

Initially, I tried parsing first with recover=False, catching errors and then retrying with recover=True, but I discovered that was unnecessary.
That will output a line like this, for example, on an unescaped ampersand:
CRITICAL:root:<string>:47:219:FATAL:PARSER:ERR_ENTITYREF_SEMICOL_MISSING: EntityRef: expecting ';'
lxml documents how that message can be changed or filtered (for example, if you want to change those 'CRITICAL' entries to 'WARNING'), but I didn't think that was worth adding to the base code if it's not already using the logging module.
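For anyone who wants to see the behavior outside the client, here is a minimal sketch, not the code in this PR. It parses a payload with an unescaped ampersand once with recover=False and once with recover=True, then re-emits the parser's error_log entries at WARNING level through the standard logging module. The names parse_payload, BAD_XML and the logger name are made up for illustration. Re-logging the entries by hand is just one possible way to get WARNING instead of CRITICAL, and is not necessarily the approach the lxml documentation describes.

```python
import logging

from lxml import etree

# Hypothetical payload: the bare "&" makes this XML ill-formed.
BAD_XML = b"<record><title>Smith & Jones</title></record>"

# Hypothetical logger name, chosen only for this example.
logger = logging.getLogger("harvest.recover")


def parse_payload(data, recover=False):
    """Parse bytes with a fresh XMLParser.

    recover=False mirrors the conservative default described above.
    Returns the root element and the number of parser errors seen.
    """
    parser = etree.XMLParser(recover=recover)
    # Raises etree.XMLSyntaxError on ill-formed input unless recover=True.
    root = etree.fromstring(data, parser)
    # Re-emit whatever the parser recovered from, at WARNING level.
    for entry in parser.error_log:
        logger.warning(
            "%s:%s:%s: %s",
            entry.line, entry.column, entry.type_name, entry.message,
        )
    return root, len(parser.error_log)


# Strict parsing: the bad payload is rejected.
try:
    parse_payload(BAD_XML)
    strict_failed = False
except etree.XMLSyntaxError:
    strict_failed = True

# Recovering parser: the same payload yields a usable tree.
root, n_errors = parse_payload(BAD_XML, recover=True)
```

Creating a new parser per call keeps each payload's error_log separate, so the warnings always belong to the document that was just parsed.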