Open kaplun opened 6 years ago
Looks like PyPDF2 is really brittle. The crash happens inside PyPDF2 while it parses the PDF, so it is nothing we could easily fix on our side. Maybe we should wrap calls to PyPDF2 in a big
try:
    ...  # call PyPDF2
except Exception as e:
    ...  # log the exception
Yeah exactly.
We wouldn't lose much anyway: texkey extraction is useful only for articles that cite with Inspire texkeys (and maybe those of other platforms like ADS in the future). In the vast majority of cases those articles are produced by a standard TeX pipeline, whose output we know PyPDF2 handles well.
Given the PDF available at http://arxiv.org/pdf/1710.01077, refextract crashes in PyPDF2 code.
It should instead handle the exception and continue without extracting TeXKeys.
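A minimal sketch of that behaviour, in case it helps the discussion. The names `safe_extract_texkeys` and `extract` are hypothetical and are not part of refextract or PyPDF2; `extract` stands in for whatever function ends up calling PyPDF2, and the empty-dict fallback is an assumption about what "no TeXKeys" should look like to the caller.

```python
import logging

logger = logging.getLogger(__name__)


def safe_extract_texkeys(extract, pdf_path):
    """Run a TeXKey extractor, falling back to no TeXKeys on any failure.

    `extract` is a hypothetical callable that takes a PDF path and returns
    a mapping of extracted TeXKeys; it is where PyPDF2 would be called.
    """
    try:
        return extract(pdf_path)
    except Exception:
        # Deliberately broad: a malformed PDF can make the parser fail in
        # many ways, and TeXKey extraction is optional. Log the traceback
        # and carry on without TeXKeys.
        logger.exception(
            "TeXKey extraction failed for %s; continuing without TeXKeys",
            pdf_path,
        )
        return {}
```

Taking the extractor as a parameter keeps the wrapper independent of the PDF library, so it can be exercised with a stub that raises instead of needing a PDF that actually breaks the parser.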