Open speedplane opened 8 years ago
This is the PDF at issue that causes the problem. I fixed this bug by monkeypatching the function _add_annots in pdfquery.py:
def _add_annots(self, layout, annots):
    """Adds annotations to the layout object
    """
    if annots:
        for annot in resolve1(annots):
            annot = resolve1(annot)
            if annot.get('Rect') is not None:
                annot['bbox'] = annot.pop('Rect')  # Rename key
                annot = self._set_hwxy_attrs(annot)
            try:
                annot['URI'] = resolve1(annot['A'])['URI']
            except KeyError:
                pass
            rep_keys = {}
            for k, v in six.iteritems(annot):
                if not isinstance(v, six.string_types):
                    if ":" in k:
                        import logging
                        logging.warning("Converting key: %s" % k)
                        rep_keys[k] = k.replace(":", "_")
                    annot[k] = obj_to_string(v)
            # Rename the collected keys only after the loop over annot has finished
            for keyfrom, keyto in rep_keys.items():
                annot[keyto] = annot[keyfrom]
                del annot[keyfrom]
            elem = parser.makeelement('Annot', annot)
            layout.add(elem)
    return layout
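For anyone who only needs the key-renaming step, here is a minimal standalone sketch of the same idea. `sanitize_keys` is a hypothetical helper name, not part of pdfquery, and it uses plain `str()` as a stand-in for pdfquery's `obj_to_string`. Unlike the patch above, it renames every key containing a colon, whatever the value type, which is an assumption about what is wanted.

```python
import logging


def sanitize_keys(attrs):
    """Return a copy of attrs with ':' in keys replaced by '_'.

    Hypothetical helper for illustration; non-string values are
    converted with str() instead of pdfquery's obj_to_string.
    """
    cleaned = {}
    for key, value in attrs.items():
        if ":" in key:
            logging.warning("Converting key: %s", key)
            key = key.replace(":", "_")
        cleaned[key] = value if isinstance(value, str) else str(value)
    return cleaned
```

Building a new dict avoids adding and deleting keys on the dict being iterated, which is why the patch above has to collect the renames in `rep_keys` first.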
thanks very much @speedplane ! your monkeypatch just worked for me, too.
in case you might be interested... i also wound up having to add this exception handler:
if annots:
    for annot in resolve1(annots):
        annot = resolve1(annot)
        if annot.get('Rect') is not None:
            try:
                annot['bbox'] = annot.pop('Rect')  # Rename key
                annot = self._set_hwxy_attrs(annot)
            except Exception as e:
                print('PDFQuery._add_annots: cant form bbox?!', e, annot)
        try:
            annot['URI'] = resolve1(annot['A'])['URI']
        except KeyError:
            pass
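The same guard could also be written as a small standalone function, roughly like the sketch below. `add_bbox` is a hypothetical name, and the `set_hwxy_attrs` parameter stands in for the `self._set_hwxy_attrs` method; neither is pdfquery API. It uses `logging.exception` instead of `print` so the traceback is kept in the log.

```python
import logging

logger = logging.getLogger(__name__)


def add_bbox(annot, set_hwxy_attrs):
    """Rename 'Rect' to 'bbox' and derive the size/position attributes.

    Hypothetical helper for illustration; set_hwxy_attrs is whatever
    callable computes the extra attributes from annot['bbox'].
    """
    if annot.get("Rect") is None:
        return annot
    try:
        annot["bbox"] = annot.pop("Rect")  # Rename key
        return set_hwxy_attrs(annot)
    except Exception:
        # Carry on with the annotation as it is; the traceback goes to the log.
        logger.exception("could not form bbox for annotation: %r", annot)
        return annot
```

As in the handler above, a failure leaves the annotation in place with whatever keys were already set, so the rest of the annotations still get processed.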
Processing a PDF whose annotations have a colon in one of their key names raises an exception: