kba / transkribus-to-prima

Convert Transkribus PAGE-XML to standard PAGE-XML
11 stars 2 forks source link

Decode error #15

Open bertsky opened 2 years ago

bertsky commented 2 years ago

In this file, I get:

    converter = TranskribusToPrima(ET.parse(input_file), prefer_imgurl)
  File "src/lxml/etree.pyx", line 3521, in lxml.etree.parse
  File "src/lxml/parser.pxi", line 1880, in lxml.etree._parseDocument
  File "src/lxml/parser.pxi", line 1900, in lxml.etree._parseFilelikeDocument
  File "src/lxml/parser.pxi", line 1795, in lxml.etree._parseDocFromFilelike
  File "src/lxml/parser.pxi", line 1201, in lxml.etree._BaseParser._parseDocFromFilelike
  File "src/lxml/parser.pxi", line 615, in lxml.etree._ParserContext._handleParseResultDoc
  File "src/lxml/parser.pxi", line 721, in lxml.etree._handleParseResult
  File "src/lxml/etree.pyx", line 318, in lxml.etree._ExceptionContext._raise_if_stored
  File "src/lxml/parser.pxi", line 370, in lxml.etree._FileReaderContext.copyToBuffer
  File "/usr/lib/python3.6/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xce in position 5: invalid continuation byte