openeventdata / UniversalPetrarch

Language-agnostic political event coding using universal dependencies
MIT License

Error in preprocess_doc with Arabic text #19

Closed by ahalterman 6 years ago

ahalterman commented 6 years ago

I'm running into an error when I try to generate the sentence-level parse for an Arabic document. Here's the error, and I'm attaching the document:

Generate sentence xml file...
/Users/ahalterman/MIT/NSF_RIDIR/UniversalPetrarch/UniversalPetrarch/data/text/syria_xml_1.xml
Traceback (most recent call last):
  File "preprocess_doc.py", line 166, in <module>
    read_doc_input(inputxml, inputparsed, outputfile)
  File "preprocess_doc.py", line 110, in read_doc_input
    if doc.encode('UTF-8').find(line) ==-1:
TypeError: a bytes-like object is required, not 'str'

syria_xml_1.txt

ahalterman commented 6 years ago

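This seems to be a Python 2/3 incompatibility: in Python 3, doc.encode('UTF-8') returns bytes, and calling find() on it with a str argument raises this TypeError. Hardcoding python2 for the Python calls in preprocess_doc.sh fixed the problem. Closing, but this should be addressed in #18.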
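
For anyone hitting the same traceback, here is a minimal sketch of the failure and one possible way around it under Python 3. It is not the actual code in preprocess_doc.py: the helper name contains_line and the sample strings are hypothetical, and it assumes that line is read as text (str) while doc may arrive as either str or bytes.

```python
def contains_line(doc, line):
    """Return True if `line` occurs in `doc`, comparing text with text.

    Hypothetical helper for illustration; not part of preprocess_doc.py.
    """
    # Normalize both operands to str so the search never mixes bytes and str.
    if isinstance(doc, bytes):
        doc = doc.decode("utf-8")
    if isinstance(line, bytes):
        line = line.decode("utf-8")
    return doc.find(line) != -1


# Made-up sample data standing in for the attached Arabic document.
doc = "دمشق هي عاصمة سوريا. حلب مدينة كبيرة."
line = "حلب مدينة كبيرة."

# Reproduce the pattern from the traceback: under Python 3, searching
# bytes for a str raises TypeError.
try:
    doc.encode("UTF-8").find(line)
    mixed_types_failed = False
except TypeError:
    mixed_types_failed = True

print(mixed_types_failed)
print(contains_line(doc, line))
```

Decoding to str rather than encoding both sides keeps the comparison on Unicode text, which is usually what is wanted for Arabic input. Whether this fits the rest of preprocess_doc.py depends on how the file is opened, which this sketch does not know.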