Closed fabrahman closed 5 years ago
So I realized the problem only exists for Python 2 in the NLTK lemmatizer. I resolved it by adding:

```python
import io
```

and changing line 86 in `sesame/targetid.py` to:

```python
with io.open(options.raw_input, "r", encoding='utf8') as fin:
```
I am closing the issue.
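For anyone hitting this on Python 2, here is a minimal sketch of the fix described above: `io.open` is the encoding-aware equivalent of Python 3's built-in `open`, so each line comes back as unicode text instead of raw bytes. The file name and contents below are hypothetical.

```python
import io
import os
import tempfile

# Hypothetical stand-in for the sentences.txt input file.
path = os.path.join(tempfile.mkdtemp(), "sentences.txt")

# Write a UTF-8 file containing a non-ASCII character.
with io.open(path, "w", encoding="utf8") as fout:
    fout.write(u"caf\xe9\n")

# Read it back the same way targetid.py does after the fix:
# each line is decoded text, not bytes.
with io.open(path, "r", encoding="utf8") as fin:
    lines = [line.strip() for line in fin]

print(lines)
```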
Hi there, I fixed line 86 in targetid.py, but now I'm getting another encoding error:
```
  File "/home/davide/open-sesame/sesame/targetid.py", line 88, in <module>
    instances = [make_data_instance(line, i) for i,line in enumerate(fin)]
  File "sesame/raw_data.py", line 21, in make_data_instance
    i+1, tokenized[i], lemmatized[i], pos_tagged[i], index) for i in range(len(tokenized))]
UnicodeEncodeError: 'ascii' codec can't encode character u'\xae' in position 8: ordinal not in range(128)
```
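For context, a minimal sketch of what trips Python 2 here: when unicode text is mixed with byte strings, Python 2 implicitly encodes it with the `ascii` codec, which fails on characters like u'\xae' (the registered-trademark sign). The token below is hypothetical; encoding explicitly as UTF-8 avoids the error.

```python
# -*- coding: utf-8 -*-
# Hypothetical token containing u'\xae', the character from the
# traceback above.
text = u"Brand\xae name"

# Reproduce Python 2's implicit ASCII encoding explicitly.
try:
    text.encode("ascii")
except UnicodeEncodeError as err:
    print("ASCII encoding fails:", err)

# An explicit UTF-8 encoding handles the character fine.
print(text.encode("utf-8"))
```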
Any idea?
Hi,
I tried using the pretrained model to annotate a corpus. I first tried a small example where the sentences.txt file has only 5 sentences, and it worked well. Then I switched to my own dataset, which is a lot bigger, and I am getting this error in the first step, when running targetid prediction:
Any suggestions?