nickyringland / nested_named_entities


Code throws Unicode exception #2

Closed: thtb closed this issue 5 years ago

thtb commented 5 years ago

Traceback (most recent call last):
  File "recover_text.py", line 62, in <module>
    ptb_sentences = process_ptb(args["ptb"], folder, filename)
  File "recover_text.py", line 22, in process_ptb
    for parsed_sentence in parsed_sentences:
  File "/Users/marvinmu/anaconda3/lib/python3.6/site-packages/nltk/corpus/reader/util.py", line 296, in iterate_from
    tokens = self.read_block(self._stream)
  File "/Users/marvinmu/anaconda3/lib/python3.6/site-packages/nltk/corpus/reader/api.py", line 445, in _read_parsed_sent_block
    return list(filter(None, [self._parse(t) for t in self._read_block(stream)]))
  File "/Users/marvinmu/anaconda3/lib/python3.6/site-packages/nltk/corpus/reader/bracket_parse.py", line 61, in _read_block
    toks = read_regexp_block(stream, start_re=r'^\(')
  File "/Users/marvinmu/anaconda3/lib/python3.6/site-packages/nltk/corpus/reader/util.py", line 609, in read_regexp_block
    line = stream.readline()
  File "/Users/marvinmu/anaconda3/lib/python3.6/site-packages/nltk/data.py", line 1148, in readline
    new_chars = self._read(readsize)
  File "/Users/marvinmu/anaconda3/lib/python3.6/site-packages/nltk/data.py", line 1380, in _read
    chars, bytes_decoded = self._incr_decode(bytes)
  File "/Users/marvinmu/anaconda3/lib/python3.6/site-packages/nltk/data.py", line 1411, in _incr_decode
    return self.decode(bytes, 'strict')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd5 in position 64: ordinal not in range(128)
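The error means NLTK decoded a corpus file as ASCII and hit byte 0xd5, which ASCII cannot represent ("position 64" is most likely an offset into the buffer being decoded, not into the file). As a diagnostic sketch, assuming you can point it at the individual PTB files, the stdlib-only helpers below (hypothetical names `find_non_ascii` and `read_text_lenient`, not part of this repo or of NLTK) list every non-ASCII byte with its line and byte offset, and decode a file with an explicitly chosen encoding. Note that 0xd5 is "Õ" in Latin-1 but a right single quote (’) in Mac Roman; the thread does not say which encoding the Treebank files use, so inspect a few hits before choosing one.

```python
from pathlib import Path
from typing import Iterator, Tuple


def find_non_ascii(path: str) -> Iterator[Tuple[int, int, int]]:
    """Yield (line_number, byte_offset_in_line, byte_value) for every byte >= 0x80.

    Line numbers are 1-based; offsets are 0-based and counted in bytes,
    so they point at exactly the bytes an 'ascii' decoder would reject.
    """
    data = Path(path).read_bytes()
    for line_no, line in enumerate(data.split(b"\n"), start=1):
        for offset, byte in enumerate(line):  # iterating bytes yields ints
            if byte >= 0x80:
                yield line_no, offset, byte


def read_text_lenient(path: str, encoding: str = "latin-1") -> str:
    """Decode the whole file with an explicit encoding instead of ASCII.

    Latin-1 maps every byte 0x00-0xFF to a code point, so it never raises;
    whether it yields the *intended* characters depends on the file's real
    encoding (assumption: try e.g. "mac_roman" if quotes come out as "Õ").
    """
    return Path(path).read_text(encoding=encoding)
```

If `recover_text.py` builds an NLTK corpus reader, NLTK's corpus readers (e.g. `BracketParseCorpusReader`) accept an `encoding` argument, which is the natural place to pass whichever encoding the diagnosis points to.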

thtb commented 5 years ago

Treebank2 works fine.