Hi, loading data in CoNLL format fails on my custom dataset, which contains non-ASCII characters. When I read the data with the encoding set to 'utf-8', I get the following error:
```
File "/usr/local/lib/python2.7/dist-packages/seqlearn/datasets.py", line 65, in <genexpr>
  lines = (str.split(line) for line in f)
TypeError: descriptor 'split' requires a 'str' object but received a 'unicode'
```
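On Python 2, opening a file with an explicit encoding (for example via `io.open` or `codecs.open`, presumably what was done here) yields `unicode` lines. `str.split(line)` calls the method through the `str` class, which rejects any argument that is not a `str` instance, and `unicode` is not a subclass of `str`. The sketch below reproduces the same mechanism on Python 3, where `bytes` takes the role of `unicode`; the sample line is made up for illustration:

```python
# Python 3 analogue of the Python 2 str/unicode mismatch: bytes stands in
# for unicode, since neither type is a subclass of str.
line = b"word\tNN\n"

# Calling the method through the str class type-checks its argument.
unbound_error = None
try:
    str.split(line)
except TypeError as exc:
    unbound_error = exc

# Calling the method on the object dispatches to bytes.split, which works.
fields = line.split()
print(fields)  # [b'word', b'NN']
```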
```python
def _conll_sequences(f, features, labels, lengths, split):
    # Divide input into blocks of empty and non-empty lines.
    lines = (str.strip(line) for line in f)
```
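A sketch of the blank-line grouping written with method calls, so that it accepts any string type (`str` or `unicode` on Python 2, `str` or `bytes` on Python 3). It assumes the blocks are separated with `itertools.groupby`; `split_blocks` is a hypothetical helper, not part of seqlearn, and not necessarily the exact change made here:

```python
from itertools import groupby


def split_blocks(lines):
    """Hypothetical helper: group non-empty lines into blocks.

    Blocks are separated by blank lines, as in CoNLL files. line.strip()
    dispatches on the line's own type, so str, unicode (Python 2) and
    bytes (Python 3) inputs all work, unlike str.strip(line).
    """
    stripped = (line.strip() for line in lines)
    # groupby with bool separates runs of non-empty lines from runs of
    # empty ones; only the non-empty runs are kept.
    return [list(group) for nonempty, group in groupby(stripped, bool) if nonempty]


# Run on Python 3; the sample data is made up.
text = u"Der ART\nMüll NN\n\nÜber APPR\n"
print(split_blocks(text.splitlines(True)))
# [['Der ART', 'Müll NN'], ['Über APPR']]
```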
Everything works perfectly when I modify the last line like this:
Is there any reason such a fix would be unwanted?