load_conll, unicode support

Hi, loading data in CoNLL format fails on my custom dataset, which contains non-ASCII characters. When I open the file with encoding 'utf-8', the lines come back as unicode objects and I get the following error:

  File "/usr/local/lib/python2.7/dist-packages/seqlearn/datasets.py", line 65, in <genexpr>
    lines = (str.split(line) for line in  f)
TypeError: descriptor 'split' requires a 'str' object but received a 'unicode'

def _conll_sequences(f, features, labels, lengths, split):
    # Divide input into blocks of empty and non-empty lines.
    lines = (str.strip(line) for line in  f)

Everything works perfectly when I modify the last line like this:

    lines = (line.strip() for line in f)

Is there any reason such a fix would be unwanted?
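
To illustrate the difference, here is a minimal, self-contained sketch. It does not use seqlearn itself: `strip_lines` and `unbound_strip_fails` are hypothetical helpers written only for this example, and the sample lines are made up. The idea is that calling `str.strip(line)` through the type only accepts objects of exactly that string type, whereas `line.strip()` dispatches on whatever type the line has, so it should work for both byte strings and unicode strings.

```python
import io


def strip_lines(f):
    """Strip each line by calling the method on the line object itself."""
    return [line.strip() for line in f]


def unbound_strip_fails(line):
    """Return True if str.strip(line) raises TypeError for this line."""
    try:
        str.strip(line)
    except TypeError:
        return True
    return False


# Unicode (text) input, as produced by opening a file with an encoding.
text_lines = strip_lines(io.StringIO(u"caf\u00e9 NOUN\n\nna\u00efve ADJ\n"))

# Byte-string input, as produced by opening a file in binary mode.
byte_lines = strip_lines(io.BytesIO(b"cafe NOUN\n\nnaive ADJ\n"))
```

On Python 2, `unbound_strip_fails(u"x")` should return True (the case reported here); on Python 3 the analogous failure occurs for `unbound_strip_fails(b"x")`.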
