guillaumegenthial / tf_ner

Simple and Efficient Tensorflow implementations of NER models with tf.estimator and tf.data
Apache License 2.0
923 stars, 275 forks

ascii encoding reading glove #21

Open benprofessionaledition opened 5 years ago

benprofessionaledition commented 5 years ago

Hey Guillaume, really excellent repo! I came across a minor issue with your code on macOS using Python 3.6.1 and the most recent version of GloVe 840B 300d (as of today).

In build_glove.py, the line: with Path('glove.840B.300d.txt').open() as f: opens the file without an explicit encoding, so the platform default is used. On my setup that is apparently ASCII, which doesn't play nicely with the non-ASCII tokens in the GloVe file. It can be remedied with the following code:

with open(Path('glove.840B.300d.txt'), 'rb') as f:
    for line_idx, line in enumerate(f):
        # Read raw bytes and decode explicitly, bypassing the default encoding
        line = line.decode('utf-8')
...

Happy to submit a PR for this, or maybe you can just add it at your leisure. Thanks again for all your hard work.

marcmelis commented 5 years ago

I had the same issue

harirajeev commented 5 years ago

with Path('glove.840B.300d.txt').open(encoding="utf-8") as f: works
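
For anyone wanting to try the one-line fix in isolation, here is a minimal sketch of reading a GloVe-style text file with an explicit UTF-8 encoding. The helper name iter_glove_words and the tiny sample file are hypothetical stand-ins for illustration, not code from build_glove.py, and the sketch assumes each line has the form "<word> <v1> <v2> ...".

```python
import tempfile
from pathlib import Path


def iter_glove_words(path, encoding="utf-8"):
    """Yield the first token (the word) of each line in a GloVe-style file.

    Hypothetical helper for illustration; not part of tf_ner.
    """
    # Passing encoding explicitly means the locale default is never consulted.
    with Path(path).open(encoding=encoding) as f:
        for line in f:
            # Split on the first space only: the word, then the vector values.
            yield line.split(" ", 1)[0]


# Tiny stand-in for glove.840B.300d.txt containing non-ASCII tokens.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "glove.sample.txt"
    sample.write_text("café 0.1 0.2\nnaïve 0.3 0.4\n", encoding="utf-8")
    words = list(iter_glove_words(sample))

print(words)
```

Compared with opening in 'rb' mode and decoding each line by hand, this keeps the loop body unchanged and lets Python handle the decoding.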

guillaumegenthial commented 5 years ago

If you use Python 3 you should not run into any issues, but yes, technically specifying the encoding explicitly should fix it.
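
A quick way to see why some Python 3 setups still hit this: when no encoding is given, text-mode open() falls back to locale.getpreferredencoding(False), which is not guaranteed to be UTF-8. The sketch below prints that default and shows that UTF-8 bytes for a non-ASCII word cannot be decoded as ASCII. It is only a diagnostic aid; that the reporter's locale resolved to ASCII is an assumption based on the description above.

```python
import locale

# Encoding that text-mode open() uses when no encoding argument is passed.
default_encoding = locale.getpreferredencoding(False)
print("default text encoding:", default_encoding)

# UTF-8 bytes for a non-ASCII word, like many tokens in the GloVe file.
raw = "café".encode("utf-8")

try:
    raw.decode("ascii")
    failed_as_ascii = False
except UnicodeDecodeError:
    # The failure mode to expect when the default resolves to ASCII.
    failed_as_ascii = True

print("decoding as ASCII failed:", failed_as_ascii)
print("decoded as UTF-8:", raw.decode("utf-8"))
```

If the first line prints something other than UTF-8 (or an equivalent alias), passing encoding="utf-8" explicitly is the safer choice.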