guillaumegenthial / tf_ner

Simple and Efficient Tensorflow implementations of NER models with tf.estimator and tf.data
Apache License 2.0
923 stars, 275 forks

Enforce UTF-8 Encoding #26

Open mossaab0 opened 5 years ago

mossaab0 commented 5 years ago

Without an explicit UTF-8 encoding when opening the GloVe file, I got the following error:

Reading GloVe file (may take a while)
- At line 0
Traceback (most recent call last):
  File ".\build_glove.py", line 28, in <module>
    for line_idx, line in enumerate(f):
  File "C:\ProgramData\Anaconda3\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 962: character maps to <undefined>
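
A minimal sketch of the kind of change the title asks for. It assumes `build_glove.py` opens the GloVe file with the built-in `open()` and no `encoding` argument, so Python falls back to the platform default (cp1252 in the traceback above). The script's actual source is not shown here, so `read_glove_words` and the temporary demo file are hypothetical names used only for illustration.

```python
import os
import tempfile


def read_glove_words(path, encoding="utf-8"):
    """Return the first token of each line, decoding with an explicit encoding.

    Hypothetical helper: mirrors the `for line_idx, line in enumerate(f)` loop
    from the traceback, but passes `encoding` to open() instead of relying on
    the platform default.
    """
    words = []
    with open(path, encoding=encoding) as f:
        for line_idx, line in enumerate(f):
            parts = line.strip().split()
            if parts:
                words.append(parts[0])
    return words


# Demo data (made up): "А" (U+0410) is encoded as bytes D0 90 in UTF-8,
# and byte 0x90 has no mapping in cp1252, as in the reported error.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write("А 0.1 0.2\nthe 0.3 0.4\n".encode("utf-8"))
    demo_path = tmp.name

# Explicit UTF-8 decoding reads the file regardless of the platform default.
utf8_words = read_glove_words(demo_path)

# Forcing cp1252 reproduces the failure mode from the traceback.
try:
    read_glove_words(demo_path, encoding="cp1252")
    cp1252_failed = False
except UnicodeDecodeError:
    cp1252_failed = True

os.remove(demo_path)
```

Passing `encoding="utf-8"` at every `open()` call keeps the behavior the same on Windows, Linux, and macOS, rather than depending on the locale of the machine running the script.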