yassine-saoudi opened this issue 6 years ago
Hi! This error is essentially saying that some file is being read at the reported line, but its bytes cannot be decoded as 'utf8'. There are many possible causes, some of which I have listed below (and hopefully other folks can contribute more).
Most likely, something (like a header, a shebang, or an optional encoding argument), somewhere (in the file being opened, or in the code that is opening it), is forcing content to be interpreted as utf8 when it shouldn't be, or forcing content to be interpreted as some other encoding when it ought to be utf8.
Other potential causes include...
I hope that helps!
thanks npeirson
Hi, has this been solved for you @yassine-saoudi ? I am facing the same problem and would like to know what you did regarding this.
Hi Ms Anjali Bhavan, if your target language is Arabic, this type of error is very common. The main causes are related to the preprocessing step of the training data. I propose using a function such as "is_valid_arabic_word(word)", and I insist on eliminating the Arabic comma « ، ». Best regards.
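The "is_valid_arabic_word" function mentioned above is not shown in the thread, so the sketch below is a reconstruction under one assumption: a token is "valid Arabic" if every character falls in the main Arabic Unicode block (U+0600–U+06FF), excluding the Arabic comma « ، » (U+060C) that the comment recommends removing:

```python
ARABIC_COMMA = "\u060c"  # « ، », recommended above for elimination


def is_valid_arabic_word(word):
    """Hypothetical validator: every char Arabic, and not the Arabic comma."""
    return bool(word) and all(
        "\u0600" <= ch <= "\u06ff" and ch != ARABIC_COMMA for ch in word
    )


def clean_arabic_line(line):
    """Replace the Arabic comma with a space, then keep only valid tokens."""
    line = line.replace(ARABIC_COMMA, " ")
    return " ".join(w for w in line.split() if is_valid_arabic_word(w))
```

Running such a filter over the corpus before training should prevent stray punctuation and non-Arabic bytes from ending up in the vocabulary.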
I used GloVe word embeddings for distributed word representations of Arabic text (sentiment analysis of Arabic tweets, reviews, and standard Arabic), but I get this error:
"TypeError: UnicodeDecodeError: 'utf8' codec can't decode byte 0xba in position 2: invalid start byte"
I tried to load the model while ignoring Unicode errors (unicode_errors='ignore'), but it didn't solve the problem. Can you help and point me toward a fix for this error?
Regards.
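When unicode_errors='ignore' at load time doesn't help, another approach is to bypass the loader and read the vector file yourself, skipping any line that is not valid UTF-8. This is a minimal sketch (not the GloVe project's own loader, nor gensim's); it assumes a text file of "word v1 v2 ..." lines:

```python
def load_vectors_skipping_bad_lines(path, dim=None):
    """Parse GloVe-style text vectors, dropping undecodable or malformed lines."""
    vectors = {}
    skipped = 0
    with open(path, "rb") as f:
        for raw in f:
            try:
                line = raw.decode("utf-8")
            except UnicodeDecodeError:
                skipped += 1  # drop undecodable lines instead of crashing
                continue
            parts = line.rstrip("\n").split(" ")
            word, values = parts[0], [float(x) for x in parts[1:]]
            if dim is not None and len(values) != dim:
                skipped += 1  # also drop lines with the wrong vector width
                continue
            vectors[word] = values
    return vectors, skipped
```

The skipped count tells you how much of the file was affected; if it is large, re-encoding the original corpus (rather than discarding lines) is probably the better fix.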