pender / chatbot-rnn

A toy chatbot powered by deep learning and trained on data from Reddit
MIT License
900 stars 370 forks

Chatbot output russian chars issue #63

Open insatile opened 4 years ago

insatile commented 4 years ago

Can you please tell me what I am doing wrong? I trained the model on a raw Russian text file (> 4 MB), formatted as in the scotus file:

marfo4ka43: Ты кто вообще?)
kyindarkkk: чел с соседнего офиса с дредами)
marfo4ka43: ахаха
kyindarkkk: xDD
marfo4ka43: мы на обеде до столовки ходим, так себе прогулка

But in the end, when I run chatbot.py, it returns only spaces, numbers, and English characters:

привет алешка kyindarkkk: 5
погода сегодня так себе kyindarkkk:
ты думаешь? kyindarkkk:
кто то просто нас не понял kyindarkkk: ? 3Fllus s GO 1738

The question is: how do I make it train on and output Russian characters too?

uninstallgentoo commented 4 years ago

By default, open() uses locale.getpreferredencoding(False) as the encoding, which is not necessarily UTF-8. So you need to set the encoding explicitly when you open the file in utils:106: io.open(input_file, mode='rt', encoding='utf-8')
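To illustrate the suggestion, here is a minimal sketch of reading a UTF-8 file with an explicit encoding. The helper name read_text, the sample line, and the temporary file are assumptions made up for this example, not code from the repository. Decoding with latin-1 only simulates a platform whose default encoding is not UTF-8.

```python
import io
import locale
import os
import tempfile


def read_text(input_file, encoding="utf-8"):
    """Read a text file with an explicit encoding (hypothetical helper)."""
    with io.open(input_file, mode="rt", encoding=encoding) as f:
        return f.read()


# Hypothetical sample line in the "speaker: text" layout used in the issue.
sample = u"marfo4ka43: Ты кто вообще?)\n"

# Write the sample as raw UTF-8 bytes so the file content is unambiguous.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write(sample.encode("utf-8"))

# Explicit UTF-8 decoding recovers the Cyrillic text on any platform.
text = read_text(path)

# This is the encoding open() falls back to when none is passed.
default_encoding = locale.getpreferredencoding(False)

# Simulate a non-UTF-8 default: the same bytes decode to Latin-1 mojibake.
mis_decoded = read_text(path, encoding="latin-1")

os.remove(path)

print("default encoding:", default_encoding)
print("round trip ok:", text == sample)
print("mis-decoded differs:", mis_decoded != sample)
```

If the default encoding reported on your machine is not UTF-8, a training file read without an explicit encoding may be decoded like the latin-1 case above, so the model would never see real Cyrillic characters.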