Open YasineNifa opened 5 years ago
Hi @YasineNifa, I haven't encountered such issues with English text. Have you followed the guide exactly? I'd suggest paying particular attention to creating a virtual environment. Maybe this discussion will also be helpful: ioerror-errno-32-broken-pipe-python
Please note that the file bible.en.txt.bz2 should be raw text with a single sentence per line. I see that you're using a vocabulary file instead.
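One rough way to tell a sentence-per-line corpus from a word list is to look at the average number of tokens per line. The sketch below is only an illustration and is not part of the tutorial: the helper name corpus_stats and the 10000-line sample size are assumptions made for this example, and whitespace splitting is a crude stand-in for real tokenization.

```python
import bz2
from itertools import islice


def corpus_stats(path, max_lines=10000):
    """Return (non_empty_lines, mean_tokens_per_line) for a .bz2 text file.

    Hypothetical helper: only the first `max_lines` lines are sampled,
    and tokens are obtained by plain whitespace splitting.
    """
    n_lines = 0
    n_tokens = 0
    with bz2.open(path, "rt", encoding="utf-8") as fh:
        for line in islice(fh, max_lines):
            tokens = line.split()
            if not tokens:
                continue  # blank lines are not counted
            n_lines += 1
            n_tokens += len(tokens)
    mean = n_tokens / n_lines if n_lines else 0.0
    return n_lines, mean
```

Calling corpus_stats("vocabulary.txt.bz2") and getting a mean close to 1 would suggest the file holds one word per line rather than one sentence per line. A larger mean does not prove the file is well formed, it only rules out the word-list case.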
Yeah, I followed the guide, but I did not run this command: bzcat vocabulary.txt.bz2 | python process.py | wc, because I could not find the process.py file. And yes, the vocabulary file has the same structure as the bible file (raw text with a single sentence per line).
Oh, sorry, that was a typo. Fixed it! Do you have the data publicly available? I can try to replicate the error.
Here is the data I am using: https://voice.mozilla.org/fr/datasets Thanks for your time :)
If you want the vocabulary.txt, here is a link where you can find it: https://drive.google.com/open?id=1TJH1O5nQsXXO0tLFPRi2zmWUQK_F4wmc
Hi, did you fix this issue? I am struggling with it now.
Hello, I am following this tutorial to create my French language model: https://github.com/kmario23/KenLM-training But when I run this command:
bzcat ./data_final/vocabulary.txt.bz2 | python preprocess.py | /home/innovation/kenlm/bin/lmplz -o 3 > myvocabulary.arpa
I get the following error :