kpu / kenlm

KenLM: Faster and Smaller Language Model Queries
http://kheafield.com/code/kenlm/

STT AI python file failing because of kenlm #384

Open eliso7 opened 2 years ago

eliso7 commented 2 years ago

Why is this happening? I ran:

python3 generate_lm.py --input_txt data.txt --output_dir . --top_k 2 --kenlm_bins /mnt/c/Users/eliso/speech2text/STT/kenlm/build/bin/ --arpa_order 5 --max_arpa_memory "85%" --arpa_prune "0|0|1" --binary_a_bits 255 --binary_q_bits 8 --binary_type trie

Converting to lowercase and counting word occurrences ... 198 Elapsed Time: 0:00:00

Saving top 2 words ...

Calculating word statistics ...
  Your text file has 398 words in total
  It has 3 unique words
  Your top-2 words are 85.1759 percent of all words
  Your most common word "sentence" occurred 199 times
  The least common word in your top-k is "another" with 140 times
  The first word with 199 occurrences is "sentence" at place 0
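The percentage in these statistics is plain arithmetic over the word counts: the top-2 words cover (199 + 140) / 398, about 85.1759 percent, of the text. The sketch below recomputes the summary with collections.Counter. It is a hypothetical illustration, not generate_lm.py's actual code, and the third word ("word3", 59 occurrences) is a placeholder whose count is only inferred from 398 - 199 - 140.

```python
from collections import Counter


def top_k_stats(counts: Counter, k: int) -> dict:
    """Recompute a generate_lm.py-style word summary from word counts.

    Hypothetical re-implementation for illustration; not the script's own code.
    """
    total = sum(counts.values())
    top = counts.most_common(k)  # [(word, count), ...] sorted by count, descending
    top_total = sum(count for _, count in top)
    most_word, most_count = top[0]
    least_word, least_count = top[-1]
    return {
        "total_words": total,
        "unique_words": len(counts),
        "top_k_percent": round(100 * top_total / total, 4),
        "most_common": (most_word, most_count),
        "least_common_in_top_k": (least_word, least_count),
    }


# Counts matching the log; "word3" is a hypothetical placeholder for the
# third unique word, whose real name is not shown in the issue.
counts = Counter({"sentence": 199, "another": 140, "word3": 59})
print(top_k_stats(counts, k=2))
```

With only 3 unique words, the top-2 vocabulary drops a single word type, which is why coverage stays well below 100 percent.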

Creating ARPA file ...
=== 1/5 Counting and sorting n-grams ===
Reading /mnt/c/Users/eliso/speech2text/STT/data/lm/lower.txt.gz
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
Traceback (most recent call last):
  File "generate_lm.py", line 232, in <module>
    main()
  File "generate_lm.py", line 216, in main
    build_lm(args, data_lower, vocab_str)
  File "generate_lm.py", line 99, in build_lm
    subprocess.check_call(subargs)
  File "/usr/lib/python3.8/subprocess.py", line 364, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/mnt/c/Users/eliso/speech2text/STT/kenlm/build/bin/lmplz', '--order', '5', '--temp_prefix', '.', '--memory', '85%', '--text', './lower.txt.gz', '--arpa', './lm.arpa', '--discount_fallback', '--prune', '0', '0', '1']' died with <Signals.SIGSEGV: 11>.
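The last line means lmplz itself crashed: on POSIX systems Python's subprocess reports a child killed by signal N as return code -N, and SIGSEGV is signal 11, so generate_lm.py is only relaying the failure. A minimal sketch, assuming Python 3.8 as in the traceback, for rebuilding the exact lmplz command from the traceback and running it by hand so lmplz's own output is visible. The helper names lmplz_args, describe_returncode and run are hypothetical, not part of generate_lm.py or KenLM.

```python
import signal
import subprocess
from typing import List


def lmplz_args(bin_dir: str, order: int, memory: str, text: str, arpa: str,
               prune: List[str]) -> List[str]:
    """Rebuild the lmplz argument list shown in the CalledProcessError."""
    return [
        f"{bin_dir.rstrip('/')}/lmplz",
        "--order", str(order),
        "--temp_prefix", ".",
        "--memory", memory,
        "--text", text,
        "--arpa", arpa,
        "--discount_fallback",
        "--prune", *prune,
    ]


def describe_returncode(returncode: int) -> str:
    """Explain a return code; POSIX reports death by signal N as -N."""
    if returncode < 0:
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exited with status {returncode}"


def run(args: List[str]) -> None:
    # stdout/stderr are inherited, so lmplz's own messages reach the terminal
    # instead of being reduced to a CalledProcessError.
    result = subprocess.run(args)
    print(" ".join(args), "->", describe_returncode(result.returncode))


args = lmplz_args("/mnt/c/Users/eliso/speech2text/STT/kenlm/build/bin/", 5, "85%",
                  "./lower.txt.gz", "./lm.arpa", ["0", "0", "1"])
# run(args)  # uncomment to execute lmplz; paths are relative to the working directory
```

Because ./lower.txt.gz and ./lm.arpa are relative paths, run this from the same directory as the original generate_lm.py call.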

kpu commented 2 years ago

Windows is only supported by other Windows users. That said, are you using the latest version of the code from this repository? An earlier bug caused segfaults. Also try a memory setting such as 2.5G instead of a percentage, in case there's some 32-bit weirdness.
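Two checks follow from this reply. For the memory setting, the traceback shows generate_lm.py forwarding --max_arpa_memory "85%" to lmplz as --memory 85%, so rerunning with --max_arpa_memory "2.5G" should presumably pass 2.5G through. For the "32-bit weirdness", one way to rule it out is to confirm that lmplz was built as a 64-bit binary. A minimal sketch, assuming lmplz is a Linux ELF executable (the /mnt/c/... paths suggest WSL, but that is an inference): it reads the ELF identification bytes directly, which is the same information the standard `file` utility reports. elf_bits is a hypothetical helper name.

```python
def elf_bits(path: str) -> int:
    """Return 32 or 64 for an ELF binary, read from the EI_CLASS header byte."""
    with open(path, "rb") as f:
        header = f.read(5)  # 4-byte magic number plus the EI_CLASS byte
    if len(header) < 5 or header[:4] != b"\x7fELF":
        raise ValueError(f"{path} is not an ELF file")
    ei_class = header[4]  # 1 = ELFCLASS32, 2 = ELFCLASS64
    if ei_class == 1:
        return 32
    if ei_class == 2:
        return 64
    raise ValueError(f"{path} has an unknown ELF class: {ei_class}")


# Path from the issue; substitute your own build directory.
LMPLZ = "/mnt/c/Users/eliso/speech2text/STT/kenlm/build/bin/lmplz"
# print(elf_bits(LMPLZ))  # 64 for a 64-bit build, 32 for a 32-bit one
```

A result of 32 would make kpu's 32-bit theory worth pursuing; 64 points back toward updating to the latest code.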