kpu / kenlm

KenLM: Faster and Smaller Language Model Queries
http://kheafield.com/code/kenlm/

error bytes_output_mode was not specified #317

Closed Tortoise17 closed 3 years ago

Tortoise17 commented 3 years ago

I am running into the following message when I run these commands:

python generate_lm.py --input_txt ../data/text.txt --output_dir . --top_k 5000 --kenlm_bins ../kenlm/build/bin/ --arpa_order 3 --max_arpa_memory "85%" --arpa_prune "0|0|1" --binary_a_bits 255 --binary_q_bits 8 --binary_type trie

./generate_scorer_package --alphabet ../data/alphabet.txt --lm lm.binary --vocab vocab5000.txt --package kenlm.scorer --default_alpha 0.65 --default_beta 1.45 
2095101 unique words read from vocabulary file.
Doesn't look like a character based (Bytes Are All You Need) model.
--force_bytes_output_mode was not specified, using value infered from vocabulary contents: false

I am trying to generate a language model (scorer) for DeepSpeech. Since the new scorer option was introduced, I have not been able to produce proper output. Could you give me a clue about what is going wrong?