thu-spmi / CAT

A CRF-based ASR Toolkit
Apache License 2.0

get_word_map.pl: command not found #56

Open Sar-Dar opened 2 years ago

Sar-Dar commented 2 years ago

When I run the script commonvoice/run_mc.sh, line 98 calls local/mozilla_train_lms.sh and an error occurs: local/mozilla_train_lms.sh: line 63: get_word_map.pl: command not found

It seems the file get_word_map.pl is missing. Is the problem on my side? I would appreciate your help.

maxwellzh commented 2 years ago

get_word_map.pl is provided by the kaldi_lm tool from Kaldi, at kaldi/tools/kaldi_lm/get_word_map.pl.

Please confirm that $KALDI_ROOT is set up properly in egs/commonvoice/path.sh and that you have installed kaldi_lm in Kaldi.
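The two checks above can be sketched as a small shell function. This is only an illustration: check_kaldi_lm is a hypothetical helper, not part of CAT or Kaldi, and the tools/kaldi_lm location is taken from the comment above. Pass it whatever value your path.sh assigns to KALDI_ROOT.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of CAT or Kaldi: report whether the two
# kaldi_lm scripts named in this thread exist and are executable under
# <root>/tools/kaldi_lm (location assumed from the comment above).
check_kaldi_lm() {
    local root="$1"
    local status=0
    local f
    for f in get_word_map.pl train_lm.sh; do
        if [ -x "$root/tools/kaldi_lm/$f" ]; then
            echo "ok: $root/tools/kaldi_lm/$f"
        else
            echo "missing or not executable: $root/tools/kaldi_lm/$f"
            status=1
        fi
    done
    return "$status"
}

# /path/to/kaldi is a placeholder used only when KALDI_ROOT is unset.
check_kaldi_lm "${KALDI_ROOT:-/path/to/kaldi}" || echo "kaldi_lm looks incomplete under that root"
```

Note that this only confirms the files are present. It says nothing about whether their directory is on $PATH, which is what a bare call such as get_word_map.pl depends on.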

Sar-Dar commented 2 years ago

Yes, I checked $KALDI_ROOT and kaldi_lm, and I still get the same error:

dos2unix: converting file data/dict_phn/lexicon_raw.txt to Unix format...
dos2unix: converting file data/dict_phn/lexicon.txt to Unix format...
Phoneme-based dictionary preparation succeeded
fstaddselfloops 'echo 86 |' 'echo 403838 |' 
Dict and token FSTs compiling succeeded
dos2unix: converting file data/dict_phn_de/lexicon_raw.txt to Unix format...
dos2unix: converting file data/dict_phn_de/lexicon.txt to Unix format...
Phoneme-based dictionary preparation succeeded
fstaddselfloops 'echo 44 |' 'echo 157556 |' 
Dict and token FSTs compiling succeeded
Not installing the kaldi_lm toolkit since it is already there.
local/mozilla_train_lms.sh: line 63: get_word_map.pl: command not found

maxwellzh commented 2 years ago

Can you try adding the line

cd kaldi_lm

between lines 27 and 29? https://github.com/thu-spmi/CAT/blob/15ed6f22b31f76f77c1349d32b824b92b1667629/egs/commonvoice/local/mozilla_train_lms.sh#L27-L29

Sar-Dar commented 2 years ago

I added cd kaldi_lm at line 28 and got the same result:

dos2unix: converting file data/dict_phn/lexicon_raw.txt to Unix format...
dos2unix: converting file data/dict_phn/lexicon.txt to Unix format...
Phoneme-based dictionary preparation succeeded
fstaddselfloops 'echo 86 |' 'echo 403838 |' 
Dict and token FSTs compiling succeeded
dos2unix: converting file data/dict_phn_de/lexicon_raw.txt to Unix format...
dos2unix: converting file data/dict_phn_de/lexicon.txt to Unix format...
Phoneme-based dictionary preparation succeeded
fstaddselfloops 'echo 44 |' 'echo 157556 |' 
Dict and token FSTs compiling succeeded
Not installing the kaldi_lm toolkit since it is already there.
local/mozilla_train_lms.sh: line 64: get_word_map.pl: command not found

maxwellzh commented 2 years ago

Then I have no idea where the error might come from. I will re-run the whole commonvoice/run_mc.sh to check the scripts when I'm free, which will probably take a while. Once there is any progress, I will notify you. Or, if you find out where the issue lies, feel free to make a PR.

Sar-Dar commented 2 years ago

I deleted kaldi_lm and re-ran the script commonvoice/run_mc.sh from stage 2, and got the same result:

Downloading and installing the kaldi_lm tools
kaldi_lm/
kaldi_lm/.git/
kaldi_lm/.git/hooks/
kaldi_lm/.git/hooks/commit-msg.sample
kaldi_lm/.git/hooks/pre-receive.sample
kaldi_lm/.git/hooks/pre-rebase.sample
kaldi_lm/.git/hooks/fsmonitor-watchman.sample
kaldi_lm/.git/hooks/applypatch-msg.sample
kaldi_lm/.git/hooks/pre-push.sample
kaldi_lm/.git/hooks/update.sample
kaldi_lm/.git/hooks/pre-applypatch.sample
kaldi_lm/.git/hooks/post-update.sample
kaldi_lm/.git/hooks/prepare-commit-msg.sample
kaldi_lm/.git/hooks/pre-commit.sample
kaldi_lm/.git/branches/
kaldi_lm/.git/packed-refs
kaldi_lm/.git/info/
kaldi_lm/.git/info/exclude
kaldi_lm/.git/description
kaldi_lm/.git/logs/
kaldi_lm/.git/logs/HEAD
kaldi_lm/.git/logs/refs/
kaldi_lm/.git/logs/refs/heads/
kaldi_lm/.git/logs/refs/heads/master
kaldi_lm/.git/logs/refs/remotes/
kaldi_lm/.git/logs/refs/remotes/origin/
kaldi_lm/.git/logs/refs/remotes/origin/HEAD
kaldi_lm/.git/objects/
kaldi_lm/.git/objects/info/
kaldi_lm/.git/objects/pack/
kaldi_lm/.git/objects/pack/pack-20951a8b61b88033146dcb91efa6f8630de12e27.idx
kaldi_lm/.git/objects/pack/pack-20951a8b61b88033146dcb91efa6f8630de12e27.pack
kaldi_lm/.git/HEAD
kaldi_lm/.git/index
kaldi_lm/.git/config
kaldi_lm/.git/refs/
kaldi_lm/.git/refs/heads/
kaldi_lm/.git/refs/heads/master
kaldi_lm/.git/refs/remotes/
kaldi_lm/.git/refs/remotes/origin/
kaldi_lm/.git/refs/remotes/origin/HEAD
kaldi_lm/.git/refs/tags/
kaldi_lm/merge_ngrams.cc
kaldi_lm/uniq_to_ngrams.cc
kaldi_lm/get_word_map.pl
kaldi_lm/kaldi_lm.h
kaldi_lm/train_lm.sh
kaldi_lm/optimize_alpha.pl
kaldi_lm/prune_ngrams.cc
kaldi_lm/get_ngram_counts.cc
kaldi_lm/get_raw_ngrams.cc
kaldi_lm/interpolate_ngrams.cc
kaldi_lm/map_words_in_arpa.pl
kaldi_lm/merge_ngrams_online
kaldi_lm/scale_configs.pl
kaldi_lm/Makefile
kaldi_lm/discount_ngrams.cc
kaldi_lm/prune_lm.sh
kaldi_lm/finalize_arpa.pl
kaldi_lm/compute_perplexity.cc
g++ -g -std=c++11    get_raw_ngrams.cc   -o get_raw_ngrams
g++ -g -std=c++11    uniq_to_ngrams.cc   -o uniq_to_ngrams
g++ -g -std=c++11    merge_ngrams.cc   -o merge_ngrams
g++ -g -std=c++11    discount_ngrams.cc   -o discount_ngrams
g++ -g -std=c++11    interpolate_ngrams.cc   -o interpolate_ngrams
g++ -g -std=c++11    compute_perplexity.cc   -o compute_perplexity
g++ -g -std=c++11    prune_ngrams.cc   -o prune_ngrams
Done making the kaldi_lm tools
local/mozilla_train_lms.sh: line 64: get_word_map.pl: command not found

maxwellzh commented 2 years ago

It seems $PATH is not properly configured. Can you try adding the line

export PATH=$KALDI_ROOT/tools/kaldi_lm:$PATH

before line 25? https://github.com/thu-spmi/CAT/blob/15ed6f22b31f76f77c1349d32b824b92b1667629/egs/commonvoice/local/mozilla_train_lms.sh#L25
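To illustrate what this export is expected to do, here is a minimal, self-contained sketch. It does not touch Kaldi: the temporary directory stands in for $KALDI_ROOT/tools/kaldi_lm, and demo_word_map.pl is a hypothetical stub, not the real script.

```shell
#!/usr/bin/env bash
# Everything here is a stand-in for demonstration purposes only.
demo_dir=$(mktemp -d)   # plays the role of $KALDI_ROOT/tools/kaldi_lm
printf '#!/bin/sh\necho "stub called"\n' > "$demo_dir/demo_word_map.pl"
chmod +x "$demo_dir/demo_word_map.pl"

# Before the export, the shell cannot resolve the bare command name.
command -v demo_word_map.pl >/dev/null || echo "before: command not found"

# Same shape as the suggested line: prepend the tool directory to PATH.
export PATH="$demo_dir:$PATH"

# After the export, the bare name resolves to the stub.
resolved=$(command -v demo_word_map.pl)
echo "after: $resolved"
demo_word_map.pl
```

The export only affects the shell it runs in and that shell's child processes. If a later statement in the same script assigns PATH again without including this directory, the change is lost. Whether that is what happens in mozilla_train_lms.sh is not established in this thread.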

Sar-Dar commented 2 years ago

Adding the line export PATH=$KALDI_ROOT/tools/kaldi_lm:$PATH before line 25 (https://github.com/thu-spmi/CAT/blob/15ed6f22b31f76f77c1349d32b824b92b1667629/egs/commonvoice/local/mozilla_train_lms.sh#L25) did not help either.

I changed line 63 to work around this error, like this:

cat $dir/unigram.counts  | awk '{print $2}' | $KALDI_ROOT/tools/kaldi_lm/get_word_map.pl "<s>" "</s>" "<UNK>" > $dir/word_map  || exit 1;

but then got the same kind of error for a different script:

dos2unix: converting file data/dict_phn/lexicon_raw.txt to Unix format...
dos2unix: converting file data/dict_phn/lexicon.txt to Unix format...
Phoneme-based dictionary preparation succeeded
fstaddselfloops 'echo 86 |' 'echo 403838 |' 
Dict and token FSTs compiling succeeded
dos2unix: converting file data/dict_phn_de/lexicon_raw.txt to Unix format...
dos2unix: converting file data/dict_phn_de/lexicon.txt to Unix format...
Phoneme-based dictionary preparation succeeded
fstaddselfloops 'echo 44 |' 'echo 157556 |' 
Dict and token FSTs compiling succeeded
Not installing the kaldi_lm toolkit since it is already there.
local/mozilla_train_lms.sh: line 72: train_lm.sh: command not found

maxwellzh commented 2 years ago

This is weird. I cloned the repo from GitHub and ran run_mc.sh from scratch. No such error occurred in my testing.

[Screenshot 2022-05-07 13:36:58]
  1. Can you paste the content of your path.sh file?
  2. Please try running the following in CAT/egs/commonvoice/:
    . ./path.sh
    command -v get_word_map.pl

    and tell me the output.
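As a complement to command -v, which prints only the first match, the sketch below lists every $PATH directory that contains an executable with a given name. find_in_path is a hypothetical helper written for this illustration. It is not part of CAT or Kaldi.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each PATH directory entry that holds an
# executable regular file with the given name, in lookup order.
# Returns 0 if at least one match was found, 1 otherwise.
find_in_path() {
    local name="$1"
    local found=1
    local dir
    local IFS=':'
    for dir in $PATH; do
        # An empty PATH entry means the current directory.
        if [ -z "$dir" ]; then dir=.; fi
        if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            echo "$dir/$name"
            found=0
        fi
    done
    return "$found"
}

# Intended to be run in CAT/egs/commonvoice/ after `. ./path.sh`.
find_in_path get_word_map.pl || echo "get_word_map.pl is not on PATH"
```

If this prints nothing but the "not on PATH" message after sourcing path.sh, the kaldi_lm directory is not being added to $PATH by that file, which would match the "command not found" errors above.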