hainan-xv / kaldi

This is now the official location of the Kaldi project.
http://kaldi-asr.org

Minor fixes #17

Open danoneata opened 6 years ago

danoneata commented 6 years ago

Hello,

Here are some small fixes for the rnnlm-rescoring branch.

I couldn't create a pull request because I couldn't fork your repository (I had already forked the main Kaldi repository), so I'm submitting the patch as a diff below.

Hope this helps and thanks for making this available.

diff --git a/scripts/rnnlm/choose_features.py b/scripts/rnnlm/choose_features.py
index 1c3c195c2..dd742cb82 100755
--- a/scripts/rnnlm/choose_features.py
+++ b/scripts/rnnlm/choose_features.py
@@ -105,7 +105,7 @@ def read_vocab(vocab_file):
         wordlist[index] = word

     if wordlist[0] != '<eps>' and wordlist[0] != '<EPS>':
-        sys.exit(argv[0] + ": expected word numbered zero to be epsilon.")
+        sys.exit(sys.argv[0] + ": expected word numbered zero to be epsilon.")
     return (vocab, wordlist)

@@ -169,7 +169,7 @@ word_indexes_to_exclude = {0} # a set including only zero.
 if args.special_words != '':
     for word in args.special_words.split(','):
         if not word in vocab:
-            sys.exit(argv[0] + ": error: element {0} of --special-words option "
+            sys.exit(sys.argv[0] + ": error: element {0} of --special-words option "
                      "is not in the vocabulary file {1}".format(word, args.vocab_file))
         word_indexes_to_exclude.add(vocab[word])
         this_word_prob = unigram_probs[vocab[word]]
diff --git a/scripts/rnnlm/prepare_split_data.py b/scripts/rnnlm/prepare_split_data.py
index 0cd6c3068..9cc4f69d0 100755
--- a/scripts/rnnlm/prepare_split_data.py
+++ b/scripts/rnnlm/prepare_split_data.py
@@ -207,7 +207,7 @@ command = "utils/sym2int.pl {unk_opt} {vocab_file} <{input_file} | {awk_command}
                                            # because it has {}'s awhich would
                                            # otherwise be interpreted.
     input_file="{0}/dev.txt".format(args.text_dir),
-        output_file="{0}/dev.txt".format(args.split_dir))
+    output_file="{0}/dev.txt".format(args.split_dir))
 ret = os.system(command)
 if ret != 0:
     sys.exit(sys.argv[0] + ": command '{0}' returned with status {1}".format(
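
To illustrate why the `argv` to `sys.argv` change in choose_features.py matters, here is a minimal sketch. It assumes the script does `import sys` and never does `from sys import argv`, which is what the patch implies but is not shown in the diff context. The function names are hypothetical and exist only for this example.

```python
import sys


def exit_message_unqualified():
    # Mirrors the pre-patch line. The bare name `argv` is undefined unless
    # the module did `from sys import argv` (assumed not to be the case).
    return argv[0] + ": expected word numbered zero to be epsilon."


def exit_message_qualified():
    # Mirrors the patched line: the name is looked up on the sys module.
    return sys.argv[0] + ": expected word numbered zero to be epsilon."


try:
    exit_message_unqualified()
    raised = False
except NameError:
    # The error path itself fails, so the intended message is never shown.
    raised = True

print("unqualified argv raised NameError:", raised)
print(exit_message_qualified())
```

Because the bare `argv` sits on an error path, the script works on valid input and only fails with a `NameError` instead of the intended message when the vocabulary check trips.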

danoneata commented 6 years ago

One more small thing: on the same branch, the run_lstm_1{d,e} scripts from egs/swbd/s5c/local/rnnlm/tuning have an inconsistency between comments and code (the comment says one in every 500 lines, but the awk command holds out one in every 50):

$ grep 50 run_lstm_1{d,e}.sh
run_lstm_1d.sh:  # hold out one in every 500 lines as dev data.
run_lstm_1d.sh:  cat $text | grep ^sw | cut -d ' ' -f2- | awk -v text_dir=$text_dir '{if(NR%50 == 0) { print >text_dir"/dev.txt"; } else {print;}}' >$text_dir/swbd.txt
run_lstm_1e.sh:  # hold out one in every 500 lines as dev data.
run_lstm_1e.sh:  cat $text | grep ^sw | cut -d ' ' -f2- | awk -v text_dir=$text_dir '{if(NR%50 == 0) { print >text_dir"/dev.txt"; } else {print;}}' >$text_dir/swbd.txt
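
The size of the mismatch can be checked with a small sketch that mimics the awk condition `NR%50 == 0` in Python. The function name `split_held_out` and the synthetic input are assumptions made for illustration; they are not part of the Kaldi scripts.

```python
def split_held_out(lines, every):
    """Send every `every`-th line (1-based, like awk's NR) to dev, the rest to train."""
    train, dev = [], []
    for nr, line in enumerate(lines, start=1):
        if nr % every == 0:
            dev.append(line)
        else:
            train.append(line)
    return train, dev


# Synthetic stand-in for the Switchboard text: 1000 numbered lines.
lines = ["utt{}".format(i) for i in range(1, 1001)]

train_50, dev_50 = split_held_out(lines, 50)     # what the code does
train_500, dev_500 = split_held_out(lines, 500)  # what the comment says

print(len(dev_50), len(dev_500))  # prints "20 2"
```

With a modulus of 50 the dev set is ten times larger than the comment suggests, so either the comment or the modulus needs to change, depending on which held-out fraction was intended.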

hainan-xv commented 6 years ago

Thanks. Fixed.

BTW you do not need to fork a fork -- I think you can just first clone your fork, and do something like

git checkout -b a-branch-for-bug-fix
git pull another-repository buggy-branch

then you can push the changes to a-branch-for-bug-fix and create a PR from there.
