uni-manjunath-ke closed this issue 1 year ago
Could you show the complete command you used to produce pretrained.pt?
./pruned_transducer_stateless7/export.py \
  --exp-dir ./pruned_transducer_stateless7/exp \
  --bpe-model data/lang_bpe_500/bpe.model \
  --epoch 30 \
  --avg 15
Could you please check if you set --use-averaged-model to True when exporting the models?
Hi, it is set to True by default in export.py. Please check the snippet below from export.py:
parser.add_argument(
    "--use-averaged-model",
    type=str2bool,
    default=True,
    help="Whether to load averaged model. Currently it only supports "
    "using --epoch. If True, it would decode with the averaged model "
    "over the epoch range from `epoch-avg` (excluded) to `epoch`. "
    "Actually only the models with epoch number of `epoch-avg` and "
    "`epoch` are loaded for averaging.",
)
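As an aside, the trick the help string describes can be sketched as follows. This is a conceptual sketch only, not icefall's actual implementation: the function name is illustrative, and plain dicts of floats stand in for tensor state_dicts. The idea is that if each checkpoint stores a cumulative running average of the weights, the mean of the weights seen between two checkpoints can be recovered from those two averages alone:

```python
def averaged_between(avg_end, count_end, avg_start, count_start):
    """Recover the mean of the weights seen strictly after update
    `count_start`, up to update `count_end`, given only the cumulative
    running averages stored at those two checkpoints.

    Illustrative sketch (plain dicts of floats stand in for tensors);
    not icefall's actual API.
    """
    span = count_end - count_start
    return {
        k: (avg_end[k] * count_end - avg_start[k] * count_start) / span
        for k in avg_end
    }


# Example: a weight takes the values 1, 2, 3, 4 over four updates.
# Its cumulative average after 2 updates is 1.5; after 4 updates, 2.5.
# The mean of the values seen in between (3 and 4) is 3.5.
print(averaged_between({"w": 2.5}, 4, {"w": 1.5}, 2))  # {'w': 3.5}
```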
Thanks
How do you use pretrained.pt for decoding?
@uni-manjunath-ke Sorry, I cannot reproduce your findings. I get exactly the same decoding results.
Here is what I did:
./pruned_transducer_stateless7/export.py \
--exp-dir ./pruned_transducer_stateless7/exp \
--bpe-model data/lang_bpe_500/bpe.model \
--epoch 30 \
--avg 15 \
--use-averaged-model True
cd ./pruned_transducer_stateless7/exp
ln -s pretrained.pt epoch-999.pt
cd ../..
./pruned_transducer_stateless7/decode.py \
--exp-dir ./pruned_transducer_stateless7/exp \
--bpe-model data/lang_bpe_500/bpe.model \
--epoch 999 \
--avg 1 \
--use-averaged-model False
After doing this, I get the same decoding results as with --epoch 30 --avg 15 --use-averaged-model True.
Could you please show the decoding log if it still doesn't work for you?
How do you use pretrained.pt for decoding?
Hi @csukuangfj, we use it exactly as @marcoyang1998 mentioned in https://github.com/k2-fsa/icefall/issues/1024#issuecomment-1525912776
./pruned_transducer_stateless7/export.py \
  --exp-dir ./pruned_transducer_stateless7/exp \
  --bpe-model data/lang_bpe_500/bpe.model \
  --epoch 30 \
  --avg 15
cd ./pruned_transducer_stateless7/exp
ln -s pretrained.pt epoch-3015.pt
cd ../..
python3 ./pruned_transducer_stateless7/decode.py \
  --decoding-method $decoding_method \
  --manifest-dir $manifest_dir \
  --cut-set-name $expt_name \
  --use-averaged-model False \
  --on-the-fly-feats True \
  --bpe-model $bpe_model \
  --max-duration 100 \
  --exp $model_dir \
  --num-workers 30 \
  --epoch 3015 \
  --avg 1
Thanks
But we repeated the experiments again and confirmed that there is a difference in the WERs, as below:

Method I:
./pruned_transducer_stateless7/export.py \
  --exp-dir ./pruned_transducer_stateless7/exp \
  --bpe-model data/lang_bpe_500/bpe.model \
  --epoch 30 \
  --avg 15
cd ./pruned_transducer_stateless7/exp
ln -s pretrained.pt epoch-3015.pt
cd ../..
python3 ./pruned_transducer_stateless7/decode.py \
  --decoding-method $decoding_method \
  --manifest-dir $manifest_dir \
  --cut-set-name $expt_name \
  --use-averaged-model False \
  --on-the-fly-feats True \
  --bpe-model $bpe_model \
  --max-duration 100 \
  --exp $model_dir \
  --num-workers 30 \
  --epoch 3015 \
  --avg 1
This has a WER of 17.72% with greedy_search.

Method II:
python3 ./pruned_transducer_stateless7/decode.py \
  --decoding-method $decoding_method \
  --manifest-dir $manifest_dir \
  --cut-set-name $expt_name \
  --use-averaged-model False \
  --on-the-fly-feats True \
  --bpe-model $bpe_model \
  --max-duration 100 \
  --exp $model_dir \
  --num-workers 30 \
  --epoch 30 \
  --avg 15
Method I has a WER of 17.72%, whereas Method II has a WER of 16.48% (both using greedy_search). Both methods use "--use-averaged-model False". Of course, if we pass "--use-averaged-model True" in Method II, we get a WER of 17.7%, which is the same as the Method I WER. But we are interested in achieving the lower WER of Method II while using Method I.
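For context, with --use-averaged-model False and --avg N, the decoding averages the parameters of the last N epoch checkpoints element-wise, which is why Method II behaves differently from loading a single exported pretrained.pt. A minimal sketch of that idea follows; it is illustrative only (plain dicts of floats stand in for tensor state_dicts, and the function name is not icefall's actual API):

```python
def average_checkpoints(state_dicts):
    """Element-wise mean of several checkpoints' parameters.

    Illustrative sketch: plain dicts of floats stand in for the
    tensor state_dicts that are actually averaged.
    """
    n = len(state_dicts)
    return {k: sum(sd[k] for sd in state_dicts) / n for k in state_dicts[0]}


# Averaging two toy "checkpoints":
print(average_checkpoints([{"w": 1.0, "b": 0.0}, {"w": 3.0, "b": 2.0}]))
# {'w': 2.0, 'b': 1.0}
```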
We tried passing "--use-averaged-model True" with Method I, but it gives an error saying "3014.pt model" not found for averaging. So, could you please suggest how to achieve the lower WER (of 16.48%) using Method I (i.e., through export.py)?
Thanks
If you want to get the same WER with method 1, you just need to export the model with --use-averaged-model False
, like you are doing in Method 2.
In general, though, you can try averaging over fewer checkpoints with --use-averaged-model True
(e.g., LibriSpeech uses 9) and see if that improves WER.
OK, thanks @desh2608. Will check and get back.
--use-averaged-model False with export.py gives the expected WER. Thanks for the suggestion @desh2608.
However, it is a little confusing that export.py has --use-averaged-model set to True by default, whereas it is set to False by default in decode.py. Is it planned to make this consistent across scripts in future releases? Thanks.
I think it is True by default in both, at least for the LibriSpeech pruned_transducer_stateless7. See:
Perhaps you changed something locally.
Yes, thank you very much. True, that was a local edit.
Hi all and @csukuangfj, we have trained a custom English model using pruned_transducer_stateless7. We tried decoding the LibriSpeech clean test set using two methods, both averaging the last 15 epoch.pt checkpoints (--avg 15 in both cases).
Just using export.py worsens the WER by more than 1.2%, which is very significant. I think this is a bug that requires immediate attention. Could you please help with this? Thanks.