kaldi-asr / kaldi

kaldi-asr/kaldi is the official location of the Kaldi project.
http://kaldi-asr.org

cuda decoder write lattices with word_determinize=false and write_compact=false (for semi-supervised decoding) #4714

Open lalimili6 opened 2 years ago

lalimili6 commented 2 years ago

I want to use the CUDA decoder (batched-wav-nnet3-cuda2) in Kaldi for semi-supervised ("semiup") decoding. I wrote two scripts: the first uses the CUDA decoder on its own, and the second adds lattice-determinize-phone-pruned to it so that the lattices can be used for semi-supervised decoding, as is done in this script.

gpu_decode2.sh.txt gpu_decode2_semiup.sh.txt

1- I get this error for semiup GPU decoding: ERROR (lattice-determinize-phone-pruned[5.5.0~1539-ea2b]:LatticeStateTimes():lattice-functions.cc:81) Input lattice must be topologically sorted. I added TopSortLatticeIfNeeded(&lat); to this script and the error went away.

2- I computed WER for CPU decoding, CUDA decoding, and the semiup script. The WERs are almost the same for CPU and CUDA decoding, but adding lattice-determinize-phone-pruned to CUDA decoding makes the WER worse.

RESULTS:

| decode method | nnet3-latgen-faster | batched-wav-nnet3-cuda2 (gpu_decode2.sh.txt) | [nnet3-latgen-faster \| lattice-determinize-phone-pruned](https://github.com/kaldi-asr/kaldi/blob/master/egs/wsj/s5/steps/nnet3/decode_semisup.sh#L111) | batched-wav-nnet3-cuda2 \| lattice-determinize-phone-pruned (gpu_decode2_semiup.sh.txt) |
| --- | --- | --- | --- | --- |
| WER | 37.89 | 37.68 | 37.20 | 40.71 |

For the CUDA decoder with lattice-determinize-phone-pruned (gpu_decode2_semiup.sh) I set beam=25.0, lattice_beam=15.0, beam_determinize=8.0. However, if I set them as in the plain CUDA decoder (beam=15.0, lattice_beam=6.0), the result is 55% WER.

3- I also have a test set containing 1 hour of silence. That is, I decode 1 hour of silence utterances, and any decoded word counts as an error. Here the results get dramatically worse, and I don't know why.

| decode method | nnet3-latgen-faster | batched-wav-nnet3-cuda2 | [nnet3-latgen-faster \| lattice-determinize-phone-pruned](https://github.com/kaldi-asr/kaldi/blob/master/egs/wsj/s5/steps/nnet3/decode_semisup.sh#L111) | batched-wav-nnet3-cuda2 \| lattice-determinize-phone-pruned |
| --- | --- | --- | --- | --- |
| silence error | 11.33 | 7.96 | 12.21 | 85.20 |

This means that if I decode a silence wave with "batched-wav-nnet3-cuda2 | lattice-determinize-phone-pruned", it always decodes a word! How can I fix it? Is my script wrong in the way it combines the CUDA decoder and lattice-determinize-phone-pruned?
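
The post does not say how the "silence error" figure was computed. One plausible reading, shown here only as a sketch under that assumption, is the percentage of silence utterances whose hypothesis contains at least one word. The function name and the utterance ids are hypothetical; the input layout (`utt-id` followed by words) is that of a Kaldi text file.

```python
def silence_error_rate(hyp_lines):
    """Percentage of utterances whose hypothesis contains at least one word.

    Each line is 'utt-id [word ...]'. Since the audio is pure silence,
    any hypothesized word counts as an error for that utterance.
    """
    total = 0
    with_words = 0
    for line in hyp_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        total += 1
        if len(fields) > 1:  # something follows the utterance id
            with_words += 1
    return 100.0 * with_words / total if total else 0.0


# Hypothetical hypotheses for four silence utterances; one contains a word.
hyps = ["sil_0001", "sil_0002 hello", "sil_0003", "sil_0004"]
rate = silence_error_rate(hyps)
print(rate)
```

Counting per utterance rather than per word or per frame is a design choice of this sketch; a different definition would give different numbers from the same lattices.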

best regards

danpovey commented 2 years ago

This likely has something to do with the acoustic or LM scale options. Show the full command line that you used.
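
As a rough illustration of why the scale options can matter (a toy sketch, not Kaldi code, and not a diagnosis of this issue): each lattice path carries a graph cost and an acoustic cost, and the path that wins depends on how the two are weighted. The path labels and cost values below are invented for the example.

```python
def best_path(paths, acoustic_scale):
    """Return the label of the path with the lowest combined cost.

    paths maps a label to a (graph_cost, acoustic_cost) pair; the combined
    cost is graph_cost + acoustic_scale * acoustic_cost.
    """
    return min(
        paths,
        key=lambda label: paths[label][0] + acoustic_scale * paths[label][1],
    )


# Invented costs: the silence path is cheap in the graph but matches the
# audio slightly worse; the word path is the other way round.
paths = {
    "<silence>": (2.0, 1.0),
    "word": (6.0, 0.5),
}

low = best_path(paths, acoustic_scale=1.0)    # 3.0 vs 6.5
high = best_path(paths, acoustic_scale=10.0)  # 12.0 vs 11.0
print(low, high)
```

With these made-up numbers the winner flips from the silence path to the word path once the acoustic cost is weighted ten times more heavily, which is why it helps to see the exact scales used at every stage of the pipeline.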

lalimili6 commented 2 years ago

Here are the log files: steps/nnet3/decode_semisup.sh

# nnet3-latgen-faster --online-ivectors=scp:exp/nnet3/ivectors_test_hires//ivector_online.scp --online-ivector-period=10 --frame-subsampling-factor=3 --frames-per-chunk=50 --extra-left-context=0 --extra-right-context=0 --extra-left-context-initial=-1 --extra-right-context-final=-1 --minimize=false --word-determinize=false --max-active=7000 --min-active=200 --beam=15.0 --lattice-beam=8.0 --acoustic-scale=1.0 --allow-partial=true --word-symbol-table=model/graph_test//words.txt --determinize-lattice=false model_online/final.mdl model/graph_test//HCLG.fst "ark,s,cs:apply-cmvn  --utt2spk=ark:data/test_hires/split4/1/utt2spk scp:data/test_hires/split4/1/cmvn.scp scp:data/test_hires/split4/1/feats.scp ark:- |" "ark:| lattice-determinize-phone-pruned --beam=8.0 --acoustic-scale=1.0 --minimize=false --word-determinize=false --write-compact=false model_online/final.mdl ark:- ark:- | lattice-scale --acoustic-scale=10.0 --write-compact=false ark:- ark:- | gzip -c >model_online/decode__test_semiup/lat.1.gz" 
# Started at Sat Mar 12 08:58:58 UTC 2022
#
nnet3-latgen-faster --online-ivectors=scp:exp/nnet3/ivectors_test_hires//ivector_online.scp --online-ivector-period=10 --frame-subsampling-factor=3 --frames-per-chunk=50 --extra-left-context=0 --extra-right-context=0 --extra-left-context-initial=-1 --extra-right-context-final=-1 --minimize=false --word-determinize=false --max-active=7000 --min-active=200 --beam=15.0 --lattice-beam=8.0 --acoustic-scale=1.0 --allow-partial=true --word-symbol-table=model/graph_test//words.txt --determinize-lattice=false model_online/final.mdl model/graph_test//HCLG.fst 'ark,s,cs:apply-cmvn  --utt2spk=ark:data/test_hires/split4/1/utt2spk scp:data/test_hires/split4/1/cmvn.scp scp:data/test_hires/split4/1/feats.scp ark:- |' 'ark:| lattice-determinize-phone-pruned --beam=8.0 --acoustic-scale=1.0 --minimize=false --word-determinize=false --write-compact=false model_online/final.mdl ark:- ark:- | lattice-scale --acoustic-scale=10.0 --write-compact=false ark:- ark:- | gzip -c >model_online/decode__test_semiup/lat.1.gz' 
LOG (nnet3-latgen-faster[5.5.1002~1546-4609e]:RemoveOrphanNodes():nnet-nnet.cc:948) Removed 0 orphan nodes.
LOG (nnet3-latgen-faster[5.5.1002~1546-4609e]:RemoveOrphanComponents():nnet-nnet.cc:847) Removing 0 orphan components.
lattice-determinize-phone-pruned --beam=8.0 --acoustic-scale=1.0 --minimize=false --word-determinize=false --write-compact=false model_online/final.mdl ark:- ark:- 
lattice-scale --acoustic-scale=10.0 --write-compact=false ark:- ark:- 
apply-cmvn --utt2spk=ark:data/test_hires/split4/1/utt2spk scp:data/test_hires/split4/1/cmvn.scp scp:data/test_hires/split4/1/feats.scp ark:- 

gpu_decode.sh

 batched-wav-nnet3-cuda2 --write-lattice=true --frames-per-chunk=140 --extra-left-context-initial=0 --frame-subsampling-factor=3 --config=model_online/conf/online.conf --max-active=7000 --beam=15.0 --lattice-beam=6.0 --acoustic-scale=1.0 --word-symbol-table=model/graph_test/words.txt model_online/final.mdl model/graph_test/HCLG.fst "ark,s,cs:extract-segments scp,p:data/test/split1/1/wav.scp data/test/split1/1/segments ark:- |" "ark:|lattice-scale --acoustic-scale=10.0 ark:- ark:- | gzip -c >model_online/decode_test_gpu/lat.1.gz" 
# Started at Wed Mar 16 15:24:21 UTC 2022
#
batched-wav-nnet3-cuda2 --write-lattice=true --frames-per-chunk=140 --extra-left-context-initial=0 --frame-subsampling-factor=3 --config=model_online/conf/online.conf --max-active=7000 --beam=15.0 --lattice-beam=6.0 --acoustic-scale=1.0 --word-symbol-table=model/graph_test/words.txt model_online/final.mdl model/graph_test/HCLG.fst 'ark,s,cs:extract-segments scp,p:data/test/split1/1/wav.scp data/test/split1/1/segments ark:- |' 'ark:|lattice-scale --acoustic-scale=10.0 ark:- ark:- | gzip -c >model_online/decode_test_gpu/lat.1.gz' 
LOG (batched-wav-nnet3-cuda2[5.5]:SelectGpuId():cu-device.cc:238) CUDA setup operating under Compute Exclusive Mode.
LOG (batched-wav-nnet3-cuda2[5.5]:FinalizeActiveGpu():cu-device.cc:338) The active GPU is [0]: GeForce RTX 3090 free:23424M, used:843M, total:24268M, free/total:0.965232 version 8.6
LOG (batched-wav-nnet3-cuda2[5.5]:RemoveOrphanNodes():nnet-nnet.cc:948) Removed 0 orphan nodes.
LOG (batched-wav-nnet3-cuda2[5.5]:RemoveOrphanComponents():nnet-nnet.cc:847) Removing 0 orphan components.
lattice-scale --acoustic-scale=10.0 ark:- ark:- 
LOG (batched-wav-nnet3-cuda2[5.5]:CheckAndFixConfigs():nnet3/nnet-am-decodable-simple.h:123) Increasing --frames-per-chunk from 140 to 141 to make it a multiple of --frame-subsampling-factor=3
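
The last log line reports that --frames-per-chunk was raised from 140 to 141 so that it is a multiple of --frame-subsampling-factor=3. The arithmetic can be sketched as below; the function name is hypothetical, and the actual check in nnet3/nnet-am-decodable-simple.h may take further constraints into account that this sketch ignores.

```python
def round_up_to_multiple(frames_per_chunk, frame_subsampling_factor):
    """Smallest multiple of frame_subsampling_factor that is >= frames_per_chunk."""
    n = frame_subsampling_factor
    return n * ((frames_per_chunk + n - 1) // n)


adjusted = round_up_to_multiple(140, 3)
print(adjusted)
```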

decode_gpu_semiup.sh

# batched-wav-nnet3-cuda2 --write-lattice=true --minimize=false --word-determinize=false --frames-per-chunk=141 --extra-left-context-initial=0 --frame-subsampling-factor=3 --config=model_online/conf/online.conf --max-active=7000 --beam=25.0 --lattice-beam=15.0 --acoustic-scale=1.0 --word-symbol-table=model/graph_test/words.txt --determinize-lattice=false model_online/final.mdl model/graph_test/HCLG.fst "ark,s,cs:extract-segments scp,p:data/test/split1/1/wav.scp data/test/split1/1/segments ark:- |" "ark:| lattice-determinize-phone-pruned  --beam=8.0 --acoustic-scale=1.0 --minimize=false --word-determinize=false --write-compact=false model_online/final.mdl ark:- ark:- | lattice-scale --acoustic-scale=10.0 --write-compact=false ark:- ark:- | gzip -c >model_online/decode_test_gpu_semiup/lat.1.gz"    

# Started at Wed Mar 16 15:03:11 UTC 2022
#
batched-wav-nnet3-cuda2 --write-lattice=true --minimize=false --word-determinize=false --frames-per-chunk=141 --extra-left-context-initial=0 --frame-subsampling-factor=3 --config=model_online/conf/online.conf --max-active=7000 --beam=25.0 --lattice-beam=15.0 --acoustic-scale=1.0 --word-symbol-table=model/graph_test/words.txt --determinize-lattice=false model_online/final.mdl model/graph_test/HCLG.fst 'ark,s,cs:extract-segments scp,p:data/test/split1/1/wav.scp data/test/split1/1/segments ark:- |' 'ark:| lattice-determinize-phone-pruned  --beam=8.0 --acoustic-scale=1.0 --minimize=false --word-determinize=false --write-compact=false model_online/final.mdl ark:- ark:- | lattice-scale --acoustic-scale=10.0 --write-compact=false ark:- ark:- | gzip -c >model_online/decode_test_gpu_semiup/lat.1.gz' 
LOG (batched-wav-nnet3-cuda2[5.5]:SelectGpuId():cu-device.cc:238) CUDA setup operating under Compute Exclusive Mode.
LOG (batched-wav-nnet3-cuda2[5.5]:FinalizeActiveGpu():cu-device.cc:338) The active GPU is [0]: GeForce RTX 3090 free:23424M, used:843M, total:24268M, free/total:0.965232 version 8.6
LOG (batched-wav-nnet3-cuda2[5.5]:RemoveOrphanNodes():nnet-nnet.cc:948) Removed 0 orphan nodes.
LOG (batched-wav-nnet3-cuda2[5.5]:RemoveOrphanComponents():nnet-nnet.cc:847) Removing 0 orphan components.
lattice-scale --acoustic-scale=10.0 --write-compact=false ark:- ark:- 
lattice-determinize-phone-pruned --beam=8.0 --acoustic-scale=1.0 --minimize=false --word-determinize=false --write-compact=false model_online/final.mdl ark:- ark:- 

kkm000 commented 2 years ago

You apply CMN in nnet3/decode_semisup.sh but not in gpu_semisup.sh (in apply-cmvn, variance normalization is off and mean normalization is on by default). Is this the root cause?

lalimili6 commented 2 years ago

> You apply CMN in nnet3/decode_semisup.sh but not in gpu_semisup.sh (in apply-cmvn, variance normalization is off and mean normalization is on by default). Is this the root cause?

GPU decoding (batched-wav-nnet3-cuda2, with or without lattice-determinize-phone-pruned) takes an online config, as online2-wav-nnet3-latgen-faster does. It does not run apply-cmvn on the features; CMVN is set in online.conf instead.


Since GPU decoding without lattice-determinize-phone-pruned gives the same results as CPU decoding, I think the problem is in the lattices that GPU decoding generates, because this error came up first:

ERROR (lattice-determinize-phone-pruned[5.5.0~1539-ea2b]:LatticeStateTimes():lattice-functions.cc:81) Input lattice must be topologically sorted. This means the GPU lattices are not topologically sorted.

danpovey commented 2 years ago

I think lattices are supposed to be top-sorted when written; possibly the GPU decoder is not doing that, maybe as some kind of optimization. However, we could perhaps add TopSortLatticeIfNeeded() to lattice-determinize-phone-pruned.cc as a work-around.
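
For readers unfamiliar with the requirement, here is a minimal sketch of what "top-sorted if needed" means, written as standalone Python rather than Kaldi or OpenFst code. It assumes the usual convention that a lattice is topologically sorted when every arc goes from a lower-numbered state to a higher-numbered one; the function names and the tiny four-state example are hypothetical.

```python
from collections import deque


def is_top_sorted(arcs):
    """True if every (src, dst) arc goes from a lower to a higher state id."""
    return all(src < dst for src, dst in arcs)


def top_sort_if_needed(num_states, arcs):
    """Renumber states so that is_top_sorted() holds; no-op if it already does."""
    if is_top_sorted(arcs):
        return list(arcs)
    indegree = [0] * num_states
    successors = [[] for _ in range(num_states)]
    for src, dst in arcs:
        successors[src].append(dst)
        indegree[dst] += 1
    # Kahn's algorithm: repeatedly emit a state with no unprocessed predecessors.
    queue = deque(s for s in range(num_states) if indegree[s] == 0)
    new_id = {}
    while queue:
        state = queue.popleft()
        new_id[state] = len(new_id)
        for nxt in successors[state]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    if len(new_id) != num_states:
        raise ValueError("graph has a cycle and cannot be topologically sorted")
    return [(new_id[src], new_id[dst]) for src, dst in arcs]


# Path 0 -> 2 -> 1 -> 3: acyclic, but the arc (2, 1) breaks the ordering.
unsorted_arcs = [(0, 2), (2, 1), (1, 3)]
sorted_arcs = top_sort_if_needed(4, unsorted_arcs)
print(sorted_arcs)
```

Sorting only changes how states are numbered, not which paths exist or what they cost, so a work-around of this kind would not be expected to change decoding results by itself.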

lalimili6 commented 2 years ago

Many thanks. Yes, I did that: I added TopSortLatticeIfNeeded() to latbin/lattice-determinize-phone-pruned.cc at line 104. Is that correct? The results are still the same as in my first post.

kkm000 commented 2 years ago

Ah, you're certainly right about the CMN with a CUDA batch decoder! Internally, the "batch" decoder is built on top of the online decoder. Could you please run an apples-to-apples test on CPU with online2-wav-nnet3-latgen-faster instead of nnet3-latgen-faster, to make it as close to the CUDA case as possible?

stale[bot] commented 2 years ago

This issue has been automatically marked as stale by a bot solely because it has not had recent activity. Please add any comment (simply 'ping' is enough) to prevent the issue from being closed for 60 more days if you believe it should be kept open.