Closed: kerolos closed this issue 7 months ago.
Sorry, this is not planned so far.
Best regards, Jin
On Thu, 11 Apr 2024 at 18:31, Kerolos ghobrial wrote:
> Is there any script available to train the latest Zipformer model using Bypass Temporal Classification (BTC) / Omni-temporal Classification (OTC) (https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/WSASR) to align speech with text, instead of CTC (https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/zipformer_ctc)?
@DongjiGao any interest?
Thanks, Dr. Daniel Povey and JinZr. Hello @DongjiGao,
1. Is there any script to create the OTC lang directory from a phone lexicon instead of BPE (k2-fsa/icefall/tree/master/egs/librispeech/WSASR/local/prepare_otc_lang_bpe.py)? The paper showed better performance with the phone-based lexicon for Bypass Temporal Classification. (A rough sketch of what I mean follows below.)
2. Could this method be used to clean up errors present in the training transcripts, similar to the original Kaldi scripts (egs/wsj/s5/steps/cleanup/clean_and_segment_data.sh or clean_and_segment_data_nnet3.sh)?
3. Is there any plan to extend this to ONNX (decoder with int8)?
Thanks in advance, Kerolos
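To make question 1 concrete, here is a minimal sketch of what a phone-based counterpart to prepare_otc_lang_bpe.py might do; the token name "<star>", the paths, and the helper are illustrative assumptions, not an existing recipe:

```python
# Sketch: append the OTC token to an existing phone lexicon so the usual
# lang-dir preparation can run on top of it. All names/paths are assumed.
from pathlib import Path

def add_otc_token(lexicon_in: Path, lexicon_out: Path, otc_token: str = "<star>") -> None:
    # Each lexicon line is "WORD phone1 phone2 ...". We add one entry that
    # maps the OTC token to itself, treated as a single extra "phone";
    # it must also be added to the phone/token lists downstream.
    lines = lexicon_in.read_text().splitlines()
    lines.append(f"{otc_token} {otc_token}")
    lexicon_out.parent.mkdir(parents=True, exist_ok=True)
    lexicon_out.write_text("\n".join(lines) + "\n")

add_otc_token(Path("data/lang_phone/lexicon.txt"),
              Path("data/lang_phone_otc/lexicon.txt"))
```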
Thank you for your interest.
Dongji
Thanks for the quick response, @DongjiGao.
1. It would be great if you could check in the prepare_otc_lang.py file (using the phone-based lexicon) and any related files required to complete the training. I would really appreciate that.
2. That sounds quite helpful; I'll definitely give it a try.
3. Perhaps my primary focus should be on using Zipformer, and then exporting it to ONNX (see the quantization sketch below).
BR, Kerolos
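On point 3, a minimal sketch of int8 post-training quantization with onnxruntime, assuming a float32 ONNX export of the model already exists (the file names are placeholders, not produced artifacts):

```python
# Hedged sketch: dynamic int8 quantization of an already-exported ONNX model.
from onnxruntime.quantization import QuantType, quantize_dynamic

quantize_dynamic(
    model_input="zipformer-otc.onnx",        # hypothetical float32 export
    model_output="zipformer-otc.int8.onnx",  # model with int8 weights
    weight_type=QuantType.QInt8,
)
```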
Sorry @DongjiGao for bothering you again: I would like to use the phone-based lexicon script with OTC, and then compare the transcript cleaning from this method against the original Kaldi method (clean_and_segment_data_nnet3.sh), which I have used with a phone-based setup. Thanks in advance, Dongji Gao.
I will submit a PR by the end of this week.
Hello @DongjiGao, I got a very large WER (1best decoding) when I used the phone-based lexicon, feature_type=ssl, and FP16. Moreover, the results suggest it is dropping (swallowing) words during decoding, for example:
a) 1688-142285-0006-2603: ref=['I', "DON'T", 'THINK', 'MISTER', 'HALE', 'YOU', 'HAVE', 'DONE', 'QUITE', 'RIGHT', 'IN', 'INTRODUCING', 'SUCH', 'A', 'PERSON', 'TO', 'US', 'WITHOUT', 'TELLING', 'US', 'WHAT', 'HE', 'HAD', 'BEEN']
1688-142285-0006-2603: hyp=['I', "DON'T", 'THINK', 'YOU', 'HAVE', 'DONE', 'WHAT', 'HE', 'HAD', 'BEEN']
b) 1688-142285-0008-2605: ref=['HIS', 'FATHER', 'DYING', 'IN', 'MISERABLE', 'CIRCUMSTANCES']
1688-142285-0008-2605: hyp=['HIS', 'MISERABLE', 'CIRCUMSTANCES']
Phone results:
[decode_phone.py:473] {'subsampling_factor': 4, 'feature_dim': 768, 'nhead': 8, 'dim_feedforward': 2048, 'encoder_dim': 512, 'num_encoder_layers': 12, 'search_beam': 20, 'output_beam': 8, 'min_active_states': 30, 'max_active_states': 10000, 'use_double_scores': True, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '4c05309499a08454997adf500b56dcc629e35ae5', 'k2-git-date': 'Tue Jul 25 16:23:36 2023', 'lhotse-version': '1.24.0.dev+git.4f014b1.clean', 'torch-version': '1.13.0+cu116', 'torch-cuda-available': True, 'torch-cuda-version': '11.6', 'python-version': '3.8', 'icefall-git-branch': 'first_run', 'icefall-git-sha1': 'c45e9fec-dirty', 'icefall-git-date': 'Wed Apr 3 05:26:24 2024', 'icefall-path': '/mnt/srv/data/train_am/analysisTD/icefall_kaldi/icefall', 'k2-path': '/home/ghk/miniconda3/envs/icefall-run/lib/python3.8/site-packages/k2/init.py', 'lhotse-path': '/home/ghk/miniconda3/envs/icefall-run/lib/python3.8/site-packages/lhotse/init.py', 'hostname': 'Hyrican-3', 'IP address': '127.0.1.1'}, 'otc_token': '', 'blank_bias': -4.0, 'epoch': 20, 'iter': 0, 'avg': 5, 'method': '1best', 'use_averaged_model': False, 'num_decoder_layers': 0, 'exp_dir': PosixPath('conformer_ctc2/exp_phone'), 'lang_dir': PosixPath('data/lang_phone'), 'lm_dir': PosixPath('data/lm/G_3_gram.fst.txt'), 'full_libri': False, 'mini_libri': False, 'manifest_dir': PosixPath('data/ssl'), 'max_duration': 200.0, 'bucketing_sampler': True, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 2, 'input_strategy': 'PrecomputedFeatures', 'train_manifest': 'librispeech_cuts_train-clean-100.jsonl.gz'}
2024-05-10 13:32:00,131 INFO [decode_phone.py:410] batch 0/?, cuts processed until now is 14
2024-05-10 13:32:16,706 INFO [decode_phone.py:410] batch 100/?, cuts processed until now is 2224
2024-05-10 13:32:18,942 INFO [decode_phone.py:430] The transcripts are stored in conformer_ctc2/exp_phone/recogs-test-clean-no_rescore.txt
2024-05-10 13:32:18,994 INFO [utils.py:656] [test-clean-no_rescore] %WER 74.22% [39021 / 52576, 6 ins, 37768 del, 1247 sub ]
2024-05-10 13:32:19,130 INFO [decode_phone.py:442] Wrote detailed error stats to conformer_ctc2/exp_phone/errs-test-clean-no_rescore.txt
2024-05-10 13:32:19,133 INFO [decode_phone.py:456] For test-clean, WER of different settings are: no_rescore 74.22 best for test-clean
2024-05-10 13:32:19,673 INFO [decode_phone.py:410] batch 0/?, cuts processed until now is 18
2024-05-10 13:32:36,774 INFO [decode_phone.py:410] batch 100/?, cuts processed until now is 2612
2024-05-10 13:32:38,820 INFO [decode_phone.py:430] The transcripts are stored in conformer_ctc2/exp_phone/recogs-test-other-no_rescore.txt
2024-05-10 13:32:38,874 INFO [utils.py:656] [test-other-no_rescore] %WER 79.77% [41756 / 52343, 8 ins, 40086 del, 1662 sub ]
2024-05-10 13:32:39,019 INFO [decode_phone.py:442] Wrote detailed error stats to conformer_ctc2/exp_phone/errs-test-other-no_rescore.txt
2024-05-10 13:32:39,024 INFO [decode_phone.py:456] For test-other, WER of different settings are: no_rescore 79.77 best for test-other
BPE results:
{'subsampling_factor': 2, 'feature_dim': 768, 'nhead': 8, 'dim_feedforward': 2048, 'encoder_dim': 512, 'num_encoder_layers': 12, 'search_beam': 20, 'output_beam': 8, 'min_active_states': 30, 'max_active_states': 10000, 'use_double_scores': True, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '4c05309499a08454997adf500b56dcc629e35ae5', 'k2-git-date': 'Tue Jul 25 16:23:36 2023', 'lhotse-version': '1.23.0.dev+git.1c2a1b5.clean', 'torch-version': '1.13.0+cu116', 'torch-cuda-available': True, 'torch-cuda-version': '11.6', 'python-version': '3.8', 'icefall-git-branch': 'first_run', 'icefall-git-sha1': 'c45e9fec-dirty', 'icefall-git-date': 'Wed Apr 3 05:26:24 2024', 'icefall-path': '/mnt/srv/data/train_am/analysisTD/icefall_kaldi/icefall', 'k2-path': '/home/ghk/miniconda3/envs/icefall-run/lib/python3.8/site-packages/k2/init.py', 'lhotse-path': '/home/ghk/miniconda3/envs/icefall-run/lib/python3.8/site-packages/lhotse/init.py', 'hostname': 'hyrican-1', 'IP address': '127.0.1.1'}, 'otc_token': '▁', 'blank_bias': -4.0, 'epoch': 20, 'iter': 0, 'avg': 1, 'method': '1best', 'use_averaged_model': False, 'num_decoder_layers': 0, 'exp_dir': PosixPath('conformer_ctc2/exp'), 'lang_dir': PosixPath('data/lang_bpe_200'), 'lm_dir': PosixPath('data/lm/G_3_gram.fst.txt'), 'full_libri': False, 'mini_libri': False, 'manifest_dir': PosixPath('data/ssl'), 'max_duration': 200.0, 'bucketing_sampler': True, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 2, 'input_strategy': 'PrecomputedFeatures', 'train_manifest': 'librispeech_cuts_train-clean-100.jsonl.gz'}
2024-04-14 14:21:47,136 INFO [decode.py:476] batch 0/?, cuts processed until now is 14
2024-04-14 14:22:18,915 INFO [decode.py:476] batch 100/?, cuts processed until now is 2224
2024-04-14 14:22:23,112 INFO [decode.py:496] The transcripts are stored in conformer_ctc2/exp/recogs-test-clean-no_rescore.txt
2024-04-14 14:22:23,178 INFO [utils.py:656] [test-clean-no_rescore] %WER 7.30% [3838 / 52576, 370 ins, 599 del, 2869 sub ]
2024-04-14 14:22:23,321 INFO [decode.py:508] Wrote detailed error stats to conformer_ctc2/exp/errs-test-clean-no_rescore.txt
2024-04-14 14:22:23,325 INFO [decode.py:522] For test-clean, WER of different settings are: no_rescore 7.3 best for test-clean
2024-04-14 14:22:23,962 INFO [decode.py:476] batch 0/?, cuts processed until now is 18
2024-04-14 14:22:57,948 INFO [decode.py:476] batch 100/?, cuts processed until now is 2612
2024-04-14 14:23:02,000 INFO [decode.py:496] The transcripts are stored in conformer_ctc2/exp/recogs-test-other-no_rescore.txt
2024-04-14 14:23:02,063 INFO [utils.py:656] [test-other-no_rescore] %WER 17.93% [9385 / 52343, 741 ins, 1813 del, 6831 sub ]
2024-04-14 14:23:02,200 INFO [decode.py:508] Wrote detailed error stats to conformer_ctc2/exp/errs-test-other-no_rescore.txt
2024-04-14 14:23:02,207 INFO [decode.py:522] For test-other, WER of different settings are: no_rescore 17.93 best for test-other
- Is there anything in the training or decoding parameters I should change to obtain results close to BPE?
- Have you faced a similar situation with the phone-based lexicon, in which the system swallows some words?
Please use subsampling_factor = 2 for SSL features.
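A back-of-the-envelope check of why this matters, assuming the usual 20 ms frame shift (50 frames/s) of HuBERT/wav2vec 2.0 SSL features; the numbers are illustrative, not from the recipe:

```python
# SSL features arrive at ~50 frames/s, already half the rate of 10 ms fbank
# frames. Subsampling by 4 leaves very few encoder frames per token, which
# pushes CTC/OTC toward deletions; a factor of 2 keeps 25 frames/s.
SSL_FRAME_RATE = 50  # frames per second (assumed 20 ms shift)

for factor in (2, 4):
    out_rate = SSL_FRAME_RATE / factor
    print(f"subsampling_factor={factor}: {out_rate:.1f} encoder frames/s")
```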
Thanks @DongjiGao for your support, and sorry for bothering you again. I have changed subsampling_factor = 2 for SSL features in both training and decoding. With that change, the total loss and OTC loss became very close to the BPE run (loss[otc_loss], tot_loss[otc_loss]). However, the WER is still very high: it drops from 74.22% to 65.83% [34612 / 52576, 18 ins, 32137 del, 2457 sub] for test-clean, and from 79.77% to 74.65% [39075 / 52343, 27 ins, 35505 del, 3543 sub] for test-other. The deletion rate is still very high.
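A quick arithmetic check on those counts (pure arithmetic on the numbers above) confirms that deletions dominate the remaining errors:

```python
# test-clean counts after setting subsampling_factor = 2 (from the log above)
ins, dels, subs, ref_words = 18, 32137, 2457, 52576

wer = 100 * (ins + dels + subs) / ref_words
print(f"WER = {wer:.2f}%")  # -> 65.83%
print(f"deletions alone = {100 * dels / ref_words:.1f}% of reference words")  # ~61.1%
```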
The parameters used for decoding with the phone-based lexicon:
{'subsampling_factor': 2, 'feature_dim': 768, 'nhead': 8, 'dim_feedforward': 2048, 'encoder_dim': 512, 'num_encoder_layers': 12, 'search_beam': 20, 'output_beam': 8, 'min_active_states': 30, 'max_active_states': 10000, 'use_double_scores': True, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '4c05309499a08454997adf500b56dcc629e35ae5', 'k2-git-date': 'Tue Jul 25 16:23:36 2023', 'lhotse-version': '1.24.0.dev+git.4f014b1.clean', 'torch-version': '1.13.0+cu116', 'torch-cuda-available': True, 'torch-cuda-version': '11.6', 'python-version': '3.8', 'icefall-git-branch': 'first_run', 'icefall-git-sha1': 'c45e9fec-dirty', 'icefall-git-date': 'Wed Apr 3 05:26:24 2024', 'icefall-path': '/mnt/srv/data/train_am/analysisTD/icefall_kaldi/icefall', 'k2-path': '/home/ghk/miniconda3/envs/icefall-run/lib/python3.8/site-packages/k2/init.py', 'lhotse-path': '/home/ghk/miniconda3/envs/icefall-run/lib/python3.8/site-packages/lhotse/init.py', 'hostname': 'Hyrican-3', 'IP address': '127.0.1.1'}, 'otc_token': '', 'blank_bias': -4.0, 'epoch': 20, 'iter': 0, 'avg': 5, 'method': '1best', 'use_averaged_model': False, 'num_decoder_layers': 0, 'exp_dir': PosixPath('conformer_ctc2/exp_phone'), 'lang_dir': PosixPath('data/lang_phone'), 'lm_dir': PosixPath('data/lm/G_3_gram.fst.txt'), 'full_libri': False, 'mini_libri': False, 'manifest_dir': PosixPath('data/ssl'), 'max_duration': 200.0, 'bucketing_sampler': True, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 2, 'input_strategy': 'PrecomputedFeatures', 'train_manifest': 'librispeech_cuts_train-clean-100.jsonl.gz'}
The training loss in TensorBoard (BPE-based lexicon in white vs. phone-based lexicon in black):
Hi @kerolos,
Sorry to dredge up this old comment, but I am seeing similar issues with high deletion rates when training with OTC. Did you get to the bottom of your issue?
@tobygodwin Can you share more details?
Hi @kerolos,
Can you try different 'blank_bias' values during decoding (e.g., -3 or -2)? It looks like the current value (-4) is too small.
Dongji
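For reference, blank_bias in this style of recipe is added to the blank column of the network's log-probs before the decoding lattice is built; the snippet below is a sketch of that mechanism, assuming blank at index 0, not the exact icefall code:

```python
import torch

def apply_blank_bias(log_probs: torch.Tensor, blank_bias: float) -> torch.Tensor:
    """log_probs: (N, T, C) network output; blank assumed at index 0."""
    biased = log_probs.clone()
    biased[:, :, 0] += blank_bias  # a negative bias makes blank less likely
    return biased

# Example: sweep a few values and compare ins/del rates, no retraining needed.
nnet_output = torch.randn(1, 100, 80).log_softmax(dim=-1)  # dummy log-probs
for bias in (-4.0, -3.0, -2.0):
    biased = apply_blank_bias(nnet_output, bias)
    # ... build the decoding lattice from `biased` as usual ...
```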