Open Manjunath-mlp opened 3 weeks ago
There should be some logs telling you how to deal with it. Have you followed the logs?
I am using a pretrained model to decode. I am not sure which logs you are talking about.
Would you mind posting all of the logs?
The info you gave is too limited.
These are the args I used:
{'best_train_loss': float("inf"), 'best_valid_loss': float("inf"),
'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50,
'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4,
'warm_step': 2000, 'env_info': {'k2-version': '1.24.4', 'k2-build-type': 'Release',
'k2-with-cuda': False, 'k2-git-sha1': '5735fa707f6091856d13ccd230aced6e9e64f815',
'k2-git-date': 'Thu Jul 25 09:16:03 2024', 'lhotse-version': '1.28.0.dev+git.4ca97dc.clean',
'torch-version': '2.3.0', 'torch-cuda-available': False, 'torch-cuda-version': None,
'python-version': '3.10', 'icefall-git-branch': 'master', 'icefall-git-sha1': '59529722-dirty',
'icefall-git-date': 'Sat Aug 17 10:54:38 2024', 'icefall-path': '/Users/Manjunath/Downloads/sourcek2/icefall',
'k2-path': '/Users/Manjunath/miniconda3/envs/k2source/lib/python3.10/site-packages/k2-1.24.4.dev20240823+cpu.torch2.3.0-py3.10-macosx-11.1-arm64.egg/k2/__init__.py',
'lhotse-path': '/Users/Manjunath/miniconda3/envs/k2source/lib/python3.10/site-packages/lhotse/__init__.py',
'hostname': '', 'IP address': ''}, 'epoch': 30, 'iter': 0, 'avg': 1,
'use_averaged_model': False,
'exp_dir': '/Users/Manjunath/Downloads/sourcek2/icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp',
'bpe_model': '/Users/Manjunath/Downloads/sourcek2/icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/data/lang_bpe_500/bpe.model',
'lang_dir': '/Users/Manjunath/Downloads/sourcek2/icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/data/lang_bpe_500',
'decoding_method': 'fast_beam_search', 'beam_size': 4,
'beam': 20.0, 'ngram_lm_scale': 0.01, 'max_contexts': 8, 'max_states': 64, 'context_size': 2,
'max_sym_per_frame': 1, 'num_paths': 200, 'nbest_scale': 0.5, 'use_shallow_fusion': False,
'lm_type': 'rnn', 'lm_scale': 0.3, 'tokens_ngram': 2, 'backoff_id': 500,
'num_encoder_layers': '2,4,3,2,4', 'feedforward_dims': '1024,1024,2048,2048,1024',
'nhead': '8,8,8,8,8', 'encoder_dims': '384,384,384,384,384',
'attention_dims': '192,192,192,192,192', 'encoder_unmasked_dims': '256,256,256,256,256', 'zipformer_downsampling_factors': '1,2,4,8,2',
'cnn_module_kernels': '31,31,31,31,31', 'decoder_dim': 512,
'joiner_dim': 512, 'short_chunk_size': 50, 'num_left_chunks': 4,
'decode_chunk_len': 32, 'full_libri': True, 'mini_libri': False,
'manifest_dir': '../data/fbank', 'max_duration': 600, 'bucketing_sampler': True,
'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0,
'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True,
'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True,
'spec_aug_time_warp_factor': 80, 'enable_musan': True,
'input_strategy': 'PrecomputedFeatures', 'lm_vocab_size': 500, 'lm_epoch': 7,
'lm_avg': 1, 'lm_exp_dir': None, 'rnn_lm_embedding_dim': 2048, 'rnn_lm_hidden_dim': 2048,
'rnn_lm_num_layers': 3, 'rnn_lm_tie_weights': True, 'transformer_lm_exp_dir': None,
'transformer_lm_dim_feedforward': 2048, 'transformer_lm_encoder_dim': 768,
'transformer_lm_embedding_dim': 768, 'transformer_lm_nhead': 8,
'transformer_lm_num_layers': 16, 'transformer_lm_tie_weights': True, 'res_dir': '/Users/Manjunath/Downloads/sourcek2/icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp/fast_beam_search',
'suffix': 'epoch-30-avg-1-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64', 'blank_id': 0, 'unk_id': 2, 'vocab_size': 500}
Also, would you mind sharing the command you are using? And could you tell us what steps you have taken?
More details are always helpful.
Sorry, I hit enter before pasting everything. Here are the code blocks I am using:
model = get_transducer_model(args)
and I used the LibriSpeech cuts dataset:
args1 = Namespace(
    epoch=30,
    avg=1,
    use_averaged_model=True,
    exp_dir='../../../../../icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp/',
    lang_dir='../../../../../icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/data/lang_bpe_500/',
    decoding_method='fast_beam_search',
    iter=0,
    context_size=2,
    max_sym_per_frame=1,
    return_cuts=True,
    on_the_fly_feats=False,
    input_strategy='PrecomputedFeatures',
    max_duration=10,
    num_workers=2,
)
librispeech = LibriSpeechAsrDataModule(args1)
test_clean_cuts = librispeech.test_clean_cuts_soft()
test_other_cuts = librispeech.dev_other_cuts_soft()
test_clean_dl = librispeech.test_dataloaders(test_clean_cuts)
test_other_dl = librispeech.test_dataloaders(test_other_cuts)
test_sets = ["test-clean", "test-other"]
test_dl = [test_clean_dl, test_other_dl]
for i, j in enumerate(test_clean_dl):
    print(i, j)
    break
0 {'inputs': tensor([[[-1.4938e+01, -1.3318e+01, -1.3666e+01, ..., -9.2335e+00, -9.9011e+00, -1.0107e+01], [-1.4294e+01, -1.2946e+01, -1.2869e+01, ..., -9.3337e+00, -1.0197e+01, -1.0312e+01], [-1.5064e+01, -1.5173e+01, -1.5958e+01, ..., -9.8856e+00, -9.9233e+00, -1.0065e+01], ..., [-1.2570e+01, -1.2061e+01, -1.3426e+01, ..., 5.1158e+37, 9.4185e+37, 1.7210e+38], [-1.4552e+01, -1.3632e+01, -1.3024e+01, ..., 8.7674e+37, 1.6231e+38, 2.9824e+38], [-1.5214e+01, -1.5527e+01, -1.3573e+01, ..., 1.4966e+38, 2.7860e+38, inf]]]), 'supervisions': {'text': ['BY DEGREES ALL HIS HAPPINESS ALL HIS BRILLIANCY SUBSIDED INTO REGRET AND UNEASINESS SO THAT HIS LIMBS LOST THEIR POWER HIS ARMS HUNG HEAVILY BY HIS SIDES AND HIS HEAD DROOPED AS THOUGH HE WAS STUPEFIED'], 'sequence_idx': tensor([0], dtype=torch.int32), 'start_frame': tensor([0], dtype=torch.int32), 'num_frames': tensor([1608], dtype=torch.int32), 'cut': [MonoCut(id='7127-75946-0028-495', start=0, duration=16.075, channel=0, supervisions=[SupervisionSegment(id='7127-75946-0028', recording_id='7127-75946-0028', start=0.0, duration=16.075, channel=0, text='BY DEGREES ALL HIS HAPPINESS ALL HIS BRILLIANCY SUBSIDED INTO REGRET AND UNEASINESS SO THAT HIS LIMBS LOST THEIR POWER HIS ARMS HUNG HEAVILY BY HIS SIDES AND HIS HEAD DROOPED AS THOUGH HE WAS STUPEFIED', language='English', speaker='7127', gender=None, custom=None, alignment=None)], features=Features(type='kaldi-fbank', num_frames=1608, num_features=80, frame_shift=0.01, sampling_rate=16000, start=0, duration=16.075, storage_type='lilcom_chunky', storage_path='../data/fbank/librispeech_feats_test-clean/feats-0.lca', storage_key='2337650,45819,45198,44901,10324', recording_id='None', channels=0), recording=Recording(id='7127-75946-0028', sources=[AudioSource(type='file', channels=[0], source='/grid/codes/icefall/egs/librispeech/ASR/download/LibriSpeech/test-clean/7127/75946/7127-75946-0028.flac')], sampling_rate=16000, num_samples=257200, duration=16.075, channel_ids=[0], 
transforms=None), custom={'dataloading_info': {'rank': 0, 'world_size': 1, 'worker_id': None}})]}}
feature = j["inputs"]
supervisions = j["supervisions"]
texts = j["supervisions"]["text"]
feature_lens = supervisions["num_frames"]
feature_lens += 30
import torch
import math
LOG_EPS = math.log(1e-10)
feature = torch.nn.functional.pad(
    feature,
    pad=(0, 0, 0, 30),
    value=LOG_EPS,
)
encoder_out, encoder_out_lens = model.encoder(x=feature, x_lens=feature_lens)
Here, encoder_out comes back as all NaNs.
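One thing that may be worth checking before the encoder call: the batch printed above already shows values such as 1.7210e+38 and inf in the last frames of 'inputs', so the NaNs could originate in the features rather than in the model. Below is a minimal sketch of such a check in plain Python. The helper name count_nonfinite and the 1e4 magnitude threshold are illustrative assumptions, not part of icefall or lhotse.

```python
import math
from typing import Dict, Iterable


def count_nonfinite(values: Iterable[float], large: float = 1e4) -> Dict[str, int]:
    """Count NaN, +/-inf, and suspiciously large magnitudes in a flat sequence of floats.

    `large` is an assumed threshold: log-mel fbank values are normally small
    in magnitude, so anything beyond it is flagged as suspicious.
    """
    stats = {"total": 0, "nan": 0, "inf": 0, "large": 0}
    for v in values:
        stats["total"] += 1
        if math.isnan(v):
            stats["nan"] += 1
        elif math.isinf(v):
            stats["inf"] += 1
        elif abs(v) > large:
            stats["large"] += 1
    return stats


# A few values copied from the first and last frames of the printed batch.
sample = [-1.4938e+01, -1.3318e+01, 5.1158e+37, 1.7210e+38, float("inf")]
print(count_nonfinite(sample))
# {'total': 5, 'nan': 0, 'inf': 1, 'large': 2}

# With the real batch (assumes `feature` is the tensor taken from j["inputs"]):
# print(count_nonfinite(feature.flatten().tolist()))
```

If the counts for the real batch are non-zero before padding, the problem would be in feature loading, and the encoder output would be expected to contain NaNs regardless of the model weights.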
Could you share the complete file?
You can upload your code file as an attachment in the comment.
Will this work? fast_beam_search.txt
Could you post a runnable PYTHON CODE FILE?
We need to know which script you are using.
By the way, I suggest that you follow the doc https://k2-fsa.github.io/icefall/model-export/export-model-state-dict.html to learn how to use pre-trained models.
That's the ipynb file I am running; I am unable to attach a py or ipynb file. I am trying to implement this https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/pruned_transducer_stateless2/beam_search.py#L444 for the stateless7 streaming model, and I want to see the outputs at each timestep.
I think I have loaded the model dict of the pretrained model pretty much the same way you have implemented it. For model.decoder, I can see that the model is predicting numbers. I don't know why the encoder is predicting NaN.
I am getting NaN outputs from the encoder of the pruned transducer streaming model:
tensor([[[nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan], ..., [nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan]]], grad_fn=)
I am running on a Mac CPU. Any suggestions?