Manjunath-mlp commented 3 weeks ago

I am getting nan outputs from the encoder of pruned transducer streaming model. tensor([[[nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan], ..., [nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan]]], grad_fn=) I am running on mac cpu.Any suggestions?

csukuangfj commented 3 weeks ago

There should be some logs telling you how to do with it. Have you followed the logs?

Manjunath-mlp commented 3 weeks ago

I am using a pretrained model to decode.I am not sure about which logs you are talking about

csukuangfj commented 3 weeks ago

Would you mind posting all of the logs?

The info you give is toooo limited.

Manjunath-mlp commented 3 weeks ago

These are the args i used :

{'best_train_loss': float("inf"), 'best_valid_loss': float("inf"), 
'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50,
 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4,
 'warm_step': 2000, 'env_info': {'k2-version': '1.24.4', 'k2-build-type': 'Release',
 'k2-with-cuda': False, 'k2-git-sha1': '5735fa707f6091856d13ccd230aced6e9e64f815', 
'k2-git-date': 'Thu Jul 25 09:16:03 2024', 'lhotse-version': '1.28.0.dev+git.4ca97dc.clean', 
'torch-version': '2.3.0', 'torch-cuda-available': False, 'torch-cuda-version': None, 
'python-version': '3.10', 'icefall-git-branch': 'master', 'icefall-git-sha1': '59529722-dirty',
 'icefall-git-date': 'Sat Aug 17 10:54:38 2024', 'icefall-path': '/Users/Manjunath/Downloads/sourcek2/icefall',
 'k2-path': '/Users/Manjunath/miniconda3/envs/k2source/lib/python3.10/site-packages/k2-1.24.4.dev20240823+cpu.torch2.3.0-py3.10-macosx-11.1-arm64.egg/k2/__init__.py',
 'lhotse-path': '/Users/Manjunath/miniconda3/envs/k2source/lib/python3.10/site-packages/lhotse/__init__.py', 
'hostname': '', 'IP address': ''}, 'epoch': 30, 'iter': 0, 'avg': 1, 
'use_averaged_model': False,
 'exp_dir': '/Users/Manjunath/Downloads/sourcek2/icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp', 
'bpe_model': '/Users/Manjunath/Downloads/sourcek2/icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/data/lang_bpe_500/bpe.model', 
'lang_dir': '/Users/Manjunath/Downloads/sourcek2/icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/data/lang_bpe_500', 
'decoding_method': 'fast_beam_search', 'beam_size': 4,
 'beam': 20.0, 'ngram_lm_scale': 0.01, 'max_contexts': 8, 'max_states': 64, 'context_size': 2, 
'max_sym_per_frame': 1, 'num_paths': 200, 'nbest_scale': 0.5, 'use_shallow_fusion': False, 
'lm_type': 'rnn', 'lm_scale': 0.3, 'tokens_ngram': 2, 'backoff_id': 500, 
'num_encoder_layers': '2,4,3,2,4', 'feedforward_dims': '1024,1024,2048,2048,1024', 
'nhead': '8,8,8,8,8', 'encoder_dims': '384,384,384,384,384', 
'attention_dims': '192,192,192,192,192', 'encoder_unmasked_dims': '256,256,256,256,256', 'zipformer_downsampling_factors': '1,2,4,8,2', 
'cnn_module_kernels': '31,31,31,31,31', 'decoder_dim': 512, 
'joiner_dim': 512, 'short_chunk_size': 50, 'num_left_chunks': 4, 
'decode_chunk_len': 32, 'full_libri': True, 'mini_libri': False, 
'manifest_dir': '../data/fbank', 'max_duration': 600, 'bucketing_sampler': True, 
'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0,
 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True,
 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True,
 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 
'input_strategy': 'PrecomputedFeatures', 'lm_vocab_size': 500, 'lm_epoch': 7, 
'lm_avg': 1, 'lm_exp_dir': None, 'rnn_lm_embedding_dim': 2048, 'rnn_lm_hidden_dim': 2048,
 'rnn_lm_num_layers': 3, 'rnn_lm_tie_weights': True, 'transformer_lm_exp_dir': None, 
'transformer_lm_dim_feedforward': 2048, 'transformer_lm_encoder_dim': 768,
 'transformer_lm_embedding_dim': 768, 'transformer_lm_nhead': 8,
 'transformer_lm_num_layers': 16, 'transformer_lm_tie_weights': True, 'res_dir': '/Users/Manjunath/Downloads/sourcek2/icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp/fast_beam_search', 
'suffix': 'epoch-30-avg-1-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64', 'blank_id': 0, 'unk_id': 2, 'vocab_size': 500}

csukuangfj commented 3 weeks ago

Also, would you mind sharing the command you are using? And could you tell us what steps you have done?

More details are always helpful.

Manjunath-mlp commented 3 weeks ago

Sorry ,i just clicked enter before pasting all ,here are the code blocks i am using

These are the args i used : {'best_train_loss': float("inf"), 'best_valid_loss': float("inf"), 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.4', 'k2-build-type': 'Release', 'k2-with-cuda': False, 'k2-git-sha1': '5735fa707f6091856d13ccd230aced6e9e64f815', 'k2-git-date': 'Thu Jul 25 09:16:03 2024', 'lhotse-version': '1.28.0.dev+git.4ca97dc.clean', 'torch-version': '2.3.0', 'torch-cuda-available': False, 'torch-cuda-version': None, 'python-version': '3.10', 'icefall-git-branch': 'master', 'icefall-git-sha1': '59529722-dirty', 'icefall-git-date': 'Sat Aug 17 10:54:38 2024', 'icefall-path': '/Users/Manjunath/Downloads/sourcek2/icefall', 'k2-path': '/Users/Manjunath/miniconda3/envs/k2source/lib/python3.10/site-packages/k2-1.24.4.dev20240823+cpu.torch2.3.0-py3.10-macosx-11.1-arm64.egg/k2/init.py', 'lhotse-path': '/Users/Manjunath/miniconda3/envs/k2source/lib/python3.10/site-packages/lhotse/init.py', 'hostname': '', 'IP address': ''}, 'epoch': 30, 'iter': 0, 'avg': 1, 'use_averaged_model': False, 'exp_dir': '/Users/Manjunath/Downloads/sourcek2/icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp', 'bpe_model': '/Users/Manjunath/Downloads/sourcek2/icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/data/lang_bpe_500/bpe.model', 'lang_dir': '/Users/Manjunath/Downloads/sourcek2/icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/data/lang_bpe_500', 'decoding_method': 'fast_beam_search', 'beam_size': 4, 'beam': 20.0, 'ngram_lm_scale': 0.01, 'max_contexts': 8, 'max_states': 64, 'context_size': 2, 'max_sym_per_frame': 1, 'num_paths': 200, 'nbest_scale': 0.5, 'use_shallow_fusion': False, 'lm_type': 'rnn', 'lm_scale': 0.3, 'tokens_ngram': 2, 'backoff_id': 500, 'num_encoder_layers': '2,4,3,2,4', 'feedforward_dims': '1024,1024,2048,2048,1024', 'nhead': '8,8,8,8,8', 'encoder_dims': '384,384,384,384,384', 'attention_dims': '192,192,192,192,192', 'encoder_unmasked_dims': '256,256,256,256,256', 'zipformer_downsampling_factors': '1,2,4,8,2', 'cnn_module_kernels': '31,31,31,31,31', 'decoder_dim': 512, 'joiner_dim': 512, 'short_chunk_size': 50, 'num_left_chunks': 4, 'decode_chunk_len': 32, 'full_libri': True, 'mini_libri': False, 'manifest_dir': '../data/fbank', 'max_duration': 600, 'bucketing_sampler': True, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'input_strategy': 'PrecomputedFeatures', 'lm_vocab_size': 500, 'lm_epoch': 7, 'lm_avg': 1, 'lm_exp_dir': None, 'rnn_lm_embedding_dim': 2048, 'rnn_lm_hidden_dim': 2048, 'rnn_lm_num_layers': 3, 'rnn_lm_tie_weights': True, 'transformer_lm_exp_dir': None, 'transformer_lm_dim_feedforward': 2048, 'transformer_lm_encoder_dim': 768, 'transformer_lm_embedding_dim': 768, 'transformer_lm_nhead': 8, 'transformer_lm_num_layers': 16, 'transformer_lm_tie_weights': True, 'res_dir': '/Users/Manjunath/Downloads/sourcek2/icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp/fast_beam_search', 'suffix': 'epoch-30-avg-1-streaming-chunk-size-32-beam-20.0-max-contexts-8-max-states-64', 'blank_id': 0, 'unk_id': 2, 'vocab_size': 500}

initiated the model using above args

model = get_transducer_model(args)

and i used librispeech cuts dataset

args1=Namespace(epoch=30, avg=1, use_averaged_model=True, exp_dir='../../../../../icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp/', lang_dir='../../../../../icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/data/lang_bpe_500/', decoding_method='fast_beam_search', iter=0, context_size=2, max_sym_per_frame=1, return_cuts=True, on_the_fly_feats=False, input_strategy='PrecomputedFeatures', max_duration=10, num_workers=2)

librispeech = LibriSpeechAsrDataModule(args1) test_clean_cuts = librispeech.test_clean_cuts_soft() test_other_cuts = librispeech.dev_other_cuts_soft()

test_clean_dl = librispeech.test_dataloaders(test_clean_cuts) test_other_dl = librispeech.test_dataloaders(test_other_cuts)

test_sets = ["test-clean", "test-other"] test_dl = [test_clean_dl, test_other_dl]

Used first input to decode

for i,j in enumerate(test_clean_dl): print(i,j) break

i,j are

0 {'inputs': tensor([[[-1.4938e+01, -1.3318e+01, -1.3666e+01, ..., -9.2335e+00, -9.9011e+00, -1.0107e+01], [-1.4294e+01, -1.2946e+01, -1.2869e+01, ..., -9.3337e+00, -1.0197e+01, -1.0312e+01], [-1.5064e+01, -1.5173e+01, -1.5958e+01, ..., -9.8856e+00, -9.9233e+00, -1.0065e+01], ..., [-1.2570e+01, -1.2061e+01, -1.3426e+01, ..., 5.1158e+37, 9.4185e+37, 1.7210e+38], [-1.4552e+01, -1.3632e+01, -1.3024e+01, ..., 8.7674e+37, 1.6231e+38, 2.9824e+38], [-1.5214e+01, -1.5527e+01, -1.3573e+01, ..., 1.4966e+38, 2.7860e+38, inf]]]), 'supervisions': {'text': ['BY DEGREES ALL HIS HAPPINESS ALL HIS BRILLIANCY SUBSIDED INTO REGRET AND UNEASINESS SO THAT HIS LIMBS LOST THEIR POWER HIS ARMS HUNG HEAVILY BY HIS SIDES AND HIS HEAD DROOPED AS THOUGH HE WAS STUPEFIED'], 'sequence_idx': tensor([0], dtype=torch.int32), 'start_frame': tensor([0], dtype=torch.int32), 'num_frames': tensor([1608], dtype=torch.int32), 'cut': [MonoCut(id='7127-75946-0028-495', start=0, duration=16.075, channel=0, supervisions=[SupervisionSegment(id='7127-75946-0028', recording_id='7127-75946-0028', start=0.0, duration=16.075, channel=0, text='BY DEGREES ALL HIS HAPPINESS ALL HIS BRILLIANCY SUBSIDED INTO REGRET AND UNEASINESS SO THAT HIS LIMBS LOST THEIR POWER HIS ARMS HUNG HEAVILY BY HIS SIDES AND HIS HEAD DROOPED AS THOUGH HE WAS STUPEFIED', language='English', speaker='7127', gender=None, custom=None, alignment=None)], features=Features(type='kaldi-fbank', num_frames=1608, num_features=80, frame_shift=0.01, sampling_rate=16000, start=0, duration=16.075, storage_type='lilcom_chunky', storage_path='../data/fbank/librispeech_feats_test-clean/feats-0.lca', storage_key='2337650,45819,45198,44901,10324', recording_id='None', channels=0), recording=Recording(id='7127-75946-0028', sources=[AudioSource(type='file', channels=[0], source='/grid/codes/icefall/egs/librispeech/ASR/download/LibriSpeech/test-clean/7127/75946/7127-75946-0028.flac')], sampling_rate=16000, num_samples=257200, duration=16.075, channel_ids=[0], transforms=None), custom={'dataloading_info': {'rank': 0, 'world_size': 1, 'worker_id': None}})]}}

feature=j["inputs"] supervisions = j["supervisions"] texts = j["supervisions"]["text"] feature_lens = supervisions["num_frames"] feature_lens += 30

import torch import math LOG_EPS = math.log(1e-10)

feature = torch.nn.functional.pad( feature, pad=(0, 0, 0, 30), value=LOG_EPS, ) encoder_out, encoder_out_lens = model.encoder(x=feature, x_lens=feature_lens)

Here for encoder_out i am getting nans

csukuangfj commented 3 weeks ago

Could you share the complete file?

You can upload your code file as an attachment in the comment.

Manjunath-mlp commented 3 weeks ago

will this works? fast_beam_search.txt

csukuangfj commented 3 weeks ago

Could you post a runnable PYTHON CODE FILE?

We need to know which script you are using.

csukuangfj commented 3 weeks ago

By the way, I suggest that you follow the doc https://k2-fsa.github.io/icefall/model-export/export-model-state-dict.html to learn how to use pre-trained models.

Manjunath-mlp commented 3 weeks ago

thats the ipynb file i am using to run ,i am unable to attach py or ipynb file.I am trying to implement this https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/pruned_transducer_stateless2/beam_search.py#L444 for stateless7 streaming model ,I am trying to see the outputs at each timestep.

Manjunath-mlp commented 3 weeks ago

I think have loaded the model dict of pretrained model pretty much the same ,you guys have implemented.For model.decoder i am able to see the model is predicting numbers .I dont know why encoder is predicting nan

k2-fsa / icefall

Nan outputs from encoder #1731

initiated the model using above args

Used first input to decode

i,j are