k2-fsa / icefall

https://k2-fsa.github.io/icefall/
Apache License 2.0

Invalid input size #320

Open AmirHussein96 opened 2 years ago

AmirHussein96 commented 2 years ago

I tried to create an icefall recipe for MGB2-100h and did not get any errors during the preparation stage. However, I got an error during training after batch 600.

The error says: shape '[-1, 584, 64]' is invalid for input of size 2801664. I have attached the training log below.

For data preparation, I loaded the data from a Kaldi-like folder and made sure that I am using the provided segments by adding cuts.trim_to_supervisions() after cut_set.compute_and_store_features_batch(), similar to https://github.com/wgb14/icefall/blob/b429efa661109de226718c44812324cd65125038/egs/gigaspeech/ASR/local/compute_fbank_gigaspeech_splits.py#L130-L133. I double checked the statistics of the resulting segments with cut_set.describe(), and the min segment = 0.1s and max segment = 30s.

Is it possible that there is a limit on how long the segments can be? Could segments longer than 20s cause this issue?

errors-6955569.txt

pzelasko commented 2 years ago

I'd be more concerned that your CTC losses go to infinity after the first batch. I'd double-check your data, transcripts, lexicon, etc.

csukuangfj commented 2 years ago

I double checked the statistics of the resulting segments with cut_set.describe(), and the min segment = 0.1s and max segment = 30s.

The min segment is 0.1s, which is too short. That may explain why the loss goes to inf.

For CTC training, you have to ensure that the number of input feature frames is not smaller than the number of tokens in the corresponding transcript.

You can follow https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/pruned_transducer_stateless2/train.py#L849 to filter short and long utterances from your data.
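A minimal sketch of such a filter, written against plain Python values rather than the lhotse API so that it runs on its own. The duration bounds (1 s and 20 s) and the subsampling formula ((T - 1) // 2 - 1) // 2 are assumptions for illustration and should be checked against the recipe and model front end actually in use. `frames_after_subsampling` and `keep_utterance` are hypothetical helpers, not icefall functions.

```python
def frames_after_subsampling(num_frames: int) -> int:
    """Encoder output length for a factor-4 convolutional front end.

    Assumption: two stride-2 convolutions with kernel size 3 and no padding,
    which gives ((T - 1) // 2 - 1) // 2. Adjust this for your own model.
    """
    return ((num_frames - 1) // 2 - 1) // 2


def keep_utterance(duration, num_frames, num_tokens,
                   min_duration=1.0, max_duration=20.0):
    """Return True if the utterance is safe to use for CTC training."""
    # Drop utterances that are too short or too long (bounds are assumptions).
    if not (min_duration <= duration <= max_duration):
        return False
    # CTC needs at least as many encoder frames as target tokens.
    return frames_after_subsampling(num_frames) >= num_tokens


# Toy data: (duration in seconds, feature frames, number of BPE tokens).
utterances = [
    (0.1, 10, 3),       # too short
    (5.0, 500, 40),     # fine
    (30.0, 3000, 120),  # too long
    (1.2, 120, 35),     # more tokens than encoder frames
]
kept = [u for u in utterances if keep_utterance(*u)]
print(kept)
```

In a recipe, a predicate of this kind would typically be applied to the training cuts with lhotse's CutSet.filter, using the cut's duration, number of frames and tokenized transcript.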

Also, you can use the approach proposed in https://github.com/k2-fsa/icefall/issues/135#issuecomment-1101906199 to filter utterances with inf losses.

AmirHussein96 commented 2 years ago

Thank you, I resolved the inf issue. But what about the invalid input size, what could cause that?

csukuangfj commented 2 years ago

Traceback (most recent call last):
  File "conformer_ctc/train.py", line 735, in <module>
    main()
  File "conformer_ctc/train.py", line 728, in main
    run(rank=0, world_size=1, args=args)
  File "conformer_ctc/train.py", line 652, in run
    train_one_epoch(
  File "conformer_ctc/train.py", line 491, in train_one_epoch
    loss, loss_info = compute_loss(
  File "conformer_ctc/train.py", line 390, in compute_loss
    att_loss = mmodel.decoder_forward(
  File "/alt-arabic/speech/amir/k2/tmp/icefall/egs/mgb2_100h/ASR/conformer_ctc/transformer.py", line 303, in decoder_forward
    pred_pad = self.decoder(
  File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/nn/modules/transformer.py", line 231, in forward
    output = mod(output, memory, tgt_mask=tgt_mask,
  File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/alt-arabic/speech/amir/k2/tmp/icefall/egs/mgb2_100h/ASR/conformer_ctc/transformer.py", line 615, in forward
    tgt2 = self.src_attn(
  File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/nn/modules/activation.py", line 980, in forward
    return F.multi_head_attention_forward(
  File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/nn/functional.py", line 4761, in multi_head_attention_forward
    k = k.contiguous().view(-1, bsz * num_heads, head_dim).transpose(0, 1)
RuntimeError: shape '[-1, 584, 64]' is invalid for input of size 2801664

Could you try to find the shape of k before running view? You can use pdb to debug the code.
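The numbers in the error message already narrow things down a little. Below is a small worked check, assuming attention_dim=512 and nhead=8 (the values in the logged training configuration), so that head_dim = 64 and 584 = bsz * num_heads implies bsz = 73. `diagnose_view_error` is a hypothetical helper written only for this illustration.

```python
def diagnose_view_error(numel, bsz_times_heads, head_dim, num_heads):
    """Check whether a tensor of `numel` elements can be viewed as
    [-1, bsz * num_heads, head_dim], and look for nearby batch sizes
    that would explain the element count."""
    embed_dim = head_dim * num_heads
    bsz = bsz_times_heads // num_heads
    # The view succeeds only if numel is a multiple of bsz * num_heads * head_dim.
    fits = numel % (bsz_times_heads * head_dim) == 0
    # k has shape [src_len, k_bsz, embed_dim], so numel / embed_dim = src_len * k_bsz.
    rows = numel // embed_dim
    # Candidate (src_len, k_bsz) pairs for batch sizes within one of bsz.
    near = [(rows // b, b) for b in (bsz - 1, bsz, bsz + 1) if rows % b == 0]
    return bsz, fits, rows, near


bsz, fits, rows, near = diagnose_view_error(
    numel=2801664, bsz_times_heads=584, head_dim=64, num_heads=8
)
print("bsz expected by the query:", bsz)
print("view is valid:", fits)
print("src_len * k_bsz:", rows)
print("nearby (src_len, k_bsz) candidates:", near)
```

Under these assumptions, the element count would be consistent with k having shape [76, 72, 512], that is, a memory batch of 72 against a target batch of 73. This is only one factorization of 5472, so printing the real shape of k is still needed to confirm it.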

AmirHussein96 commented 2 years ago

Thank you @csukuangfj, just a quick update on the issue:

2022-04-21 05:52:51,628 INFO [train.py:577] Training started
2022-04-21 05:52:51,628 INFO [train.py:578] {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'use_feat_batchnorm': True, 'attention_dim': 512, 'nhead': 8, 'num_decoder_layers': 4, 'beam_size': 10, 'reduction': 'sum', 'use_double_scores': True, 'weight_decay': 1e-06, 'warm_step': 15000, 'env_info': {'k2-version': '1.11', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '', 'k2-git-date': '', 'lhotse-version': '1.0.0.dev+git.6a3192a.clean', 'torch-cuda-available': True, 'torch-cuda-version': '10.2', 'python-version': '3.8', 'icefall-git-branch': 'test', 'icefall-git-sha1': '00ffcc8-dirty', 'icefall-git-date': 'Mon Apr 18 16:43:26 2022', 'icefall-path': '/alt-arabic/speech/amir/k2/tmp/icefall', 'k2-path': '/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/k2/__init__.py', 'lhotse-path': '/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/lhotse/__init__.py', 'hostname': 'crimv3mgpu026', 'IP address': '10.141.0.31'}, 'world_size': 1, 'master_port': 12354, 'tensorboard': True, 'num_epochs': 50, 'start_epoch': 0, 'exp_dir': PosixPath('conformer_ctc/exp'), 'lang_dir': PosixPath('data/lang_bpe_500'), 'att_rate': 0.8, 'lr_factor': 10.0, 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 100, 'bucketing_sampler': True, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True}
2022-04-21 05:52:51,753 INFO [lexicon.py:176] Loading pre-compiled data/lang_bpe_500/Linv.pt
2022-04-21 05:52:51,860 INFO [train.py:600] About to create model
2022-04-21 05:52:55,796 INFO [asr_datamodule.py:341] About to get train cuts
2022-04-21 05:53:07,704 INFO [asr_datamodule.py:175] About to get Musan cuts
2022-04-21 05:53:10,014 INFO [asr_datamodule.py:182] Enable MUSAN
2022-04-21 05:53:10,015 INFO [asr_datamodule.py:207] Enable SpecAugment
2022-04-21 05:53:10,015 INFO [asr_datamodule.py:208] Time warp factor: 80
2022-04-21 05:53:10,015 INFO [asr_datamodule.py:223] About to create train dataset
2022-04-21 05:53:10,015 INFO [asr_datamodule.py:251] Using BucketingSampler.
2022-04-21 05:53:10,587 INFO [asr_datamodule.py:268] About to create train dataloader
2022-04-21 05:53:10,587 INFO [asr_datamodule.py:348] About to get dev cuts
2022-04-21 05:53:10,839 INFO [train.py:706] Sanity check -- see if any of the batches in epoch 0 would cause OOM.
2022-04-21 05:54:24,427 INFO [train.py:668] epoch 0, learning rate 1.4433756729740647e-06
2022-04-21 05:54:31,292 INFO [train.py:518] Epoch 0, batch 0, loss[ctc_loss=5.061, att_loss=1.191, loss=1.965, over 2320 frames.], tot_loss[ctc_loss=5.061, att_loss=1.191, loss=1.965, over 2320 frames.], batch size: 13
2022-04-21 05:57:36,969 INFO [train.py:518] Epoch 0, batch 50, loss[ctc_loss=1.191, att_loss=1.301, loss=1.279, over 2435 frames.], tot_loss[ctc_loss=2.077, att_loss=1.27, loss=1.431, over 107509.4296691422 frames.], batch size: 20
2022-04-21 06:00:34,122 INFO [train.py:518] Epoch 0, batch 100, loss[ctc_loss=1.034, att_loss=1.156, loss=1.132, over 2299 frames.], tot_loss[ctc_loss=1.568, att_loss=1.278, loss=1.336, over 189010.91336205578 frames.], batch size: 11
2022-04-21 06:03:04,013 INFO [train.py:518] Epoch 0, batch 150, loss[ctc_loss=1.132, att_loss=1.205, loss=1.19, over 2465 frames.], tot_loss[ctc_loss=1.381, att_loss=1.257, loss=1.281, over 253403.388687638 frames.], batch size: 19
2022-04-21 06:05:51,104 INFO [train.py:518] Epoch 0, batch 200, loss[ctc_loss=1.061, att_loss=1.092, loss=1.086, over 2374 frames.], tot_loss[ctc_loss=1.284, att_loss=1.22, loss=1.233, over 303070.1197025786 frames.], batch size: 9
2022-04-21 06:08:56,921 INFO [train.py:518] Epoch 0, batch 250, loss[ctc_loss=0.9316, att_loss=0.9555, loss=0.9507, over 2424 frames.], tot_loss[ctc_loss=1.225, att_loss=1.192, loss=1.198, over 341160.6834763047 frames.], batch size: 6
2022-04-21 06:11:28,601 INFO [train.py:518] Epoch 0, batch 300, loss[ctc_loss=1.146, att_loss=1.175, loss=1.169, over 2484 frames.], tot_loss[ctc_loss=1.179, att_loss=1.164, loss=1.167, over 370860.92252996884 frames.], batch size: 22
2022-04-21 06:13:53,411 INFO [train.py:518] Epoch 0, batch 350, loss[ctc_loss=1.2, att_loss=1.233, loss=1.227, over 2464 frames.], tot_loss[ctc_loss=1.148, att_loss=1.141, loss=1.143, over 395072.958579734 frames.], batch size: 36
2022-04-21 06:16:41,771 INFO [train.py:518] Epoch 0, batch 400, loss[ctc_loss=1.444, att_loss=1.458, loss=1.455, over 2433 frames.], tot_loss[ctc_loss=1.133, att_loss=1.129, loss=1.13, over 412903.9178338298 frames.], batch size: 55
2022-04-21 06:18:40,932 INFO [train.py:518] Epoch 0, batch 450, loss[ctc_loss=1.103, att_loss=1.068, loss=1.075, over 2256 frames.], tot_loss[ctc_loss=1.114, att_loss=1.107, loss=1.108, over 426915.5850166171 frames.], batch size: 9
2022-04-21 06:21:16,518 INFO [train.py:518] Epoch 0, batch 500, loss[ctc_loss=0.7647, att_loss=0.7367, loss=0.7423, over 2223 frames.], tot_loss[ctc_loss=1.098, att_loss=1.086, loss=1.088, over 438207.72270125535 frames.], batch size: 8
2022-04-21 06:24:05,162 INFO [train.py:518] Epoch 0, batch 550, loss[ctc_loss=1.043, att_loss=0.9977, loss=1.007, over 2480 frames.], tot_loss[ctc_loss=1.099, att_loss=1.079, loss=1.083, over 447367.3462596891 frames.], batch size: 24
2022-04-21 06:26:38,987 INFO [train.py:518] Epoch 0, batch 600, loss[ctc_loss=1.092, att_loss=1.031, loss=1.043, over 2391 frames.], tot_loss[ctc_loss=1.093, att_loss=1.064, loss=1.069, over 453151.52900862123 frames.], batch size: 9
2022-04-21 06:30:08,474 INFO [train.py:518] Epoch 0, batch 650, loss[ctc_loss=1.105, att_loss=1.008, loss=1.027, over 2377 frames.], tot_loss[ctc_loss=1.093, att_loss=1.056, loss=1.063, over 459918.71265226486 frames.], batch size: 10
2022-04-21 06:32:18,331 INFO [train.py:518] Epoch 0, batch 700, loss[ctc_loss=0.9434, att_loss=0.8639, loss=0.8798, over 2243 frames.], tot_loss[ctc_loss=1.083, att_loss=1.037, loss=1.046, over 462414.72576395085 frames.], batch size: 6
2022-04-21 06:34:29,645 INFO [train.py:518] Epoch 0, batch 750, loss[ctc_loss=0.8518, att_loss=0.7725, loss=0.7884, over 2388 frames.], tot_loss[ctc_loss=1.074, att_loss=1.021, loss=1.032, over 465885.82317385863 frames.], batch size: 5
2022-04-21 06:37:22,898 INFO [train.py:518] Epoch 0, batch 800, loss[ctc_loss=1.035, att_loss=0.9442, loss=0.9623, over 2434 frames.], tot_loss[ctc_loss=1.078, att_loss=1.017, loss=1.029, over 468771.35407427634 frames.], batch size: 20
2022-04-21 06:41:21,939 INFO [train.py:518] Epoch 0, batch 850, loss[ctc_loss=1.006, att_loss=0.9148, loss=0.933, over 2271 frames.], tot_loss[ctc_loss=1.084, att_loss=1.015, loss=1.029, over 470680.22124100936 frames.], batch size: 10
2022-04-21 06:44:14,367 INFO [train.py:518] Epoch 0, batch 900, loss[ctc_loss=1.13, att_loss=1.03, loss=1.05, over 2365 frames.], tot_loss[ctc_loss=1.08, att_loss=1.005, loss=1.02, over 472108.779841794 frames.], batch size: 18
2022-04-21 06:47:05,414 INFO [train.py:518] Epoch 0, batch 950, loss[ctc_loss=1.034, att_loss=0.9327, loss=0.953, over 2291 frames.], tot_loss[ctc_loss=1.082, att_loss=1.002, loss=1.018, over 472844.6079304429 frames.], batch size: 7
2022-04-21 06:49:48,035 INFO [train.py:518] Epoch 0, batch 1000, loss[ctc_loss=1.055, att_loss=0.9323, loss=0.9569, over 2286 frames.], tot_loss[ctc_loss=1.075, att_loss=0.9883, loss=1.006, over 473197.85692998784 frames.], batch size: 10
2022-04-21 06:53:00,860 INFO [train.py:518] Epoch 0, batch 1050, loss[ctc_loss=1.245, att_loss=1.128, loss=1.151, over 2450 frames.], tot_loss[ctc_loss=1.074, att_loss=0.9834, loss=1.002, over 474634.7888984948 frames.], batch size: 36
2022-04-21 06:55:32,768 INFO [train.py:518] Epoch 0, batch 1100, loss[ctc_loss=1.111, att_loss=1.011, loss=1.031, over 2448 frames.], tot_loss[ctc_loss=1.069, att_loss=0.9746, loss=0.9934, over 474184.0629728816 frames.], batch size: 7
2022-04-21 06:58:20,957 INFO [train.py:518] Epoch 0, batch 1150, loss[ctc_loss=0.9636, att_loss=0.8746, loss=0.8924, over 2388 frames.], tot_loss[ctc_loss=1.063, att_loss=0.9677, loss=0.9868, over 474719.40023238154 frames.], batch size: 11
2022-04-21 07:01:24,279 INFO [train.py:518] Epoch 0, batch 1200, loss[ctc_loss=1.164, att_loss=1.049, loss=1.072, over 2399 frames.], tot_loss[ctc_loss=1.06, att_loss=0.9637, loss=0.9829, over 475875.42357611464 frames.], batch size: 26
2022-04-21 07:04:55,147 INFO [train.py:518] Epoch 0, batch 1250, loss[ctc_loss=1.058, att_loss=0.9682, loss=0.9862, over 2359 frames.], tot_loss[ctc_loss=1.06, att_loss=0.9639, loss=0.9831, over 476769.094751134 frames.], batch size: 5
2022-04-21 07:08:25,470 INFO [train.py:518] Epoch 0, batch 1300, loss[ctc_loss=0.8492, att_loss=0.781, loss=0.7947, over 2154 frames.], tot_loss[ctc_loss=1.06, att_loss=0.9634, loss=0.9827, over 476576.0384726524 frames.], batch size: 5
2022-04-21 07:10:41,656 INFO [train.py:518] Epoch 0, batch 1350, loss[ctc_loss=0.8718, att_loss=0.8021, loss=0.8161, over 2365 frames.], tot_loss[ctc_loss=1.052, att_loss=0.9574, loss=0.9763, over 477078.1436026277 frames.], batch size: 12
2022-04-21 07:13:33,708 INFO [train.py:518] Epoch 0, batch 1400, loss[ctc_loss=0.9169, att_loss=0.8162, loss=0.8363, over 2376 frames.], tot_loss[ctc_loss=1.051, att_loss=0.9548, loss=0.974, over 477269.5812537873 frames.], batch size: 15
2022-04-21 07:16:04,598 INFO [train.py:518] Epoch 0, batch 1450, loss[ctc_loss=1.045, att_loss=0.9591, loss=0.9763, over 2355 frames.], tot_loss[ctc_loss=1.048, att_loss=0.9522, loss=0.9713, over 476443.49005118554 frames.], batch size: 12
2022-04-21 07:18:41,119 INFO [train.py:518] Epoch 0, batch 1500, loss[ctc_loss=0.9176, att_loss=0.8385, loss=0.8543, over 2494 frames.], tot_loss[ctc_loss=1.043, att_loss=0.9461, loss=0.9654, over 477369.8605307082 frames.], batch size: 11
2022-04-21 07:21:06,682 INFO [train.py:518] Epoch 0, batch 1550, loss[ctc_loss=1.109, att_loss=1.023, loss=1.04, over 2434 frames.], tot_loss[ctc_loss=1.041, att_loss=0.9437, loss=0.9631, over 477223.80035773537 frames.], batch size: 30
2022-04-21 07:23:36,228 INFO [train.py:518] Epoch 0, batch 1600, loss[ctc_loss=1.221, att_loss=1.099, loss=1.123, over 2440 frames.], tot_loss[ctc_loss=1.036, att_loss=0.9385, loss=0.958, over 476870.35551496665 frames.], batch size: 36
2022-04-21 07:25:48,504 INFO [train.py:518] Epoch 0, batch 1650, loss[ctc_loss=0.9172, att_loss=0.8393, loss=0.8549, over 2442 frames.], tot_loss[ctc_loss=1.049, att_loss=0.9365, loss=0.9591, over 478060.1022561962 frames.], batch size: 6
2022-04-21 07:29:02,722 INFO [train.py:518] Epoch 0, batch 1700, loss[ctc_loss=0.8854, att_loss=0.7908, loss=0.8097, over 2430 frames.], tot_loss[ctc_loss=1.054, att_loss=0.9425, loss=0.9648, over 478327.3292176934 frames.], batch size: 7
2022-04-21 07:31:06,144 INFO [train.py:518] Epoch 0, batch 1750, loss[ctc_loss=0.9144, att_loss=0.8418, loss=0.8563, over 2250 frames.], tot_loss[ctc_loss=1.047, att_loss=0.9378, loss=0.9597, over 478173.3111760538 frames.], batch size: 6
2022-04-21 07:33:17,875 INFO [train.py:518] Epoch 0, batch 1800, loss[ctc_loss=1.029, att_loss=0.937, loss=0.9554, over 2380 frames.], tot_loss[ctc_loss=1.04, att_loss=0.9315, loss=0.9532, over 478104.1236229297 frames.], batch size: 9
2022-04-21 07:35:20,473 INFO [train.py:518] Epoch 0, batch 1850, loss[ctc_loss=0.9364, att_loss=0.7757, loss=0.8079, over 2260 frames.], tot_loss[ctc_loss=1.03, att_loss=0.9226, loss=0.9442, over 477063.1158242521 frames.], batch size: 9
2022-04-21 07:38:27,857 INFO [train.py:518] Epoch 0, batch 1900, loss[ctc_loss=1.006, att_loss=0.8771, loss=0.9029, over 2410 frames.], tot_loss[ctc_loss=1.036, att_loss=0.9275, loss=0.9491, over 477343.43987470586 frames.], batch size: 10
2022-04-21 07:41:11,969 INFO [train.py:518] Epoch 0, batch 1950, loss[ctc_loss=1.04, att_loss=0.9406, loss=0.9604, over 2471 frames.], tot_loss[ctc_loss=1.037, att_loss=0.928, loss=0.9497, over 477697.66897319723 frames.], batch size: 24
2022-04-21 07:43:55,214 INFO [train.py:518] Epoch 0, batch 2000, loss[ctc_loss=1.054, att_loss=0.9352, loss=0.959, over 2425 frames.], tot_loss[ctc_loss=1.038, att_loss=0.9284, loss=0.9503, over 477990.5629966493 frames.], batch size: 13
2022-04-21 07:46:45,562 INFO [train.py:518] Epoch 0, batch 2050, loss[ctc_loss=1.067, att_loss=0.927, loss=0.9551, over 2217 frames.], tot_loss[ctc_loss=1.041, att_loss=0.9303, loss=0.9524, over 477820.18066301086 frames.], batch size: 8
2022-04-21 07:49:03,758 INFO [train.py:518] Epoch 0, batch 2100, loss[ctc_loss=0.8473, att_loss=0.7685, loss=0.7843, over 2497 frames.], tot_loss[ctc_loss=1.035, att_loss=0.9248, loss=0.9469, over 477825.9731916422 frames.], batch size: 10
2022-04-21 07:51:09,113 INFO [train.py:518] Epoch 0, batch 2150, loss[ctc_loss=1.065, att_loss=0.9526, loss=0.9751, over 2415 frames.], tot_loss[ctc_loss=1.033, att_loss=0.9222, loss=0.9442, over 477425.6284120875 frames.], batch size: 10
2022-04-21 07:54:06,755 INFO [train.py:518] Epoch 0, batch 2200, loss[ctc_loss=0.849, att_loss=0.738, loss=0.7602, over 2234 frames.], tot_loss[ctc_loss=1.033, att_loss=0.9209, loss=0.9433, over 477411.84584470035 frames.], batch size: 8
2022-04-21 07:56:40,127 INFO [train.py:518] Epoch 0, batch 2250, loss[ctc_loss=0.8703, att_loss=0.7777, loss=0.7962, over 2276 frames.], tot_loss[ctc_loss=1.032, att_loss=0.9195, loss=0.942, over 476505.6217080375 frames.], batch size: 10
2022-04-21 07:59:14,895 INFO [train.py:518] Epoch 0, batch 2300, loss[ctc_loss=1.077, att_loss=0.9464, loss=0.9725, over 2435 frames.], tot_loss[ctc_loss=1.033, att_loss=0.9192, loss=0.9418, over 476429.0595070139 frames.], batch size: 20
2022-04-21 08:01:43,781 INFO [train.py:518] Epoch 0, batch 2350, loss[ctc_loss=0.9691, att_loss=0.8787, loss=0.8968, over 2176 frames.], tot_loss[ctc_loss=1.03, att_loss=0.9158, loss=0.9387, over 476732.4427174845 frames.], batch size: 5
2022-04-21 08:04:05,335 INFO [train.py:518] Epoch 0, batch 2400, loss[ctc_loss=0.8843, att_loss=0.7649, loss=0.7888, over 2342 frames.], tot_loss[ctc_loss=1.024, att_loss=0.9108, loss=0.9335, over 476541.5631116045 frames.], batch size: 14
2022-04-21 08:06:29,948 INFO [train.py:518] Epoch 0, batch 2450, loss[ctc_loss=1.028, att_loss=0.9219, loss=0.9431, over 2369 frames.], tot_loss[ctc_loss=1.022, att_loss=0.9081, loss=0.9309, over 476097.7498909367 frames.], batch size: 14
2022-04-21 08:08:46,121 INFO [train.py:518] Epoch 0, batch 2500, loss[ctc_loss=0.9684, att_loss=0.8695, loss=0.8893, over 2225 frames.], tot_loss[ctc_loss=1.023, att_loss=0.9089, loss=0.9317, over 476485.3965873593 frames.], batch size: 8
2022-04-21 08:11:37,803 INFO [train.py:518] Epoch 0, batch 2550, loss[ctc_loss=0.9374, att_loss=0.8195, loss=0.8431, over 2409 frames.], tot_loss[ctc_loss=1.033, att_loss=0.9156, loss=0.939, over 476853.5506311325 frames.], batch size: 13
2022-04-21 08:14:22,682 INFO [train.py:518] Epoch 0, batch 2600, loss[ctc_loss=1.01, att_loss=0.8976, loss=0.9201, over 2433 frames.], tot_loss[ctc_loss=1.033, att_loss=0.9159, loss=0.9394, over 476125.2201404255 frames.], batch size: 13
2022-04-21 08:17:02,631 INFO [train.py:518] Epoch 0, batch 2650, loss[ctc_loss=1.031, att_loss=0.9207, loss=0.9427, over 2366 frames.], tot_loss[ctc_loss=1.028, att_loss=0.9103, loss=0.9339, over 475966.3960131043 frames.], batch size: 18
2022-04-21 08:19:40,261 INFO [train.py:518] Epoch 0, batch 2700, loss[ctc_loss=1.049, att_loss=0.938, loss=0.9602, over 2471 frames.], tot_loss[ctc_loss=1.038, att_loss=0.9168, loss=0.9411, over 476245.52871668944 frames.], batch size: 19
2022-04-21 08:22:02,025 INFO [train.py:518] Epoch 0, batch 2750, loss[ctc_loss=1.058, att_loss=0.896, loss=0.9284, over 2376 frames.], tot_loss[ctc_loss=1.037, att_loss=0.9153, loss=0.9396, over 476722.3453734679 frames.], batch size: 17
2022-04-21 08:24:29,405 INFO [train.py:518] Epoch 0, batch 2800, loss[ctc_loss=0.9707, att_loss=0.8566, loss=0.8795, over 2408 frames.], tot_loss[ctc_loss=1.031, att_loss=0.9091, loss=0.9335, over 476983.38517082867 frames.], batch size: 5
2022-04-21 08:27:27,808 INFO [train.py:518] Epoch 0, batch 2850, loss[ctc_loss=0.9507, att_loss=0.8314, loss=0.8553, over 2317 frames.], tot_loss[ctc_loss=1.038, att_loss=0.9111, loss=0.9365, over 477327.5832030542 frames.], batch size: 13
2022-04-21 08:29:40,058 INFO [train.py:518] Epoch 0, batch 2900, loss[ctc_loss=1.005, att_loss=0.8781, loss=0.9034, over 2403 frames.], tot_loss[ctc_loss=1.037, att_loss=0.9067, loss=0.9327, over 477013.1835074181 frames.], batch size: 11
2022-04-21 08:32:02,863 INFO [train.py:518] Epoch 0, batch 2950, loss[ctc_loss=0.83, att_loss=0.7117, loss=0.7353, over 2322 frames.], tot_loss[ctc_loss=1.035, att_loss=0.9043, loss=0.9305, over 477016.8214064601 frames.], batch size: 8
2022-04-21 08:34:16,314 INFO [train.py:518] Epoch 0, batch 3000, loss[ctc_loss=0.9819, att_loss=0.8759, loss=0.8971, over 2288 frames.], tot_loss[ctc_loss=1.031, att_loss=0.9002, loss=0.9263, over 475826.8325706396 frames.], batch size: 7
2022-04-21 08:34:16,315 INFO [train.py:535] Computing validation loss
2022-04-21 08:48:49,529 INFO [train.py:544] Epoch 0, validation: ctc_loss=1.524, att_loss=0.946, loss=1.062, over 760000 frames.
2022-04-21 08:51:42,209 INFO [train.py:518] Epoch 0, batch 3050, loss[ctc_loss=1.202, att_loss=1.054, loss=1.084, over 2444 frames.], tot_loss[ctc_loss=1.032, att_loss=0.8992, loss=0.9258, over 476341.6594394003 frames.], batch size: 24
2022-04-21 08:54:32,939 INFO [train.py:518] Epoch 0, batch 3100, loss[ctc_loss=1.092, att_loss=0.9373, loss=0.9681, over 2370 frames.], tot_loss[ctc_loss=1.038, att_loss=0.9031, loss=0.93, over 475575.52939944016 frames.], batch size: 9
2022-04-21 08:56:53,883 INFO [train.py:518] Epoch 0, batch 3150, loss[ctc_loss=0.8172, att_loss=0.7108, loss=0.7321, over 2344 frames.], tot_loss[ctc_loss=1.035, att_loss=0.9, loss=0.927, over 474746.2084058315 frames.], batch size: 8
2022-04-21 08:59:18,278 INFO [train.py:518] Epoch 0, batch 3200, loss[ctc_loss=0.8851, att_loss=0.7812, loss=0.802, over 2495 frames.], tot_loss[ctc_loss=1.032, att_loss=0.8975, loss=0.9243, over 476234.11710422236 frames.], batch size: 9
2022-04-21 09:01:52,184 INFO [train.py:518] Epoch 0, batch 3250, loss[ctc_loss=0.99, att_loss=0.8377, loss=0.8681, over 2399 frames.], tot_loss[ctc_loss=1.031, att_loss=0.8955, loss=0.9225, over 475483.97347389394 frames.], batch size: 6
2022-04-21 09:04:03,478 INFO [train.py:518] Epoch 0, batch 3300, loss[ctc_loss=0.9222, att_loss=0.7902, loss=0.8166, over 2387 frames.], tot_loss[ctc_loss=1.023, att_loss=0.889, loss=0.9158, over 475512.2607406601 frames.], batch size: 9
2022-04-21 09:06:27,782 INFO [train.py:518] Epoch 0, batch 3350, loss[ctc_loss=1.091, att_loss=0.9247, loss=0.9579, over 2428 frames.], tot_loss[ctc_loss=1.024, att_loss=0.8881, loss=0.9153, over 476473.2659861878 frames.], batch size: 13
2022-04-21 09:09:27,444 INFO [train.py:518] Epoch 0, batch 3400, loss[ctc_loss=0.951, att_loss=0.8539, loss=0.8734, over 2223 frames.], tot_loss[ctc_loss=1.031, att_loss=0.894, loss=0.9215, over 477057.2100871301 frames.], batch size: 8
2022-04-21 09:12:07,586 INFO [train.py:518] Epoch 0, batch 3450, loss[ctc_loss=1, att_loss=0.8742, loss=0.8995, over 2486 frames.], tot_loss[ctc_loss=1.031, att_loss=0.8936, loss=0.9211, over 476785.33322705346 frames.], batch size: 14
2022-04-21 09:15:12,785 INFO [train.py:518] Epoch 0, batch 3500, loss[ctc_loss=1.002, att_loss=0.8932, loss=0.915, over 2376 frames.], tot_loss[ctc_loss=1.04, att_loss=0.9001, loss=0.928, over 477330.8683795313 frames.], batch size: 12
2022-04-21 09:17:42,892 INFO [train.py:518] Epoch 0, batch 3550, loss[ctc_loss=0.8621, att_loss=0.7151, loss=0.7445, over 2341 frames.], tot_loss[ctc_loss=1.037, att_loss=0.8971, loss=0.9251, over 476837.23755339306 frames.], batch size: 14
2022-04-21 09:20:46,646 INFO [train.py:518] Epoch 0, batch 3600, loss[ctc_loss=1.094, att_loss=0.9693, loss=0.9942, over 2466 frames.], tot_loss[ctc_loss=1.039, att_loss=0.8983, loss=0.9265, over 477316.55265394883 frames.], batch size: 8
2022-04-21 09:23:15,838 INFO [train.py:518] Epoch 0, batch 3650, loss[ctc_loss=1.04, att_loss=0.867, loss=0.9016, over 2433 frames.], tot_loss[ctc_loss=1.033, att_loss=0.8941, loss=0.9219, over 477573.05299235234 frames.], batch size: 13
2022-04-21 09:25:38,437 INFO [train.py:518] Epoch 0, batch 3700, loss[ctc_loss=0.957, att_loss=0.812, loss=0.841, over 2476 frames.], tot_loss[ctc_loss=1.035, att_loss=0.8939, loss=0.9221, over 477746.542956095 frames.], batch size: 22
2022-04-21 09:28:04,556 INFO [train.py:518] Epoch 0, batch 3750, loss[ctc_loss=0.9738, att_loss=0.8396, loss=0.8664, over 2155 frames.], tot_loss[ctc_loss=1.032, att_loss=0.8916, loss=0.9197, over 477628.9627351081 frames.], batch size: 5
2022-04-21 09:31:13,758 INFO [train.py:518] Epoch 0, batch 3800, loss[ctc_loss=0.9774, att_loss=0.8016, loss=0.8368, over 2289 frames.], tot_loss[ctc_loss=1.038, att_loss=0.8948, loss=0.9234, over 476600.9141201745 frames.], batch size: 10
2022-04-21 09:33:53,852 INFO [train.py:518] Epoch 0, batch 3850, loss[ctc_loss=0.9243, att_loss=0.8158, loss=0.8375, over 2257 frames.], tot_loss[ctc_loss=1.037, att_loss=0.8943, loss=0.9229, over 476381.438243684 frames.], batch size: 9
2022-04-21 09:36:23,798 INFO [train.py:518] Epoch 0, batch 3900, loss[ctc_loss=1.166, att_loss=0.9792, loss=1.017, over 2481 frames.], tot_loss[ctc_loss=1.039, att_loss=0.894, loss=0.9229, over 475737.76793054014 frames.], batch size: 31
2022-04-21 09:38:41,072 INFO [train.py:518] Epoch 0, batch 3950, loss[ctc_loss=1.029, att_loss=0.8477, loss=0.8839, over 2462 frames.], tot_loss[ctc_loss=1.037, att_loss=0.892, loss=0.9209, over 475660.8959312289 frames.], batch size: 22
2022-04-21 09:41:28,687 INFO [train.py:518] Epoch 0, batch 4000, loss[ctc_loss=0.938, att_loss=0.8137, loss=0.8386, over 2399 frames.], tot_loss[ctc_loss=1.039, att_loss=0.8926, loss=0.9219, over 476235.8065190327 frames.], batch size: 9
2022-04-21 09:43:43,456 INFO [train.py:518] Epoch 0, batch 4050, loss[ctc_loss=1.122, att_loss=0.9509, loss=0.9851, over 2476 frames.], tot_loss[ctc_loss=1.038, att_loss=0.8918, loss=0.9209, over 475883.1504439389 frames.], batch size: 14
2022-04-21 09:46:07,437 INFO [train.py:518] Epoch 0, batch 4100, loss[ctc_loss=1.019, att_loss=0.895, loss=0.9198, over 2373 frames.], tot_loss[ctc_loss=1.037, att_loss=0.8909, loss=0.9201, over 475680.007975693 frames.], batch size: 9
2022-04-21 09:48:40,160 INFO [train.py:518] Epoch 0, batch 4150, loss[ctc_loss=1.302, att_loss=1.122, loss=1.158, over 2439 frames.], tot_loss[ctc_loss=1.035, att_loss=0.8892, loss=0.9184, over 476304.0273349623 frames.], batch size: 36
2022-04-21 09:50:57,912 INFO [train.py:518] Epoch 0, batch 4200, loss[ctc_loss=1.154, att_loss=1.023, loss=1.049, over 2431 frames.], tot_loss[ctc_loss=1.029, att_loss=0.884, loss=0.9129, over 476077.56749056117 frames.], batch size: 20
2022-04-21 09:53:35,156 INFO [train.py:518] Epoch 0, batch 4250, loss[ctc_loss=1.03, att_loss=0.8879, loss=0.9163, over 2493 frames.], tot_loss[ctc_loss=1.028, att_loss=0.8836, loss=0.9125, over 476509.83035038674 frames.], batch size: 8
2022-04-21 09:56:39,743 INFO [train.py:518] Epoch 0, batch 4300, loss[ctc_loss=0.9645, att_loss=0.8414, loss=0.866, over 2339 frames.], tot_loss[ctc_loss=1.032, att_loss=0.8864, loss=0.9156, over 476428.2856434755 frames.], batch size: 14
2022-04-21 09:59:32,099 INFO [train.py:518] Epoch 0, batch 4350, loss[ctc_loss=1.141, att_loss=0.9653, loss=1, over 2472 frames.], tot_loss[ctc_loss=1.042, att_loss=0.8943, loss=0.9238, over 476770.6612385662 frames.], batch size: 22
2022-04-21 10:01:52,760 INFO [train.py:518] Epoch 0, batch 4400, loss[ctc_loss=1.129, att_loss=0.9336, loss=0.9726, over 2452 frames.], tot_loss[ctc_loss=1.04, att_loss=0.8923, loss=0.9219, over 476799.94577099493 frames.], batch size: 30
2022-04-21 10:04:12,790 INFO [train.py:518] Epoch 0, batch 4450, loss[ctc_loss=0.9425, att_loss=0.849, loss=0.8677, over 2420 frames.], tot_loss[ctc_loss=1.036, att_loss=0.8878, loss=0.9174, over 477692.97930579586 frames.], batch size: 6
2022-04-21 10:06:37,743 INFO [train.py:518] Epoch 0, batch 4500, loss[ctc_loss=0.9869, att_loss=0.838, loss=0.8678, over 2480 frames.], tot_loss[ctc_loss=1.03, att_loss=0.8836, loss=0.913, over 477518.3131163074 frames.], batch size: 22
2022-04-21 10:09:21,416 INFO [train.py:518] Epoch 0, batch 4550, loss[ctc_loss=0.9902, att_loss=0.8839, loss=0.9052, over 2176 frames.], tot_loss[ctc_loss=1.031, att_loss=0.8844, loss=0.9138, over 477669.0642658331 frames.], batch size: 5
2022-04-21 10:12:43,687 INFO [train.py:518] Epoch 0, batch 4600, loss[ctc_loss=1.017, att_loss=0.9104, loss=0.9317, over 2449 frames.], tot_loss[ctc_loss=1.042, att_loss=0.8924, loss=0.9224, over 477544.6538788588 frames.], batch size: 13
Traceback (most recent call last):
  File "conformer_ctc/train.py", line 755, in <module>
    main()
  File "conformer_ctc/train.py", line 748, in main
    run(rank=0, world_size=1, args=args)
  File "conformer_ctc/train.py", line 672, in run
    train_one_epoch(
  File "conformer_ctc/train.py", line 499, in train_one_epoch
    loss, loss_info = compute_loss(
  File "conformer_ctc/train.py", line 397, in compute_loss
    att_loss = mmodel.decoder_forward(
  File "/alt-arabic/speech/amir/k2/tmp/icefall/egs/mgb2_100h/ASR/conformer_ctc/transformer.py", line 303, in decoder_forward
    pred_pad = self.decoder(
  File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/nn/modules/transformer.py", line 231, in forward
    output = mod(output, memory, tgt_mask=tgt_mask,
  File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/alt-arabic/speech/amir/k2/tmp/icefall/egs/mgb2_100h/ASR/conformer_ctc/transformer.py", line 615, in forward
    tgt2 = self.src_attn(
  File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/nn/modules/activation.py", line 980, in forward
    return F.multi_head_attention_forward(
  File "/home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/nn/functional.py", line 4781, in multi_head_attention_forward
    assert key_padding_mask.size(0) == bsz
AssertionError

danpovey commented 2 years ago

I think you should try to debug that error in pdb:

python3 -m pdb <cmdline>....
(Pdb) c
.. wait till crash; pdb then enters post-mortem mode at the failing source line, and you can do
(Pdb) p key_padding_mask.shape
(Pdb) p bsz

You can use "up" to go to other stack frames and print variables there too. Try to discover what's happening, maybe a weird-shaped batch.
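When an interactive session is inconvenient (for example in a batch job), similar information can be collected non-interactively by walking the traceback of the caught exception. The sketch below uses only the Python standard behaviour of exception tracebacks. `locals_at_crash`, `attention` and `decoder_forward` are hypothetical stand-ins for illustration, not icefall or PyTorch code.

```python
def locals_at_crash(exc, names):
    """Collect (function name, variable name, value) for each requested
    local variable found in the traceback frames, outermost frame first."""
    found = []
    tb = exc.__traceback__
    while tb is not None:
        frame = tb.tb_frame
        for name in names:
            if name in frame.f_locals:
                found.append((frame.f_code.co_name, name, frame.f_locals[name]))
        tb = tb.tb_next
    return found


def attention(mask_shape, bsz):
    # Toy stand-in for the failing assertion in multi_head_attention_forward.
    assert mask_shape[0] == bsz


def decoder_forward():
    mask_shape = (36, 74)
    attention(mask_shape, bsz=37)


try:
    decoder_forward()
except AssertionError as exc:
    report = locals_at_crash(exc, ["mask_shape", "bsz"])
    for func, name, value in report:
        print(f"{func}: {name} = {value}")
```

This plays the role of "up" plus a print in each frame: every stack frame on the way to the crash is inspected for the variables of interest.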

AmirHussein96 commented 2 years ago

Actually, I was printing bsz and the shape of k, and just before the error I saw the following:

k shape:  torch.Size([35, 22, 512])
bsz:  22
k shape:  torch.Size([116, 22, 512])
bsz:  22
k shape:  torch.Size([35, 22, 512])
bsz:  22
k shape:  torch.Size([116, 22, 512])
bsz:  37
k shape:  torch.Size([32, 37, 512])
bsz:  37
k shape:  torch.Size([74, 36, 512])

For some reason the middle dimension of the last shape (36) does not match the bsz (37). I am still trying to work out what could have caused this.

AmirHussein96 commented 2 years ago

Thank you I will do that and let you know

AmirHussein96 commented 2 years ago

I tracked down the issue. As you expected, @danpovey, the batch is malformed. The odd thing is that len(supervisions['cut']) and len(supervisions['num_frames']) both equal the batch size (37), so the number of feature matrices should equal the batch size too, no? Yet batch["inputs"] has only 36 rows. I think there might be an issue with the dataloader.

print(key_padding_mask.size()) torch.Size([36, 74])

print(bsz) 37

(Pdb) u
    /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/nn/modules/activation.py(980)forward()
    -> return F.multi_head_attention_forward(
    (Pdb) l
    975                     key_padding_mask=key_padding_mask, need_weights=need_weights,
    976                     attn_mask=attn_mask, use_separate_proj_weight=True,
    977                     q_proj_weight=self.q_proj_weight, k_proj_weight=self.k_proj_weight,
    978                     v_proj_weight=self.v_proj_weight)
    979             else:
    980  ->             return F.multi_head_attention_forward(
    981                     query, key, value, self.embed_dim, self.num_heads,
    982                     self.in_proj_weight, self.in_proj_bias,
    983                     self.bias_k, self.bias_v, self.add_zero_attn,
    984                     self.dropout, self.out_proj.weight, self.out_proj.bias,
    985                     training=self.training,
    (Pdb) query.shape
    torch.Size([32, 37, 512])
    (Pdb) key.shape
    torch.Size([74, 36, 512])
    (Pdb) value.shape
    torch.Size([74, 36, 512])

(Pdb) u
    /alt-arabic/speech/amir/k2/tmp/icefall/egs/mgb2_100h/ASR/conformer_ctc/transformer.py

    615  ->         tgt2 = self.src_attn(
    616                 tgt,
    617                 memory,
    618                 memory,
    619                 attn_mask=memory_mask,
    620                 key_padding_mask=memory_key_padding_mask,
    621             )[0]

(Pdb) tgt.shape
torch.Size([32, 37, 512])
(Pdb) tgt2.shape
torch.Size([32, 37, 512])

(Pdb) u
> /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/nn/modules/module.py(889)_call_impl()
-> result = self.forward(*input, **kwargs)

(Pdb) u
> /home/local/QCRI/ahussein/anaconda3/envs/k2/lib/python3.8/site-packages/torch/nn/modules/transformer.py(231)forward()
-> output = mod(output, memory, tgt_mask=tgt_mask,
(Pdb) ll
212         def forward(self, tgt: Tensor, memory: Tensor, tgt_mask: Optional[Tensor] = None,
213                     memory_mask: Optional[Tensor] = None, tgt_key_padding_mask: Optional[Tensor] = None,
214                     memory_key_padding_mask: Optional[Tensor] = None) -> Tensor:
215             r"""Pass the inputs (and mask) through the decoder layer in turn.
216
217             Args:
218                 tgt: the sequence to the decoder (required).
219                 memory: the sequence from the last layer of the encoder (required).
220                 tgt_mask: the mask for the tgt sequence (optional).
221                 memory_mask: the mask for the memory sequence (optional).
222                 tgt_key_padding_mask: the mask for the tgt keys per batch (optional).
223                 memory_key_padding_mask: the mask for the memory keys per batch (optional).
224
225             Shape:
226                 see the docs in Transformer class.
227             """
228             output = tgt
229
230             for mod in self.layers:
231  ->             output = mod(output, memory, tgt_mask=tgt_mask,
232                              memory_mask=memory_mask,
233                              tgt_key_padding_mask=tgt_key_padding_mask,
234                              memory_key_padding_mask=memory_key_padding_mask)

(Pdb) output.shape
torch.Size([32, 37, 512])
(Pdb) memory.shape
torch.Size([74, 36, 512])

(Pdb) u
 Args:
        memory:
            It's the output of the encoder with shape (T, N, C)

> /alt-arabic/speech/amir/k2/tmp/icefall/egs/mgb2_100h/ASR/conformer_ctc/transformer.py(303)decoder_forward()
-> pred_pad = self.decoder(
(Pdb) len(ys_in)
37
(Pdb) ys_in_pad.shape
torch.Size([37, 32])
(Pdb) tgt.shape
torch.Size([32, 37, 512])
(Pdb) memory.shape
torch.Size([74, 36, 512])

(Pdb) u
> /alt-arabic/speech/amir/k2/tmp/icefall/egs/mgb2_100h/ASR/conformer_ctc/train.py(397)compute_loss()
-> att_loss = mmodel.decoder_forward(
(Pdb) len(supervisions["text"])
37
(Pdb) supervisions["text"]
['أنا أعتقد المجتمع والأفراد جميعا', 'أهلا بكم مشاهدينا الأعزاء إلى هذه الحلقة من حديث الثورة', 'وكذلك إلى المشاهدين', 'لكن ذلك لم يكن واردا هنا أبدا', 'لما في عدوان بالطائرات كمان بدك ناس تقاوم', 'عندما أرى أختي في سوريا بتعاقب على اغتصابها 20 جندي', 'الأساس أنا كنت أتمنى إنه أيضا ضمن هذا البرنامج', 'يعني لا يوجد اقتناع لا يوجد تسويق من هو العدو', 'من السعادة طالما أن هذا المجتمع الدولي الظالم', 'يا أفندم افتح الباب عايزين شوي هواء بس يا ابن كذا', 'ويحرم بالمطلق على كل القوى السياسية', 'لذلك فإن الرئيس الأفغاني أشرف غني', 'هذه الحشوة هذه حشوة الحية', 'ولذلك من الأفضل أن يكون بقيادة', 'لم يكن الغرب في يوم من الأيام صديقا لنا', 'اليمن وعلى الأخ الرئيس عبد ربه منصور هادي', 'لا الأمين العام سلطته بسيط قوي', 'مش حسن نصر الله وإيران', 'دكتور فيصل', 'لو أنهم لم يضربوه لما مات', 'العمل على تطويق إسرائيل', 'استغلال نفوذ استيلاء على المال العام', 'تعرف القجقجشي المهرب مهربين صدام', 'النقطة الثانية بأن إسرائيل الآن وصلت', 'السياسي والوطني اليمني وها هو يعود إليهم', 'لا', 'هذه الجماعات أنا لا أقول أنها غير موجودة', 'ما أوجه الاختلاف في طريقة اختيار إبراهيم محلب', 'ولكن بعد أن بدأ العدو الحرب', 'كان في شباكين في العربية مقفولين', 'لا يحتاج إلى أي أحد', 'من سنة قال الكيماوي خط أحمر', 'مرحبا بكم ضيوفنا الكرام', 'ولسنا هنا واجهة أو كذا', 'في محافظة إدلب ماذا بقي منها', 'إسرائيل لا تستطيع أن تتجاهل كل ذلك', 'ولكن على المستوى الرسمي العربي']

(Pdb) len(supervisions['num_frames'])
37
(Pdb) supervisions['num_frames']
tensor([301, 300, 294, 293, 293, 291, 290, 290, 288, 287, 286, 282, 282, 280,
        278, 276, 275, 269,  56, 269, 268, 268, 267, 266, 264, 262, 261, 260,
        257, 257, 255, 251, 250, 246, 246, 244, 242], dtype=torch.int32)
(Pdb) len(supervisions['cut'])
37
(Pdb)  batch["inputs"].shape
torch.Size([36, 301, 80])

(Pdb) u
> /alt-arabic/speech/amir/k2/tmp/icefall/egs/mgb2_100h/ASR/conformer_ctc/train.py(499)train_one_epoch()
-> loss, loss_info = compute_loss(
(Pdb) batch_idx
4618
danpovey commented 2 years ago

If your Lhotse is not quite up to date, perhaps you could update it and try again (just the training)? It could be a bug that's already fixed. If not, please show your asr_datamodule.py.

pzelasko commented 2 years ago

I don't recall anything related to this being fixed; to me it looks like one of your cuts has two supervisions (or you used CutConcatenate transform). In these cases it's expected that features.shape[0] == num_cuts but supervisions['num_frames'].shape[0] == num_supervisions. Earlier k2-based recipes (e.g. snowfall) supported multiple supervisions in a single utterance, but I don't think that Icefall supports it anymore.

The relevant docs are here: https://github.com/lhotse-speech/lhotse/blob/1d7807025575fdaa96cb907c451db0fb0fd23cde/lhotse/dataset/speech_recognition.py#L13-L59
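
One way to check this hypothesis from inside the debugger is to look at supervisions['sequence_idx'], which, per the docs linked above, maps each supervision to its row in the batch's inputs. A minimal sketch, assuming the tensor has first been converted to a plain list (e.g. with `.tolist()`); the helper name and the sample values are hypothetical:

```python
from collections import Counter


def rows_with_multiple_supervisions(sequence_idx):
    """Map each batch row that has more than one supervision to its count."""
    counts = Counter(sequence_idx)
    return {row: n for row, n in sorted(counts.items()) if n > 1}


# Hypothetical batch: 4 supervisions but only 3 feature rows,
# because row 1 carries two supervisions.
sequence_idx = [0, 1, 1, 2]
print(rows_with_multiple_supervisions(sequence_idx))
```

An empty result would mean every row has at most one supervision, in which case the mismatch must come from somewhere else.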

AmirHussein96 commented 2 years ago

Thank you all for your help. I updated my lhotse, but the change that actually worked was to skip the bad batch:

if batch["inputs"].shape[0] != len(batch["supervisions"]["text"]):
    continue  # skip batches where the number of feature rows and transcripts differ
# otherwise, do the training step as usual

I attached my asr_datamodule.py for your reference. Please let me know if you notice anything suspicious. asr_datamodule.zip

[Updated] Tensor board after 4 epochs: https://tensorboard.dev/experiment/BRZvqsBsQu6JgrrOgY5jFQ/#scalars

danpovey commented 2 years ago

That asr_datamodule.py has CutConcatenate in it, that's likely the problem.

AmirHussein96 commented 2 years ago

I have an optional CutConcatenate similar to librispeech and I do not activate it: https://github.com/k2-fsa/icefall/blob/9a98e6ced6370e42f69a8d904ab66a481cfb4d6f/egs/librispeech/ASR/tdnn_lstm_ctc/asr_datamodule.py#L222

csukuangfj commented 2 years ago

How did you prepare your data?

Are you using a script similar to https://github.com/k2-fsa/icefall/blob/master/egs/librispeech/ASR/local/compute_fbank_librispeech.py ?

Are you using cutset.trim_to_supervisions() in your data preparation?

AmirHussein96 commented 2 years ago

@csukuangfj yes, I am using a script similar to librispeech's compute_fbank_librispeech.py, and I am also using cutset.trim_to_supervisions(). Please check the prepare.sh and the compute_fbank_mgb2.py in the attached prepare.zip file. prepare.zip

csukuangfj commented 2 years ago

compute_fbank_mgb2.py

There is no compute_fbank_mgb2.py in the attached zip file.

pzelasko commented 2 years ago

... as a side note, maybe it makes sense to add a separate script in each recipe that validates the major assumptions about the data (single supervision per cut, supervision time bounds == cut time bounds, maybe something related to text normalization, like compatibility of the BPE model with the supervision texts, etc.)
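
A rough sketch of the first two checks, to make the idea concrete. It assumes each cut exposes `id`, `duration` and a `supervisions` list whose items have `start` (relative to the cut) and `duration`; the `Supervision` and `Cut` classes below are hypothetical stand-ins so the example runs on its own, and the 2 ms tolerance is an arbitrary choice:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Supervision:
    """Hypothetical stand-in for a supervision segment."""
    start: float      # seconds, relative to the start of the cut
    duration: float   # seconds


@dataclass
class Cut:
    """Hypothetical stand-in for a cut."""
    id: str
    duration: float
    supervisions: List[Supervision] = field(default_factory=list)


def validate_cut(cut, tolerance=0.002):
    """Return a list of problems found in one cut (empty if it looks fine)."""
    n = len(cut.supervisions)
    if n != 1:
        return [f"{cut.id}: expected 1 supervision, found {n}"]
    sup = cut.supervisions[0]
    starts_at_zero = abs(sup.start) <= tolerance
    same_length = abs(sup.duration - cut.duration) <= tolerance
    if not (starts_at_zero and same_length):
        return [f"{cut.id}: supervision bounds differ from cut bounds"]
    return []


cuts = [
    Cut("ok", 3.0, [Supervision(0.0, 3.0)]),
    Cut("two-sups", 3.25, [Supervision(0.0, 2.69), Supervision(2.69, 0.56)]),
    Cut("short-sup", 3.0, [Supervision(0.5, 2.0)]),
]
for cut in cuts:
    for problem in validate_cut(cut):
        print(problem)
```

Running such a check once after feature extraction would flag a cut like "two-sups" before training starts, instead of after several thousand batches.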

AmirHussein96 commented 2 years ago

compute_fbank_mgb2.py

There is no compute_fbank_mgb2.py in the attached zip file.

@csukuangfj Please check this one prepare.zip

danpovey commented 2 years ago

Amir, in future it might be easier to show us your code by creating a pull request, that way we can easily see it from github.

csukuangfj commented 2 years ago

compute_fbank_mgb2.py

There is no compute_fbank_mgb2.py in the attached zip file.

@csukuangfj Please check this one prepare.zip

I don't see any issues in compute_fbank_mgb2.py.


... as a side note, maybe it makes sense to add a separate script in each recipe that validates the major assumptions about the data (single supervision per cut, supervision time bounds == cut time bounds, maybe something related to text normalization, like compatibility of the BPE model with the supervision texts, etc.)

That is a good idea. I will make a PR about it.

AmirHussein96 commented 2 years ago

Hi Dan, apologies for the late reply. In my PR https://github.com/k2-fsa/icefall/pull/396, CutConcatenate is disabled by default (--concatenate-cuts False). I have also resolved all the issues; should I close this issue?

csukuangfj commented 2 years ago

So what was the reason?

AmirHussein96 commented 2 years ago

So what was the reason?

I think the issue was mainly caused by some segments that had empty text. I just ignored the minibatches that had inconsistent dimensions.
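
If empty transcripts are indeed the cause, one option is to drop those segments during data preparation rather than skipping whole minibatches at training time. A sketch under that assumption; the function names, the (segment id, text) pair layout and the sample data are hypothetical and not taken from the recipe:

```python
def has_usable_text(text):
    """True if the transcript contains at least one non-whitespace character."""
    return bool(text and text.strip())


def drop_empty_segments(segments):
    """Keep only the (segment_id, text) pairs with a non-empty transcript."""
    return [(seg_id, text) for seg_id, text in segments if has_usable_text(text)]


# Hypothetical sample: the second and third segments have no usable text.
segments = [("seg-1", "دكتور فيصل"), ("seg-2", ""), ("seg-3", "   "), ("seg-4", "لا")]
print([seg_id for seg_id, _ in drop_empty_segments(segments)])
```

The same predicate could presumably be reused in whatever filtering step the recipe already applies to its cuts.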