sooftware / kospeech

Open-Source Toolkit for End-to-End Korean Automatic Speech Recognition leveraging PyTorch and Hydra.
https://sooftware.github.io/kospeech/
Apache License 2.0

RuntimeError: Input tensor at index 1 has invalid shape [4, 926, 126, 2001], but expected [4, 1668, 126, 2001] #117

Closed: dh7ahn closed this issue 3 years ago

dh7ahn commented 3 years ago

I'm trying to train an RNN transducer model on the KsponSpeech data, and the error below occurs right after the first epoch starts. What could be wrong?

RuntimeError: Input tensor at index 1 has invalid shape [4, 926, 126, 2001], but expected [4, 1668, 126, 2001]

I used the character vocab. I don't have this problem when training the LAS or Transformer models.

The execution log is attached below.

```
[2021-02-26 11:47:05,284][kospeech.utils][INFO] - audio:
  audio_extension: pcm
  sample_rate: 16000
  frame_length: 20
  frame_shift: 10
  normalize: true
  del_silence: true
  feature_extract_by: kaldi
  time_mask_num: 4
  freq_mask_num: 2
  spec_augment: true
  input_reverse: false
  transform_method: fbank
  n_mels: 80
  freq_mask_para: 18
model:
  architecture: rnnt
  teacher_forcing_ratio: 1.0
  teacher_forcing_step: 0.01
  min_teacher_forcing_ratio: 0.9
  dropout: 0.3
  bidirectional: true
  joint_ctc_attention: false
  max_len: 400
  num_encoder_layers: 4
  num_decoder_layers: 1
  encoder_hidden_state_dim: 320
  decoder_hidden_state_dim: 512
  output_dim: 512
  rnn_type: lstm
  encoder_dropout_p: 0.2
  decoder_dropout_p: 0.2
train:
  dataset: kspon
  dataset_path: /data/KsponSpeech
  transcripts_path: ../../../data/transcripts.txt
  output_unit: character
  batch_size: 32
  save_result_every: 1000
  checkpoint_every: 5000
  print_every: 10
  mode: train
  num_workers: 4
  use_cuda: true
  init_lr_scale: 0.01
  final_lr_scale: 0.05
  max_grad_norm: 400
  weight_decay: 1.0e-05
  seed: 777
  resume: false
  optimizer: adam
  init_lr: 1.0e-06
  final_lr: 1.0e-06
  peak_lr: 0.0001
  warmup_steps: 400
  num_epochs: 20
  reduction: mean
  label_smoothing: 0.1
  lr_scheduler: tri_stage_lr_scheduler
```

```
[2021-02-26 11:47:05,419][kospeech.utils][INFO] - Operating System : Linux 3.10.0-957.10.1.el7.x86_64
[2021-02-26 11:47:05,420][kospeech.utils][INFO] - Processor : x86_64
[2021-02-26 11:47:05,449][kospeech.utils][INFO] - device : Tesla P40
[2021-02-26 11:47:05,450][kospeech.utils][INFO] - device : Tesla P40
[2021-02-26 11:47:05,450][kospeech.utils][INFO] - device : Tesla P40
[2021-02-26 11:47:05,450][kospeech.utils][INFO] - device : Tesla P40
[2021-02-26 11:47:05,450][kospeech.utils][INFO] - device : Tesla P40
[2021-02-26 11:47:05,451][kospeech.utils][INFO] - device : Tesla P40
[2021-02-26 11:47:05,451][kospeech.utils][INFO] - device : Tesla P40
[2021-02-26 11:47:05,451][kospeech.utils][INFO] - device : Tesla P40
[2021-02-26 11:47:05,452][kospeech.utils][INFO] - CUDA is available : True
[2021-02-26 11:47:05,452][kospeech.utils][INFO] - CUDA version : 10.1
[2021-02-26 11:47:05,452][kospeech.utils][INFO] - PyTorch version : 1.7.1+cu101
[2021-02-26 11:47:05,459][kospeech.utils][INFO] - split dataset start !!
[2021-02-26 11:47:11,514][kospeech.utils][INFO] - Applying Spec Augmentation...
[2021-02-26 11:47:13,673][kospeech.utils][INFO] - Applying Spec Augmentation...
[2021-02-26 11:47:15,848][kospeech.utils][INFO] - Applying Spec Augmentation...
[2021-02-26 11:47:17,873][kospeech.utils][INFO] - Applying Spec Augmentation...
[2021-02-26 11:47:19,916][kospeech.utils][INFO] - split dataset complete !!
DataParallel(
  (module): RNNTransducer(
    (encoder): EncoderRNNT(
      (rnn): LSTM(80, 320, num_layers=4, batch_first=True, dropout=0.2, bidirectional=True)
      (out_proj): Linear(
        (linear): Linear(in_features=640, out_features=512, bias=True)
      )
    )
    (decoder): DecoderRNNT(
      (embedding): Embedding(2001, 512)
      (rnn): LSTM(512, 512, batch_first=True, dropout=0.2)
      (out_proj): Linear(
        (linear): Linear(in_features=512, out_features=512, bias=True)
      )
    )
    (fc): Linear(
      (linear): Linear(in_features=1024, out_features=2001, bias=False)
    )
  )
)
[2021-02-26 11:47:20,263][kospeech.utils][INFO] - start
[2021-02-26 11:47:20,263][kospeech.utils][INFO] - Epoch 0 start
Traceback (most recent call last):
  File "/home/a77/workspace/ext/KoSpeech/bin/main.py", line 174, in main
    last_model_checkpoint = train(config)
  File "/home/a77/workspace/ext/KoSpeech/bin/main.py", line 134, in train
    model = trainer.train(
  File "/data/KoSpeech/kospeech/trainer/supervised_trainer.py", line 160, in train
    model, train_loss, train_cer = self._train_epoches(
  File "/data/KoSpeech/kospeech/trainer/supervised_trainer.py", line 255, in _train_epoches
    output, loss, ctc_loss, cross_entropy_loss = self._model_forward(
  File "/data/KoSpeech/kospeech/trainer/supervised_trainer.py", line 431, in _model_forward
    outputs = model(inputs, input_lengths, targets, target_lengths)
  File "/home/a77/workspace/env/miniconda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/a77/workspace/env/miniconda/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 162, in forward
    return self.gather(outputs, self.output_device)
  File "/home/a77/workspace/env/miniconda/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 174, in gather
    return gather(outputs, output_device, dim=self.dim)
  File "/home/a77/workspace/env/miniconda/lib/python3.8/site-packages/torch/nn/parallel/scatter_gather.py", line 68, in gather
    res = gather_map(outputs)
  File "/home/a77/workspace/env/miniconda/lib/python3.8/site-packages/torch/nn/parallel/scatter_gather.py", line 55, in gather_map
    return Gather.apply(target_device, dim, outputs)
  File "/home/a77/workspace/env/miniconda/lib/python3.8/site-packages/torch/nn/parallel/_functions.py", line 71, in forward
    return comm.gather(inputs, ctx.dim, ctx.target_device)
  File "/home/a77/workspace/env/miniconda/lib/python3.8/site-packages/torch/nn/parallel/comm.py", line 230, in gather
    return torch._C._gather(tensors, dim, destination)
RuntimeError: Input tensor at index 1 has invalid shape [4, 926, 126, 2001], but expected [4, 1668, 126, 2001]
```

sooftware commented 3 years ago

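It looks like you trained on multiple GPUs. I have a lot of work at the moment, so please understand that I haven't been able to go through the log in detail.
Some KoSpeech models raise errors during multi-GPU training. If you comment with which model you used, I'll try to fix it when I have time.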

dh7ahn commented 3 years ago

P40 x8, CUDA 10.1, PyTorch 1.7.1, RNN transducer (model=rnn train=rnnt_train).
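For context, `torch.nn.DataParallel` splits each batch across the GPUs (32 utterances over 8 P40s gives 4 per replica, which is the leading 4 in both shapes), runs one replica per GPU, and then concatenates the replica outputs along dim 0. The traceback shows that this concatenation (`comm.gather`) is the step that fails: every dimension except the gather dimension must match the first replica's output, and here the shapes differ in dim 1, the time axis. One plausible cause, not confirmed anywhere in this thread, is that each replica pads its encoder output only to the longest utterance in its own chunk. The pure-Python sketch below models just that shape arithmetic on a smaller batch (8 utterances over 2 devices). `scatter`, `replica_shape`, and `gather_shape` are hypothetical helpers, and the frame counts are invented except for the two maxima, 1668 and 926, taken from the error.

```python
from typing import List, Optional, Sequence


def scatter(lengths: Sequence[int], num_devices: int) -> List[List[int]]:
    """Split per-utterance frame counts into contiguous chunks, one per device.
    Simplification (assumption): the batch divides evenly across devices."""
    size = len(lengths) // num_devices
    return [list(lengths[i * size:(i + 1) * size]) for i in range(num_devices)]


def replica_shape(chunk: List[int], target_len: int, vocab: int,
                  total_length: Optional[int] = None) -> List[int]:
    """Shape [B, T, U, V] of one replica's joint output.
    Without total_length, T is the longest utterance in this chunk only."""
    t = total_length if total_length is not None else max(chunk)
    return [len(chunk), t, target_len, vocab]


def gather_shape(shapes: List[List[int]], dim: int = 0) -> List[int]:
    """Concatenate along `dim`; all other dims must equal those of shapes[0]."""
    first = shapes[0]
    for i, shape in enumerate(shapes[1:], start=1):
        expected = list(first)
        expected[dim] = shape[dim]
        if shape != expected:
            raise RuntimeError(
                f"Input tensor at index {i} has invalid shape {shape}, "
                f"but expected {expected}"
            )
    out = list(first)
    out[dim] = sum(s[dim] for s in shapes)
    return out


# Hypothetical frame counts for 8 utterances split over 2 devices.
lengths = [1668, 1200, 900, 850, 926, 700, 650, 600]
chunks = scatter(lengths, num_devices=2)
shapes = [replica_shape(c, target_len=126, vocab=2001) for c in chunks]
try:
    gather_shape(shapes)
except RuntimeError as exc:
    print(exc)  # Input tensor at index 1 has invalid shape [4, 926, 126, 2001], but expected [4, 1668, 126, 2001]

# If every replica pads to the batch-wide length instead, the shapes agree.
padded = [replica_shape(c, 126, 2001, total_length=max(lengths)) for c in chunks]
print(gather_shape(padded))  # [8, 1668, 126, 2001]
```

The sketch models shapes only, so it runs without PyTorch or a GPU. It also suggests why LAS and Transformer training are unaffected in this report: their outputs may not carry a per-replica time axis into the gather, though that too is an inference rather than something the thread states.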

sooftware commented 3 years ago

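Sorry for the very late reply. It seems RNN Transducer doesn't currently support multi-GPU training. It's probably a simple bug, but I've been busy with personal matters lately and haven't been able to fix it. How about trying a different model?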
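For anyone who wants to patch this locally instead of switching models, here is one possible approach. It rests on an assumption that this thread does not verify: the RNN-T encoder packs its inputs and unpacks them with `torch.nn.utils.rnn.pad_packed_sequence`. Under that assumption, the PyTorch FAQ documents this exact `DataParallel` pitfall. By default, `pad_packed_sequence` pads only to the longest sequence it sees on that replica. Passing `total_length` makes every replica pad to the same length. `PackedEncoder` below is a hypothetical stand-in that mirrors the LSTM printed for `EncoderRNNT` in the log; it is not KoSpeech's class.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence


class PackedEncoder(nn.Module):
    """Hypothetical stand-in for an RNN-T encoder; not KoSpeech's EncoderRNNT."""

    def __init__(self, input_dim: int = 80, hidden_dim: int = 320) -> None:
        super().__init__()
        # Same LSTM settings as the EncoderRNNT printed in the log above.
        self.rnn = nn.LSTM(input_dim, hidden_dim, num_layers=4, batch_first=True,
                           dropout=0.2, bidirectional=True)

    def forward(self, inputs: torch.Tensor, input_lengths: torch.Tensor) -> torch.Tensor:
        # DataParallel splits the padded batch along dim 0 only, so inputs.size(1)
        # is the batch-wide padded length and is identical on every replica.
        total_length = inputs.size(1)
        packed = pack_padded_sequence(inputs, input_lengths.cpu(),
                                      batch_first=True, enforce_sorted=False)
        outputs, _ = self.rnn(packed)
        # Without total_length, T would shrink to max(input_lengths) of this chunk.
        outputs, _ = pad_packed_sequence(outputs, batch_first=True,
                                         total_length=total_length)
        return outputs


encoder = PackedEncoder()
inputs = torch.randn(4, 50, 80)                 # padded batch: B=4, T=50, 80 mel bins
input_lengths = torch.tensor([30, 25, 20, 10])  # every utterance shorter than T
print(encoder(inputs, input_lengths).shape)     # torch.Size([4, 50, 640])
```

A simpler workaround is to make only one GPU visible, for example with `CUDA_VISIBLE_DEVICES=0`. With a single device, `DataParallel` calls the wrapped module directly and skips the gather, so this should avoid the error, at the cost of multi-GPU speed.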