Model I am using (ListenAttendSpell, Transformer, Conformer ...): conformer_transducer
The problem arises when using:
CUDA_VISIBLE_DEVICES=8 python ./openspeech_cli/hydra_train.py dataset=libri dataset.dataset_path=/data/dataset/Libri/LibriSpeech dataset.dataset_download=False dataset.manifest_file_path=/home/gpu/WorkSpace/Speech/OpenSpeech/dataset/libri_subword_manifest.txt vocab=libri_subword vocab.vocab_size=5000 vocab.vocab_path=/home/gpu/WorkSpace/Speech/OpenSpeech/dataset model=conformer_transducer audio=fbank lr_scheduler=warmup_reduce_lr_on_plateau trainer=gpu trainer.batch_size=4 criterion=transducer
Error executing job with overrides: ['dataset=libri', 'dataset.dataset_path=/data/dataset/Libri/LibriSpeech', 'dataset.dataset_download=False', 'dataset.manifest_file_path=/home/gpu/WorkSpace/Speech/OpenSpeech/dataset/libri_subword_manifest.txt', 'vocab=libri_subword', 'vocab.vocab_size=5000', 'vocab.vocab_path=/home/gpu/WorkSpace/Speech/OpenSpeech/dataset', 'model=conformer_transducer', 'audio=fbank', 'lr_scheduler=warmup_reduce_lr_on_plateau', 'trainer=gpu', 'trainer.batch_size=4', 'criterion=transducer']
Traceback (most recent call last):
  File "./openspeech_cli/hydra_train.py", line 51, in hydra_main
    trainer.fit(model, data_module)
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 458, in fit
    self._run(model)
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 756, in _run
    self.dispatch()
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 797, in dispatch
    self.accelerator.start_training(self)
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator.py", line 96, in start_training
    self.training_type_plugin.start_training(trainer)
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 144, in start_training
    self._results = trainer.run_stage()
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 807, in run_stage
    return self.run_train()
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 869, in run_train
    self.train_loop.run_training_epoch()
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 499, in run_training_epoch
    batch_output = self.run_training_batch(batch, batch_idx, dataloader_idx)
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 738, in run_training_batch
    self.optimizer_step(optimizer, opt_idx, batch_idx, train_step_and_backward_closure)
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 442, in optimizer_step
    using_lbfgs=is_lbfgs,
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/core/lightning.py", line 1403, in optimizer_step
    optimizer.step(closure=optimizer_closure)
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/core/optimizer.py", line 214, in step
    self.__optimizer_step(*args, closure=closure, profiler_name=profiler_name, **kwargs)
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/core/optimizer.py", line 134, in __optimizer_step
    trainer.accelerator.optimizer_step(optimizer, self._optimizer_idx, lambda_closure=closure, **kwargs)
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator.py", line 329, in optimizer_step
    self.run_optimizer_step(optimizer, opt_idx, lambda_closure, **kwargs)
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator.py", line 336, in run_optimizer_step
    self.training_type_plugin.optimizer_step(optimizer, lambda_closure=lambda_closure, **kwargs)
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 193, in optimizer_step
    optimizer.step(closure=lambda_closure, **kwargs)
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 26, in decorate_context
    return func(*args, **kwargs)
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/torch/optim/adam.py", line 66, in step
    loss = closure()
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 733, in train_step_and_backward_closure
    split_batch, batch_idx, opt_idx, optimizer, self.trainer.hiddens
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 823, in training_step_and_backward
    result = self.training_step(split_batch, batch_idx, opt_idx, hiddens)
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 290, in training_step
    training_step_output = self.trainer.accelerator.training_step(args)
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator.py", line 204, in training_step
    return self.training_type_plugin.training_step(*args)
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/plugins/training_type/dp.py", line 98, in training_step
    return self.model(*args, **kwargs)
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 159, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/overrides/data_parallel.py", line 77, in forward
    output = super().forward(*inputs, **kwargs)
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/pytorch_lightning/overrides/base.py", line 46, in forward
    output = self.module.training_step(*inputs, **kwargs)
  File "/home/gpu/WorkSpace/Speech/OpenSpeech/openspeech/models/conformer_transducer/model.py", line 110, in training_step
    return super(ConformerTransducerModel, self).training_step(batch, batch_idx)
  File "/home/gpu/WorkSpace/Speech/OpenSpeech/openspeech/models/openspeech_transducer_model.py", line 268, in training_step
    target_lengths=target_lengths,
  File "/home/gpu/WorkSpace/Speech/OpenSpeech/openspeech/models/openspeech_transducer_model.py", line 90, in collect_outputs
    target_lengths=target_lengths.int(),
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/gpu/WorkSpace/Speech/OpenSpeech/openspeech/criterion/transducer/transducer.py", line 96, in forward
    gather=self.gather,
  File "/home/gpu/anaconda3/envs/openspeech/lib/python3.7/site-packages/warp_rnnt-0.4.0-py3.7-linux-x86_64.egg/warp_rnnt/__init__.py", line 74, in rnnt_loss
    index[:, :, :U-1, 1] = labels.unsqueeze(dim=1)
RuntimeError: The expanded size of the tensor (61) must match the existing size (62) at non-singleton dimension 2. Target sizes: [4, 355, 61]. Tensor sizes: [4, 1, 62]
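The failing line inside warp_rnnt is just a broadcasted assignment, so the mismatch can be reproduced in isolation. A minimal sketch (N, T, U are taken from the error message above; the index tensor is a simplification of what the library builds, not its exact code):

import torch

# Shapes from the RuntimeError: log_probs is (N, T, U, V) with N=4, T=355,
# U=62, so warp_rnnt writes labels into a slice of shape (N, T, U-1).
N, T, U = 4, 355, 62

index = torch.zeros(N, T, U - 1, 2, dtype=torch.long)

labels_ok = torch.randint(1, 5000, (N, U - 1))   # (4, 61): broadcasts fine
index[:, :, :U - 1, 1] = labels_ok.unsqueeze(dim=1)

labels_bad = torch.randint(1, 5000, (N, U))      # (4, 62): one column too many
index[:, :, :U - 1, 1] = labels_bad.unsqueeze(dim=1)
# RuntimeError: The expanded size of the tensor (61) must match the existing
# size (62) at non-singleton dimension 2.

So log_probs advertises U = 62 (61 real label positions), while the labels tensor the model passes has 62 columns instead of the expected 61.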
The loss function (warp_rnnt.rnnt_loss, from warp_rnnt/__init__.py):
def rnnt_loss(log_probs: torch.FloatTensor,
              labels: torch.IntTensor,
              frames_lengths: torch.IntTensor,
              labels_lengths: torch.IntTensor,
              average_frames: bool = False,
              reduction: Optional[AnyStr] = None,
              blank: int = 0,
              gather: bool = False) -> torch.Tensor:
    """The CUDA-Warp RNN-Transducer loss.

    Args:
        log_probs (torch.FloatTensor): Input tensor with shape (N, T, U, V)
            where N is the minibatch size, T is the maximum number of
            input frames, U is the maximum number of output labels and V is
            the vocabulary of labels (including the blank).
        labels (torch.IntTensor): Tensor with shape (N, U-1) representing the
            reference labels for all samples in the minibatch.
        frames_lengths (torch.IntTensor): Tensor with shape (N,) representing the
            number of frames for each sample in the minibatch.
        labels_lengths (torch.IntTensor): Tensor with shape (N,) representing the
            length of the transcription for each sample in the minibatch.
        average_frames (bool, optional): Specifies whether the loss of each
            sample should be divided by its number of frames.
            Default: False.
        reduction (string, optional): Specifies the type of reduction.
            Default: None.
        blank (int, optional): label used to represent the blank symbol.
            Default: 0.
        gather (bool, optional): Reduce memory consumption.
            Default: False.
    """
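For contrast, a call that respects the documented shape contract goes through. A minimal sketch with random tensors, assuming a CUDA device is available (warp_rnnt is CUDA-only); all sizes here are arbitrary:

import torch
from warp_rnnt import rnnt_loss

# Small arbitrary sizes; the point is only the shape contract:
# log_probs is (N, T, U, V), labels must be (N, U-1).
N, T, U, V = 2, 10, 6, 8

log_probs = torch.randn(N, T, U, V).log_softmax(dim=-1).cuda().requires_grad_()
labels = torch.randint(1, V, (N, U - 1), dtype=torch.int).cuda()
frames_lengths = torch.full((N,), T, dtype=torch.int).cuda()
labels_lengths = torch.full((N,), U - 1, dtype=torch.int).cuda()

loss = rnnt_loss(log_probs, labels, frames_lengths, labels_lengths,
                 reduction="mean")
loss.backward()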
The log_probs and labels the model produces are inconsistent with what rnnt_loss requires: log_probs has U = 62 along its label dimension, so the function expects labels of shape (N, U-1) = (4, 61), but the labels tensor it receives is (4, 62).
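If the extra column is the decoder's SOS token (the joint network consuming SOS + targets would explain U = 62 on the log_probs side), a possible workaround is to strip it before the criterion is called. This is a hypothetical sketch against collect_outputs in openspeech_transducer_model.py; the keyword names are inferred from the traceback, not verified:

# Hypothetical sketch, not a verified patch: if `targets` still carries a
# leading SOS token, dropping it restores the (N, U-1) shape warp_rnnt wants.
loss = self.criterion(
    logits=logits,
    targets=targets[:, 1:].contiguous().int(),   # (4, 62) -> (4, 61)
    input_lengths=input_lengths.int(),
    target_lengths=target_lengths.int(),
)

If the extra token instead sits at the end (EOS or padding), the slice would be targets[:, :-1]; either way the goal is a labels tensor of shape (N, U-1) = (4, 61) that matches the U = 62 of log_probs.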