roatienza / efficientspeech

PyTorch code implementation of EfficientSpeech - to be presented at ICASSP2023.
Apache License 2.0

Error in networks.py #8

Open ppisljar opened 1 year ago

ppisljar commented 1 year ago

Hello,

I am running into this error:

File "/opt/conda/lib/python3.10/site-packages/lightning/pytorch/loops/fit_loop.py", line 354, in advance
    self.epoch_loop.run(self._data_fetcher)
File "/opt/conda/lib/python3.10/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 133, in run
    self.advance(data_fetcher)
File "/opt/conda/lib/python3.10/site-packages/lightning/pytorch/loops/training_epoch_loop.py", line 218, in advance
    batch_output = self.automatic_optimization.run(trainer.optimizers[0], kwargs)
File "/opt/conda/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 185, in run
    self._optimizer_step(kwargs.get("batch_idx", 0), closure)
File "/opt/conda/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 260, in _optimizer_step
    call._call_lightning_module_hook(
File "/opt/conda/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 140, in _call_lightning_module_hook
    output = fn(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/lightning/pytorch/core/module.py", line 1256, in optimizer_step
    optimizer.step(closure=optimizer_closure)
File "/opt/conda/lib/python3.10/site-packages/lightning/pytorch/core/optimizer.py", line 155, in step
    step_output = self._strategy.optimizer_step(self._optimizer, closure, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 225, in optimizer_step
    return self.precision_plugin.optimizer_step(optimizer, model=model, closure=closure, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/lightning/pytorch/plugins/precision/amp.py", line 70, in optimizer_step
    closure_result = closure()
File "/opt/conda/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 140, in __call__
    self._result = self.closure(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 126, in closure
    step_output = self._step_fn()
File "/opt/conda/lib/python3.10/site-packages/lightning/pytorch/loops/optimization/automatic.py", line 307, in _training_step
    training_step_output = call._call_strategy_hook(trainer, "training_step", *kwargs.values())
File "/opt/conda/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py", line 287, in _call_strategy_hook
    output = fn(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/lightning/pytorch/strategies/strategy.py", line 367, in training_step
    return self.model.training_step(*args, **kwargs)
File "/home/tts/efficientspeech/model.py", line 214, in training_step
    y_hat = self.forward(x)
File "/home/tts/efficientspeech/model.py", line 156, in forward
    return self.phoneme2mel(x, train=True) if self.training else self.predict_step(x)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
File "/home/tts/efficientspeech/layers/networks.py", line 421, in forward
    pred = self.encoder(x, train=train)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
File "/home/tts/efficientspeech/layers/networks.py", line 370, in forward
    fused_features = torch.cat([fused_features, pitch_features, \
RuntimeError: Sizes of tensors must match except in dimension 2. Expected size 1 but got size 110 for tensor number 1 in the list.
Epoch 0: 0%| | 0/129 [00:02<?, ?it/s]
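The failure mode can be reproduced outside the repo. This is a minimal sketch (the tensor names and shapes are illustrative, not EfficientSpeech's actual ones): `torch.cat` along dim 2 requires all other dimensions to agree, and here dim 1 is 1 in one tensor but 110 in the other, matching the sizes in the error message.

```python
import torch

# Hypothetical shapes chosen to mirror the reported error:
# dim 1 is 1 for the first tensor but 110 for the second.
fused_features = torch.zeros(1, 1, 64)
pitch_features = torch.zeros(1, 110, 64)

try:
    # Concatenating on dim=2 requires dims 0 and 1 to match exactly.
    torch.cat([fused_features, pitch_features], dim=2)
except RuntimeError as e:
    print(e)  # "Sizes of tensors must match except in dimension 2 ..."
```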

ppisljar commented 1 year ago

File "/home/tts/efficientspeech/model.py", line 156, in forward
    return self.phoneme2mel(x, train=True) if self.training else self.predict_step(x)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
File "/home/tts/efficientspeech/layers/networks.py", line 425, in forward
    pred = self.encoder(x, train=train)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
File "/home/tts/efficientspeech/layers/networks.py", line 354, in forward
    pitch_features = pitch_features.masked_fill(mask, 0)
RuntimeError: The size of tensor a (143) must match the size of tensor b (110) at non-singleton dimension 1

It seems the 143 comes from max_phoneme_len, while the features in the batch only span 110 steps.
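A hedged illustration of that hypothesis (variable names are placeholders, not the repo's actual code): `masked_fill` needs the mask to broadcast against the tensor, so a mask built from a global max_phoneme_len of 143 cannot be applied to features whose sequence dimension is 110. Trimming the mask to the batch's actual length is one way the shapes can be made to agree.

```python
import torch

# Placeholder shapes mirroring the reported sizes:
pitch_features = torch.randn(2, 110, 64)       # batch with 110-step sequences
mask = torch.zeros(2, 143, dtype=torch.bool)   # built from max_phoneme_len = 143

try:
    # dim 1 is 110 vs 143 and neither is 1, so broadcasting fails.
    pitch_features.masked_fill(mask.unsqueeze(-1), 0)
except RuntimeError as e:
    print(e)  # "The size of tensor a (110) must match the size of tensor b (143) ..."

# One possible workaround: trim the mask to the actual sequence length.
trimmed = mask[:, :pitch_features.size(1)].unsqueeze(-1)  # shape (2, 110, 1)
out = pitch_features.masked_fill(trimmed, 0)              # shape (2, 110, 64)
```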