scutcsq / DWFormer

DWFormer: Dynamic Window Transformer for Speech Emotion Recognition (ICASSP 2023 Oral)

Conv1D Issue in IEMOCAP Training Script #5

Closed basavarajsh98 closed 12 months ago

basavarajsh98 commented 1 year ago

Hi,

I've thoroughly reviewed your repo and your recent paper, and I'm intrigued by the proposed approach. However, it seems that there might be an issue in the train.py script when running it for the IEMOCAP dataset. Specifically, the use of conv1d with a 4D input and filter, as well as the tuple (0, 1) for padding, is causing a runtime error.

~/DWFormer/DWFormer$ python3 ./IEMOCAP/train.py
2023-11-25 21:59:02 | INFO | fairseq.tasks.text_to_speech | Please install tensorboardX: pip install tensorboardX
Traceback (most recent call last):
  File "./IEMOCAP/train.py", line 109, in <module>
    out = model(datas, mask)
  File "/python/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/python/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/DWFormer/DWFormer/IEMOCAP/model.py", line 72, in forward
    x2,thresholds1,attn11 = self.dt1(x1, x_mask, attn)
  File "/python/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/python/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/DWFormer/DWFormer/IEMOCAP/utils/DWFormerBlock.py", line 27, in forward
    attention_mask,mappingmask,attention_mask2,thresholds,lengths,wise = self.mask_generation_function(haltingscore, x, mask,threshold = 0.5, lambdas= 0.85)
  File "/DWFormer/DWFormer/IEMOCAP/utils/DWFormerBlock.py", line 92, in mask_generation_function
    x4 = F.conv1d(x4, b, padding=(0, 1)).view(batch, -1)
RuntimeError: Expected 2D (unbatched) or 3D (batched) input to conv1d, but got input of size: [1, 1, 32, 324]

I noticed the following code snippet:

x4 = x3.view(1, 1, batch, token_length)
b = Variable(torch.ones((1, 1, 1, 2), device=x.device))
x4 = F.conv1d(x4, b, padding=(0, 1)).view(batch, -1)

The error message indicates that conv1d expects a 2D (unbatched) or 3D (batched) input, but in this case, the input size is [1, 1, 32, 324]. Additionally, the padding argument should be a string, a single number, or a one-element tuple.
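For context, the mismatch can be reproduced in isolation on a recent PyTorch build (tensor shapes taken from the traceback above; this is a minimal sketch, not the repo's code):

```python
import torch
import torch.nn.functional as F

# Same shapes as in mask_generation_function: batch = 32, token_length = 324
x4 = torch.ones(1, 1, 32, 324)  # 4D input tensor
b = torch.ones(1, 1, 1, 2)      # 4D weight tensor

try:
    F.conv1d(x4, b, padding=(0, 1))
except RuntimeError as e:
    # Recent PyTorch rejects the 4D input: conv1d expects 2D or 3D
    print(e)
```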

I'm wondering how the exact code worked for you. Please help me in this regard.

Thanks,

scutcsq commented 1 year ago

Thanks for your question. The error is caused by a PyTorch version difference: versions newer than 1.8 no longer accept a 4D input for conv1d here. You can change "x4 = F.conv1d(x4, b, padding = (0,1)).view(batch, -1)" into "x4 = F.conv2d(x4, b, padding = (0,1)).view(batch, -1)", and the result is the same.
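As a sanity check (a minimal sketch using the traceback's shapes, not the repo's exact code), the conv2d call accepts the 4D input and sums each pair of adjacent halting scores along the token axis, which is what the original conv1d call computed on older PyTorch. The deprecated Variable wrapper can also be dropped:

```python
import torch
import torch.nn.functional as F

batch, token_length = 32, 324
x3 = torch.rand(batch, token_length)          # per-token halting scores

x4 = x3.view(1, 1, batch, token_length)       # lift to 4D: (N, C, H, W)
b = torch.ones(1, 1, 1, 2, device=x3.device)  # 1x2 all-ones kernel; no Variable needed

# padding=(0, 1) pads only the token axis; each output is the sum of two neighbors
x4 = F.conv2d(x4, b, padding=(0, 1)).view(batch, -1)

print(x4.shape)  # torch.Size([32, 325])
```

With the 1-column padding on each side, the output width is token_length + 1, and every interior entry equals the sum of two adjacent scores in x3.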

basavarajsh98 commented 11 months ago

Thanks!