Error of bert - Githubissues

@iyuge2 @Columbine21 Hi, I have a problem when I run run.py. I hope you can help.

The error is located in BertTextEncoder.py:

text:torch.Size([64, 39, 768]) input_ids:torch.Size([64, 768]) | input_mask:torch.Size([64, 768]) | segment_ids: torch.Size([64, 768])

https://github.com/thuiar/MMSA/blob/b2e70bbd198ba8e8dc041f5e059c3baa2027b34a/models/subNets/BertTextEncoder.py#L62-L64

The error occurs at line 62, and I still don't know how to solve it.

Detailed error message:

opt/conda/conda-bld/pytorch/work/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [79,0,0], thread: [0,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
Traceback (most recent call last):
  File "/media/yiwei/yiwei-01/project/Emotion/MMSA/MMSA-old/trains/singleTask/MISA.py", line 66, in do_train
    outputs = model(text, audio, vision)
  File "/media/yiwei/yiwei-01/project/Emotion/MMSA/MMSA-old/models/singleTask/MISA.py", line 281, in forward
    output = self.alignment(text, audio, video)
  File "/media/yiwei/yiwei-01/project/Emotion/MMSA/MMSA-old/models/singleTask/MISA.py", line 195, in alignment
    bert_output = self.bertmodel(text) # [batch_size, seq_len, 768]
  File "/media/yiwei/600G/anaconda3/envs/MMSA/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
  File "/media/yiwei/yiwei-01/project/Emotion/MMSA/MMSA-old/models/subNets/BertTextEncoder.py", line 68, in forward
    token_type_ids=segment_ids.to('cuda'))[0]  # Models outputs are now tuples
ib/python3.6/site-packages/torch/nn/functional.py", line 1371, in linear
    output = input.matmul(weight.t())
RuntimeError: cublas runtime error : resource allocation failed at /opt/conda/conda-bld/pytorch/work/aten/src/THC/THCGeneral.cpp:216
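
The `srcIndex < srcSelectDimSize` assertion comes from the index-select kernel, which usually means an index passed to an embedding lookup is out of range; because CUDA kernels run asynchronously, the later cublas error is probably a consequence rather than the cause. A minimal sketch for checking the index tensors on the CPU before calling BERT follows. The function name `check_bert_inputs` is hypothetical, and the limits are assumptions based on the standard bert-base-uncased configuration, so they should be replaced with the values from the config of the model actually loaded.

```python
import torch

# Assumed limits (bert-base-uncased defaults); take the real values from the
# config of the model you load instead of trusting these numbers.
VOCAB_SIZE = 30522
TYPE_VOCAB_SIZE = 2
MAX_POSITIONS = 512


def check_bert_inputs(input_ids, segment_ids,
                      vocab_size=VOCAB_SIZE,
                      type_vocab_size=TYPE_VOCAB_SIZE,
                      max_positions=MAX_POSITIONS):
    """Return a list of problems that would break BERT's embedding lookups."""
    problems = []

    if input_ids.dim() != 2:
        problems.append(
            f"input_ids has shape {tuple(input_ids.shape)}, "
            "expected [batch, seq_len]"
        )
        return problems

    # Position embeddings are indexed by 0..seq_len-1.
    seq_len = input_ids.shape[1]
    if seq_len > max_positions:
        problems.append(f"seq_len {seq_len} exceeds {max_positions} positions")

    # Word and token-type embeddings are indexed by the tensor values.
    for name, ids, limit in (
        ("input_ids", input_ids, vocab_size),
        ("segment_ids", segment_ids, type_vocab_size),
    ):
        if ids.dtype != torch.long:
            problems.append(f"{name} has dtype {ids.dtype}, expected torch.int64")
        if ids.numel() == 0:
            continue
        lo, hi = ids.min().item(), ids.max().item()
        if lo < 0 or hi >= limit:
            problems.append(f"{name} range [{lo}, {hi}] is outside [0, {limit - 1}]")

    return problems
```

With the shapes printed above, `input_ids` of size [64, 768] would already be reported, since a sequence length of 768 is larger than the 512 positions assumed here.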

I have found a further cause: the problem lies in the data processing, here: https://github.com/thuiar/MMSA/blob/b2e70bbd198ba8e8dc041f5e059c3baa2027b34a/models/subNets/BertTextEncoder.py#L47-L55

Previous version: the text tensor has shape torch.Size([64, 39, 768]).
Latest version: in the pkl you provide, the text tensor has shape torch.Size([32, 3, 39]); in the pkl generated by DataPre.py, it has shape torch.Size([32, 39, 3]).
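
The two latest layouts differ only in which axis holds the three BERT inputs. A minimal sketch that accepts either layout and rejects anything else (such as the old [64, 39, 768] tensor) is shown here. The helper name `split_text_channels` is hypothetical, and the channel order (input_ids, input_mask, segment_ids) is an assumption taken from the order of the shapes printed earlier in this post.

```python
import torch


def split_text_channels(text):
    """Split a packed text tensor into (input_ids, input_mask, segment_ids).

    Accepts [batch, 3, seq_len] or [batch, seq_len, 3]. The channel order is
    an assumption, not something confirmed by the repository.
    """
    if text.dim() != 3:
        raise ValueError(f"expected a 3-D tensor, got shape {tuple(text.shape)}")

    # Find which of the two trailing axes has exactly three entries.
    channel_axes = [axis for axis in (1, 2) if text.shape[axis] == 3]
    if len(channel_axes) != 1:
        raise ValueError(
            f"cannot locate the 3-channel axis in shape {tuple(text.shape)}"
        )

    # Normalize to [batch, 3, seq_len].
    if channel_axes[0] == 2:
        text = text.permute(0, 2, 1)

    input_ids = text[:, 0, :].long()
    input_mask = text[:, 1, :].float()
    segment_ids = text[:, 2, :].long()
    return input_ids, input_mask, segment_ids
```

Raising on an unrecognized shape fails early on the CPU with a readable message, instead of letting a wrongly sliced tensor reach the embedding lookup on the GPU.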

Unfortunately, correcting the dimensions with text = text.permute(0, 2, 1) still does not solve the problem.

When I run self_mm, the error is

raise ValueError(msg_err.format(type_err, X.dtype))
ValueError: Input contains NaN, infinity or a value too large for dtype('float32').
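
This message looks like it comes from an input validation step that runs after the model has produced its outputs (an assumption based on the wording), so the non-finite values may originate earlier, in the features or in the predictions. A minimal sketch for counting NaN and infinite entries per tensor follows; the helper name `report_non_finite` and the tensor names in the usage note are hypothetical.

```python
import torch


def report_non_finite(**tensors):
    """Return {name: {"nan": n, "inf": m}} for tensors with non-finite values."""
    report = {}
    for name, tensor in tensors.items():
        values = tensor.detach().float()
        n_nan = int(torch.isnan(values).sum())
        n_inf = int(torch.isinf(values).sum())
        if n_nan or n_inf:
            report[name] = {"nan": n_nan, "inf": n_inf}
    return report
```

Calling it as `report_non_finite(audio=audio, vision=vision, outputs=outputs)` on each batch would show whether the bad values are already present in the inputs or only appear in the model outputs.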

When I run misa, the error is

File "models/singleTask/MISA.py", line 198, in alignment
masked_output = torch.mul(bert_sent_mask.unsqueeze(2), bert_output)
RuntimeError: The size of tensor a (3) must match the size of tensor b (39) at non-singleton dimension 1
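
The message implies that `bert_sent_mask` has shape [batch, 3] while `bert_output` has shape [batch, 39, 768], which would happen if the mask is sliced from a tensor that is still in the [batch, 39, 3] layout. A minimal sketch of a mask-weighted average with an explicit shape check is given here. Only the `torch.mul` line is taken from the traceback; the function name `masked_mean` and the averaging step are assumptions added for illustration.

```python
import torch


def masked_mean(bert_output, sent_mask):
    """Average bert_output over the sequence axis, ignoring masked positions.

    bert_output: [batch, seq_len, hidden]
    sent_mask:   [batch, seq_len], 1 for real tokens and 0 for padding
    """
    if sent_mask.shape != bert_output.shape[:2]:
        raise ValueError(
            f"mask shape {tuple(sent_mask.shape)} does not match "
            f"output shape {tuple(bert_output.shape[:2])} on [batch, seq_len]"
        )

    mask = sent_mask.unsqueeze(2).to(bert_output.dtype)  # [batch, seq_len, 1]
    masked_output = torch.mul(mask, bert_output)         # broadcast over hidden
    token_count = mask.sum(dim=1).clamp(min=1.0)         # avoid division by zero
    return masked_output.sum(dim=1) / token_count        # [batch, hidden]
```

If the check fails with a mask of shape [batch, 3], the mask was most likely taken along the wrong axis, and the permute has to be applied before the mask is sliced, not only before the tensor is passed to BERT.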

Error of bert #23