Closed leijue222 closed 3 years ago
I also tried SIMS, but the result is the same: it failed. The problem may be in BERT, but I don't know exactly where it goes wrong.
100%|██████████████████████████████████████| 43/43 [00:00<00:00, 126.09it/s]
0%| | 0/43 [00:00<?, ?it/s]
/opt/conda/conda-bld/pytorch_1565272279342/work/aten/src/THC/THCTensorIndex.cu:361: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [66,0,0], thread: [63,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
0%| | 0/43 [00:00<?, ?it/s]
Traceback (most recent call last):
File "run.py", line 300, in <module>
worker()
File "run.py", line 254, in worker
run_normal(args)
File "run.py", line 174, in run_normal
test_results = run(args)
File "run.py", line 79, in run
atio.do_train(model, dataloader)
File "/media/yiwei/yiwei-01/project/Emotion/MMSA-master/trains/multiTask/SELF_MM.py", line 143, in do_train
outputs = model(text, (audio, audio_lengths), (vision, vision_lengths))
File "/media/yiwei/600G/anaconda3/envs/MMSA/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/media/yiwei/yiwei-01/project/Emotion/MMSA-master/models/AMIO.py", line 50, in forward
return self.Model(text_x, audio_x, video_x)
File "/media/yiwei/600G/anaconda3/envs/MMSA/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/media/yiwei/yiwei-01/project/Emotion/MMSA-master/models/multiTask/SELF_MM.py", line 64, in forward
text = self.text_model(text)[:,0,:]
File "/media/yiwei/600G/anaconda3/envs/MMSA/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/media/yiwei/yiwei-01/project/Emotion/MMSA-master/models/subNets/BertTextEncoder.py", line 59, in forward
token_type_ids=segment_ids)[0] # Models outputs are now tuples
File "/media/yiwei/600G/anaconda3/envs/MMSA/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/media/yiwei/600G/anaconda3/envs/MMSA/lib/python3.6/site-packages/transformers/modeling_bert.py", line 734, in forward
encoder_attention_mask=encoder_extended_attention_mask,
File "/media/yiwei/600G/anaconda3/envs/MMSA/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/media/yiwei/600G/anaconda3/envs/MMSA/lib/python3.6/site-packages/transformers/modeling_bert.py", line 407, in forward
hidden_states, attention_mask, head_mask[i], encoder_hidden_states, encoder_attention_mask
File "/media/yiwei/600G/anaconda3/envs/MMSA/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/media/yiwei/600G/anaconda3/envs/MMSA/lib/python3.6/site-packages/transformers/modeling_bert.py", line 368, in forward
self_attention_outputs = self.attention(hidden_states, attention_mask, head_mask)
File "/media/yiwei/600G/anaconda3/envs/MMSA/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/media/yiwei/600G/anaconda3/envs/MMSA/lib/python3.6/site-packages/transformers/modeling_bert.py", line 314, in forward
hidden_states, attention_mask, head_mask, encoder_hidden_states, encoder_attention_mask
File "/media/yiwei/600G/anaconda3/envs/MMSA/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/media/yiwei/600G/anaconda3/envs/MMSA/lib/python3.6/site-packages/transformers/modeling_bert.py", line 216, in forward
mixed_query_layer = self.query(hidden_states)
File "/media/yiwei/600G/anaconda3/envs/MMSA/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/media/yiwei/600G/anaconda3/envs/MMSA/lib/python3.6/site-packages/torch/nn/modules/linear.py", line 87, in forward
return F.linear(input, self.weight, self.bias)
File "/media/yiwei/600G/anaconda3/envs/MMSA/lib/python3.6/site-packages/torch/nn/functional.py", line 1371, in linear
output = input.matmul(weight.t())
RuntimeError: cublas runtime error : resource allocation failed at /opt/conda/conda-bld/pytorch_1565272279342/work/aten/src/THC/THCGeneral.cpp:216
If I try to use the CPU instead, the error occurs here: https://github.com/thuiar/MMSA/blob/b2e70bbd198ba8e8dc041f5e059c3baa2027b34a/models/multiTask/SELF_MM.py#L64
with the error:
File "/media/yiwei/600G/anaconda3/envs/MMSA/lib/python3.6/site-packages/torch/nn/functional.py", line 1467, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: index out of range: Tried to access index 1355 out of table with 1 rows. at /opt/conda/conda-bld/pytorch_1565272279342/work/aten/src/TH/generic/THTensorEvenMoreMath.cpp:237
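Both errors likely point the same way: some input token id is larger than the embedding table that was actually loaded (on CPU the out-of-range lookup fails with a clear message; on GPU it trips the `srcIndex < srcSelectDimSize` assertion, which can then surface as the later cublas error). A minimal pre-flight check could look like the sketch below; `check_token_ids` is a hypothetical helper name, and 30522 is the vocab size of `bert-base-uncased` (`bert-base-chinese` uses 21128):

```python
def check_token_ids(input_ids, vocab_size=30522):
    """Return every token id that falls outside [0, vocab_size).

    input_ids: nested list of ints, shape [batch, seq_len].
    An empty result means the ids are safe to feed into the
    embedding layer; any returned ids would crash the lookup.
    """
    return [i for row in input_ids for i in row if i < 0 or i >= vocab_size]

ids = [[101, 1355, 102]]                       # [CLS] ... [SEP]-style row
print(check_token_ids(ids))                    # [] -- fine for a full vocab
print(check_token_ids(ids, vocab_size=1))      # all three ids are out of range,
                                               # mirroring "index 1355 out of
                                               # table with 1 rows"
```

Running this on the real `input_ids` batch just before the model call would show whether the data, rather than BERT itself, contains the bad indices.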
I succeeded with your previous version. If I apply the old version that ran successfully to the current one, it still fails; conversely, applying the new version of MISA to the old one also fails. The error message is the same:
RuntimeError: cublas runtime error : resource allocation failed at /opt/conda/conda-bld/pytorch_1565272279342/work/aten/src/THC/THCGeneral.cpp:216
The error is located in BertTextEncoder.py: https://github.com/thuiar/MMSA/blob/b2e70bbd198ba8e8dc041f5e059c3baa2027b34a/models/subNets/BertTextEncoder.py#L47-L65
text:torch.Size([64, 39, 768]) | input_ids:torch.Size([64, 768]) | input_mask:torch.Size([64, 768]) | segment_ids: torch.Size([64, 768])
The error is at line 62, and I still don't know how to solve this problem. @iyuge2 @Columbine21
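One thing worth noticing in those shapes: `input_ids` is `[64, 768]`, i.e. its sequence dimension equals BERT's hidden size (768) instead of a token count of at most 512. That is consistent with 768-d feature vectors being sliced where integer token ids were expected. A rough sanity-check heuristic (the helper name and thresholds are my assumptions, not part of MMSA):

```python
def looks_like_token_ids(shape, is_integer, max_seq_len=512, hidden_size=768):
    """Heuristic: BERT input_ids should be an integer tensor of shape
    [batch, seq_len] with seq_len <= 512. A trailing dimension equal to
    the hidden size (768) suggests hidden-state features were passed
    where token ids belong."""
    batch_seq = len(shape) == 2 and shape[1] <= max_seq_len
    return is_integer and batch_seq and shape[1] != hidden_size

print(looks_like_token_ids((64, 39), True))    # plausible token ids
print(looks_like_token_ids((64, 768), False))  # float [batch, 768]: features
```

If the `[64, 768]` tensor fails such a check, the bug would be in how the text tensor is unpacked before `BertTextEncoder`, not inside BERT.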
This issue has accumulated too much redundant information, so I have put the simplified content, after debugging, in issue #23.
Hi @iyuge2, thank you for your contribution to this project. I downloaded the data from the address you provided and ran it; it did achieve the same result as result-stat.
Then I downloaded the SIMS | MOSI | MOSEI raw data and used DataPre.py to generate features.pkl. But the processed data cannot be used with run.py.
Let's take MOSI as an example (I downloaded the raw data and used DataPre.py to process it). You provided aligned_50.pkl (367.3MB) and unaligned_50.pkl (554.2MB) as the *.pkl for the MOSI dataset. I got a features.pkl of 2.8G, which is really bigger than your *.pkl.
Then I used features.pkl to run, but it failed... :sob:
Failure situation: an error of 'list' object has no attribute 'astype'. But it didn't work; we then get a new error of:
RuntimeError: cublas runtime error : resource allocation failed at /opt/conda/conda-bld/pytorch_1565272279342/work/aten/src/THC/THCGeneral.cpp:216
There must be a problem with the data DataPre.py generates, and I really don't know how to solve it. In DataPre.py I only modified the paths so the data could be found for processing; I have not changed anything else. @iyuge2 Please help me...
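The `'list' object has no attribute 'astype'` error usually means a plain Python list came out of the pickle where a NumPy array was expected. A small defensive conversion could bridge that; `to_float_array` is a hypothetical helper, and the list-in-pickle cause is an assumption about what DataPre.py produced here:

```python
import numpy as np

def to_float_array(feature):
    """Coerce a feature loaded from a pickle to a float32 ndarray.

    If the pickle stored a plain Python list, calling .astype on it
    raises AttributeError; np.asarray handles both lists and arrays
    and is a no-op copy-wise when the dtype already matches.
    """
    return np.asarray(feature, dtype=np.float32)

feat = [[0.1, 0.2], [0.3, 0.4]]   # plain list: has no .astype
arr = to_float_array(feat)
print(arr.dtype, arr.shape)       # float32 (2, 2)
```

Applying such a conversion when loading features.pkl would remove the astype crash, though the later cublas/index errors suggest the ids inside the pickle also need checking against the BERT vocabulary.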