jc-hou closed this issue 3 years ago
That example is unfortunately unmaintained. Have you tried playing around with LXMERT, which is also a multi-modal model? There is a demo available here.
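For reference, here is a minimal inference sketch for LXMERT (untested here; the random tensors are assumptions standing in for real Faster R-CNN region features with 36 boxes, 2048-dim features, and 4-dim normalized box coordinates):

import torch
from transformers import LxmertModel, LxmertTokenizer

tokenizer = LxmertTokenizer.from_pretrained("unc-nlp/lxmert-base-uncased")
model = LxmertModel.from_pretrained("unc-nlp/lxmert-base-uncased")

text_inputs = tokenizer("a photo of a dog on a couch", return_tensors="pt")
visual_feats = torch.randn(1, 36, 2048)  # placeholder region features from an object detector
visual_pos = torch.rand(1, 36, 4)        # placeholder normalized bounding-box coordinates

outputs = model(**text_inputs, visual_feats=visual_feats, visual_pos=visual_pos)
print(outputs.language_output.shape, outputs.vision_output.shape)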
Oh, I didn't know there is one with LXMERT. I will try that. Thanks.
You can make it work with a small modification to the inference call: add "return_dict": False to the inputs dict, like this:
inputs = {
    "input_ids": batch[0],
    "input_modal": batch[2],
    "attention_mask": batch[1],
    "modal_start_tokens": batch[3],
    "modal_end_tokens": batch[4],
    "return_dict": False,
}
outputs = model(**inputs)
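With return_dict=False the forward call returns a plain tuple (the format the older example script expects) instead of a ModelOutput object, so the existing tuple indexing keeps working, e.g.:

logits = outputs[0]  # first element of the returned tuple; the loss comes first instead if "labels" is also passed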
Hi, I tried to run the multimodal example. By running:
I got the following error message:
torch: 1.7.1, transformers: 4.0.1
I also tried torch 1.7.0 with transformers 4.1.0, and it failed with the same error. Any advice? Thanks.