jc-hou closed this issue 3 years ago
That example is unfortunately unmaintained. Have you tried playing around with LXMERT, which is also a multi-modal model? There is a demo available here.
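For reference, here is a minimal inference sketch for LXMERT (untested here; the random tensors are assumptions standing in for real Faster R-CNN region features with 36 boxes, 2048-dim features, and 4-dim normalized box coordinates):

import torch
from transformers import LxmertModel, LxmertTokenizer

tokenizer = LxmertTokenizer.from_pretrained("unc-nlp/lxmert-base-uncased")
model = LxmertModel.from_pretrained("unc-nlp/lxmert-base-uncased")

text_inputs = tokenizer("a photo of a dog on a couch", return_tensors="pt")
visual_feats = torch.randn(1, 36, 2048)  # placeholder region features from an object detector
visual_pos = torch.rand(1, 36, 4)        # placeholder normalized bounding-box coordinates

outputs = model(**text_inputs, visual_feats=visual_feats, visual_pos=visual_pos)
print(outputs.language_output.shape, outputs.vision_output.shape)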
Oh, I didn't know there is one with LXMERT. I will try that. Thanks.
You can make it work with a small modification to the inference call: add "return_dict": False to the inputs dict, like this:
inputs = {
    "input_ids": batch[0],
    "input_modal": batch[2],
    "attention_mask": batch[1],
    "modal_start_tokens": batch[3],
    "modal_end_tokens": batch[4],
    "return_dict": False,
}
outputs = model(**inputs)
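With return_dict=False the forward call returns a plain tuple (the format the older example script expects) instead of a ModelOutput object, so the existing tuple indexing keeps working, e.g.:

logits = outputs[0]  # first element of the returned tuple; the loss comes first instead if "labels" is also passed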
Hi, I tried to run the multimodal example. By running:
I got the following error message:
torch: 1.7.1, transformers: 4.0.1
I also tried torch 1.7.0 with transformers 4.1.0, and it failed with the same error. Any advice? Thanks.