microsoft / SwinBERT

Research code for CVPR 2022 paper "SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning"
https://arxiv.org/abs/2111.13196
MIT License

Inference error with CPU #16

Open rorobertostring opened 2 years ago

rorobertostring commented 2 years ago

Hello guys, thank you for your amazing work and code.

I have replicated the container environment within a conda env and everything works fine: the inference code runs well with CUDA. However, when I set the device to cpu (in models/table1/vatex/log/args.json), I run into the following error:

Traceback (most recent call last):
  File "src/tasks/run_caption_VidSwinBert_inference.py", line 231, in <module>
    main(args)
  File "src/tasks/run_caption_VidSwinBert_inference.py", line 226, in main
    inference(args, args.test_video_fname, vl_transformer, tokenizer, tensorizer)
  File "src/tasks/run_caption_VidSwinBert_inference.py", line 99, in inference
    outputs = model(**inputs)
  File "~/miniconda3/envs/swinbert/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "~/SwinBERT/src/modeling/video_captioning_e2e_vid_swin_bert.py", line 53, in forward
    video_attention = (1. - diag_mask)*learn_att
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

The inference command line is the following: python src/tasks/run_caption_VidSwinBert_inference.py --resume_checkpoint models/table1/vatex/best-checkpoint/model.bin --eval_model_dir models/table1/vatex/best-checkpoint/ --test_video_fname docs/G0mjFqytJt4_000152_000162.mp4 --do_lower_case --do_test

The relevant packages within the virtual environment are as follows:

python                    3.8.5
pytorch                   1.8.0
torchvision              0.9.0

However, after editing line 52 of src/modeling/video_captioning_e2e_vid_swin_bert.py as follows:

            # Build the diagonal mask on the same device as the attention mask
            if kwargs['attention_mask'].is_cuda:
                diag_mask = torch.diag(torch.ones(vid_att_len)).cuda()
            else:
                diag_mask = torch.diag(torch.ones(vid_att_len))

everything works fine even on CPU.
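
For reference, a slightly more general variant of this fix (just a sketch, assuming kwargs['attention_mask'] is already available at that point in forward) is to create the mask directly on the same device as the attention mask, which avoids branching on .is_cuda:

            # Sketch: build the diagonal mask on whatever device the attention
            # mask lives on (cpu or any cuda device), instead of branching.
            device = kwargs['attention_mask'].device
            diag_mask = torch.diag(torch.ones(vid_att_len, device=device))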

Looking forward to your feedback, and thank you again for the awesome work!

kevinlin311tw commented 2 years ago

Thank you for pointing out this bug. We were mainly using GPUs for inference and didn't test on CPU.

Please feel free to submit a PR if you would like.

tiesanguaixia commented 1 year ago


Hi! May I ask how to download the raw videos of VATEX?