m-bain / frozen-in-time

Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval [ICCV'21]
https://arxiv.org/abs/2104.00650
MIT License
351 stars 44 forks source link

"img should be PIL Image" when fine-tuning on MSR-VTT #6

Closed bryant1410 closed 3 years ago

bryant1410 commented 3 years ago

I got the following error when trying to run python train.py --config configs/msrvtt_4f_i21k.json (as in the README):

  File "***/base/base_dataset.py", line 107, in __getitem__
    imgs = self.transforms(imgs)
  File "***/envs/frozen/lib/python3.7/site-packages/torchvision/transforms/transforms.py", line 60, in __call__
    img = t(img)
  File "***/envs/frozen/lib/python3.7/site-packages/torchvision/transforms/transforms.py", line 195, in __call__
    return F.resize(img, self.size, self.interpolation)
  File "***/envs/frozen/lib/python3.7/site-packages/torchvision/transforms/functional.py", line 229, in resize
    raise TypeError('img should be PIL Image. Got {}'.format(type(img)))
TypeError: img should be PIL Image. Got <class 'torch.Tensor'>

(I set up the env as described in the README)

Seems like the frames are obtained as torch tensors but then the transforms need a PIL Image:

https://github.com/m-bain/frozen-in-time/blob/e6fc9467d62db4ab560822414447822aa857c4a6/base/base_dataset.py#L294-L296

https://github.com/m-bain/frozen-in-time/blob/e6fc9467d62db4ab560822414447822aa857c4a6/base/base_dataset.py#L279-L281

If I add a transforms.ToPILImage() before (and a transforms.ToTensor() after) in here

https://github.com/m-bain/frozen-in-time/blob/e6fc9467d62db4ab560822414447822aa857c4a6/data_loader/transforms.py#L18-L23

it still doesn't work because it needs an image, not multiple images. It also makes me think that the transforms actually won't work when having multiple PIL images.

Seems like the transforms are the incorrect ones? Or am I missing something?

bryant1410 commented 3 years ago

Wait, I think it's because I ended up with a previous torchvision version. Lemme see this before checking out this issue.

bryant1410 commented 3 years ago

Confirmed.