Closed · DianeBouchacourt closed this issue 1 year ago
Thanks! Also, I just found out about
`torchvision.transforms.PILToTensor()`:
https://discuss.pytorch.org/t/getting-typeerror-default-collate-batch-must-contain-tensors-numpy-arrays-numbers-dicts-or-lists-found-class-pil-image-image/161703
It seems to return the same pixel values!
This way you don't have to play with a custom collate_fn, and it looks like it doesn't scale the values (unlike ToTensor()): https://pytorch.org/vision/main/generated/torchvision.transforms.PILToTensor.html
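As a quick illustration of that scaling difference (a sketch in plain NumPy, not torchvision itself): ToTensor() yields float values in [0, 1] because it divides by 255, while PILToTensor() keeps the raw uint8 pixel values, which is why the pixel sums differ between the two transforms.

```python
import numpy as np

# Sketch of the documented behavior, using NumPy instead of torchvision:
# ToTensor() converts an HWC uint8 image to a CHW float tensor in [0, 1]
# (i.e. it divides by 255), while PILToTensor() converts to CHW uint8 and
# leaves the pixel values untouched.
img = np.arange(12, dtype=np.uint8).reshape(2, 2, 3)  # fake 2x2 RGB image

to_tensor_like = img.transpose(2, 0, 1).astype(np.float32) / 255.0
pil_to_tensor_like = img.transpose(2, 0, 1)  # dtype stays uint8

print(to_tensor_like.max())      # at most 1.0
print(pil_to_tensor_like.max())  # raw pixel value, here 11
```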
Never mind, it looks like the images still have different sizes :( So I will follow your collate_fn trick, thanks!
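For reference, the collate_fn trick can be sketched like this (a hypothetical minimal version, assuming the dataset yields (image, caption) pairs; the actual function in the repo may differ): it keeps the variable-size PIL images in a plain Python list so default_collate never tries to stack them into a tensor.

```python
# Hypothetical sketch of a collate_fn that leaves variable-size PIL images
# unbatched: the images stay in a plain Python list instead of being stacked,
# so the model can preprocess them one by one later.
def pil_list_collate(batch):
    # batch is a list of (image, caption) samples from the dataset
    images = [sample[0] for sample in batch]
    captions = [sample[1] for sample in batch]
    return images, captions
```

It would then be passed to the loader as `DataLoader(dataset, batch_size=..., collate_fn=pil_list_collate)`.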
Would you have an example command to launch Flava on VGA with main scripts?
I think something like this should work, e.g.:

```shell
model=flava
for dataset in VG_Relation VG_Attribution COCO_Order Flickr30k_order
do
    python3 main_aro.py --dataset=$dataset --model-name=$model --device=cuda
done
```
Thanks !
Btw, do you plan on updating the numbers for all datasets now that the model.eval() issue has been fixed? Also, can you confirm that you filter for VGA attributes with more than 25 examples? If that's the case, it looks like only 7,099 examples are left out of the 28,748 reported in the paper.
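For what it's worth, the filtering described above can be sketched as follows (hypothetical names; the field `"attribute"` and the function are assumptions, not the repo's actual schema): count how many test cases each attribute has, and keep only cases whose attribute occurs more than 25 times.

```python
from collections import Counter

# Hypothetical sketch of the filtering step discussed above: keep only the
# test cases whose attribute occurs more than `min_count` times overall.
# The field name "attribute" is an assumption about the data schema.
def filter_rare_attributes(cases, min_count=25):
    counts = Counter(case["attribute"] for case in cases)
    return [case for case in cases if counts[case["attribute"]] > min_count]
```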
Hello,
Since the flava model has no image_preprocess (it is None), calls to
`__getitem__`
will return PIL.Image objects, which can't be batched. Thus running `model.get_retrieval_scores_batched`
with flava on, for example, VGR raises the error: `default_collate: batch must contain tensors, numpy arrays, numbers, dicts or lists; found <class 'PIL.Image.Image'>`
How did you run your experiments?
Btw, unfortunately, I think turning the PIL.Image into a tensor with
`torchvision.transforms.ToTensor()`
messes up the preprocessing, see: the returned sums of pixel values are, respectively,