mertyg / vision-language-models-are-bows

Experiments and data for the paper "When and why vision-language models behave like bags-of-words, and what to do about it?" Oral @ ICLR 2023
MIT License

Requirements (e.g. torch versions) #16

Closed DianeBouchacourt closed 1 year ago

DianeBouchacourt commented 1 year ago

Hi,

I tried running the XVLM model in the notebook and it fails with a CUDA error.
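The call is the one below; the `from model_zoo import get_model` import is just my reading of the repo layout, so correct me if that's off:

```python
from model_zoo import get_model  # import path assumed from the repo layout

# Load the retrieval-finetuned XVLM (COCO) model onto the GPU.
model, image_preprocess = get_model(
    model_name="xvlm-coco", device="cuda", root_dir="./tmp/"
)
```

The error is: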

CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasLtMatmul( ltHandle, computeDesc.descriptor(), &alpha_val, mat1_ptr, Adesc.descriptor(), mat2_ptr, Bdesc.descriptor(), &beta_val, result_ptr, Cdesc.descriptor(), result_ptr, Cdesc.descriptor(), &heuristicResult.algo, workspace.data_ptr(), workspaceSize, at::cuda::getCurrentCUDAStream())`

I have Python 3.9, torch==1.13.0+cu117, and torchvision==0.14.0+cu117.

Could you please share your configuration so that I can try to match it?

mertyg commented 1 year ago

Ah sorry about this. Here are the versions:

Python 3.9.0
torch==1.12.1
torchvision==0.13.1
CUDA 11.6

I have a very ugly pip freeze output now but I'll try to clean it up later and share it.
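In the meantime, something like this should confirm whether an environment matches the versions above (just a generic sanity check, nothing specific to this repo):

```python
import torch
import torchvision

# Expect 1.12.1 / 0.13.1, built against CUDA 11.6.
print(torch.__version__, torchvision.__version__)
print(torch.version.cuda)
print(torch.cuda.is_available())
```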

DianeBouchacourt commented 1 year ago

Thanks a lot! By the way, do you ever use XVLM with is_pretrained=True? What exactly does it do if we go into this code path? https://github.com/mertyg/vision-language-models-are-bows/blob/09e1fcffea60b7e8f31f93cb0844b344af1d0642/model_zoo/xvlm_utils/xvlm.py#L211

mertyg commented 1 year ago

Ah, good catch: no, I don't think we currently use this.

It was probably part of an attempt to use XVLM right after pretraining (before fine-tuning on retrieval), but I don't recall the details precisely.
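If it helps, a flag like that usually just changes which checkpoint is loaded and how its state dict is massaged before loading. Purely as an illustration of the idea (this is not the actual code in xvlm.py, and the "model" key is a guess at the checkpoint layout), something in this spirit:

```python
import torch

def load_xvlm_checkpoint(model, ckpt_path, is_pretrained=False):
    """Illustrative sketch only: load either a pretraining-stage or a
    retrieval-finetuned XVLM checkpoint into `model`."""
    checkpoint = torch.load(ckpt_path, map_location="cpu")
    # The "model" key is a guess; fall back to the raw dict otherwise.
    state_dict = checkpoint.get("model", checkpoint)
    if is_pretrained:
        # A pretraining-stage checkpoint typically needs some key remapping
        # before it fits the retrieval model.
        state_dict = {k.replace("module.", ""): v for k, v in state_dict.items()}
    model.load_state_dict(state_dict, strict=False)
    return model
```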

DianeBouchacourt commented 1 year ago

Sounds good, thanks!