152334H opened this issue 1 year ago
I followed the fork https://github.com/mlfoundations/open_flamingo/commit/25b17319723b41f900cd52a389466b97c053695d but am unable to load the LLaMA weights in fp16 on my 3090 - it crashes during loading.
NVIDIA-SMI 515.86.01 Driver Version: 515.86.01 CUDA Version: 11.7
Never mind, I was able to get things working. I had to decrease the batch size to 2 to fit in memory.
Note also that the inputs have to be cast to half precision, e.g. `batch_images = batch_images.half()`.
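For reference, a minimal sketch of that cast (the tensor shape, device string, and dummy data are assumptions for illustration; in the real eval loop the batch comes from the dataloader):

```python
import torch

device = "cuda:0"

# Hypothetical image batch; OpenFlamingo expects images shaped
# (batch, num_media, num_frames, channels, height, width).
batch_images = torch.randn(2, 1, 1, 3, 224, 224)

# The model weights are fp16, so the inputs must be cast to match,
# otherwise the forward pass raises a dtype-mismatch error.
batch_images = batch_images.half().to(device)
```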
Performance on the OK-VQA benchmark with the following settings:
```bash
python open_flamingo/eval/evaluate.py \
    --lm_path $LM_PATH \
    --lm_tokenizer_path $LM_TOKENIZER_PATH \
    --checkpoint_path $CKPT_PATH \
    --device $DEVICE \
    --cross_attn_every_n_layers 4 \
    --eval_ok_vqa \
    --ok_vqa_image_dir_path $VQAV2_IMG_PATH \
    --ok_vqa_annotations_json_path $VQAV2_ANNO_PATH \
    --ok_vqa_questions_json_path $VQAV2_QUESTION_PATH \
    --results_file $RESULTS_FILE \
    --num_samples 5000 --shots 2 --num_trials 1 \
    --batch_size 2
```
Shots 2 Trial 0 OK-VQA score: 35.15
Shots 2 Mean OK-VQA score: 35.15
For no reason at all, I decided to try running the model on my RTX 3090. This turned out to be surprisingly difficult, so I am documenting my process here for anyone searching for the same problems.
Although the README provides some instructions on how to load the model,

- the `precision=...` parameter from `open_clip` is partially broken due to a forced fp32 conversion in `LayerNormFp32`
- `accelerate` discards the injected `FlamingoLMMixin` when `device_map` is used

I have created a simple fork of this repo here that is slightly easier to use for people with consumer GPUs. Unfortunately, it still consumes a large amount of VRAM (because of the lack of 8-bit loading), and it takes a long time to load the LLaMA weights (because `accelerate` is broken).
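As a reference for what the fork works around, here is a rough sketch of loading the model in pure fp16 on a single GPU, without going through `accelerate`'s `device_map`. The `create_model_and_transforms` call follows the upstream README; the paths and the `cuda:0` device are placeholders:

```python
import torch
from open_flamingo import create_model_and_transforms

# Build the model on CPU first; sharding it through accelerate's
# device_map is what drops the injected FlamingoLMMixin.
model, image_processor, tokenizer = create_model_and_transforms(
    clip_vision_encoder_path="ViT-L-14",
    clip_vision_encoder_pretrained="openai",
    lang_encoder_path="/path/to/llama-7b",  # placeholder
    tokenizer_path="/path/to/llama-7b",     # placeholder
    cross_attn_every_n_layers=4,
)

# Load the OpenFlamingo checkpoint (Perceiver resampler and
# cross-attention weights; the LLaMA weights are loaded above).
checkpoint = torch.load("/path/to/checkpoint.pt", map_location="cpu")
model.load_state_dict(checkpoint, strict=False)

# Cast everything to fp16 and move it onto the single consumer GPU.
model = model.half().to("cuda:0")
```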