mlfoundations / open_flamingo

An open-source framework for training large multimodal models.
MIT License

Notes on FP16 inference #130

Open 152334H opened 1 year ago

152334H commented 1 year ago

For no reason at all, I decided to try running the model on my RTX 3090. This turned out to be surprisingly difficult, so I am documenting my process here for others to find.

Although the README provides some instructions on how to load the model, I have created a simple fork of this repo here that is slightly easier to use on consumer GPUs. Unfortunately, it still consumes a large amount of VRAM (because 8-bit loading is not supported), and loading the LLaMA weights takes a long time (because accelerate is broken).

ericjang commented 1 year ago

I followed the fork https://github.com/mlfoundations/open_flamingo/commit/25b17319723b41f900cd52a389466b97c053695d but am unable to load the LLaMA weights on the 3090; it crashes when I try to load them in fp16.

NVIDIA-SMI 515.86.01    Driver Version: 515.86.01    CUDA Version: 11.7 
ericjang commented 1 year ago

Never mind, I was able to get things working. I had to decrease the batch size to 2 to fit in memory.
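A rough back-of-envelope calculation shows why a 24 GB 3090 is tight here (assuming the 7B-parameter LLaMA backbone; the exact model size is an assumption, and activations, the vision encoder, and the KV cache add further overhead on top of the weights):

```python
# Back-of-envelope VRAM estimate for fp16 inference.
# Assumption: a 7B-parameter language-model backbone.
params = 7e9
bytes_per_param_fp16 = 2  # fp16 stores each parameter in 2 bytes

weights_gib = params * bytes_per_param_fp16 / 1024**3
print(round(weights_gib, 1))  # roughly 13 GiB for the weights alone
```

With the weights alone consuming over half of a 24 GB card, shrinking the batch size is the main remaining lever for fitting activations.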

Note also that the inputs must be cast to half precision as well, e.g. batch_images = batch_images.half(), otherwise the fp16 weights and fp32 inputs will not match.
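As a minimal sketch of the casting step (using numpy's astype(np.float16) as a self-contained stand-in for torch's .half(); the tensor shape is an illustrative assumption):

```python
import numpy as np

# Stand-in for a preprocessed image batch: (batch, channels, height, width).
batch_images = np.zeros((2, 3, 224, 224), dtype=np.float32)

# Cast the inputs to the same dtype as the fp16 model weights
# (the numpy equivalent of torch's batch_images.half()).
batch_images = batch_images.astype(np.float16)
assert batch_images.dtype == np.float16
```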

Performance on the OK-VQA benchmark with the following settings:

python open_flamingo/eval/evaluate.py \
    --lm_path $LM_PATH \
    --lm_tokenizer_path $LM_TOKENIZER_PATH \
    --checkpoint_path $CKPT_PATH \
    --device $DEVICE \
    --cross_attn_every_n_layers 4 \
    --eval_ok_vqa \
    --ok_vqa_image_dir_path $VQAV2_IMG_PATH \
    --ok_vqa_annotations_json_path $VQAV2_ANNO_PATH \
    --ok_vqa_questions_json_path $VQAV2_QUESTION_PATH \
    --results_file $RESULTS_FILE \
    --num_samples 5000 --shots 2 --num_trials 1 \
    --batch_size 2
Shots 2 Trial 0 OK-VQA score: 35.15
Shots 2 Mean OK-VQA score: 35.15