open-compass / VLMEvalKit

Open-source evaluation toolkit for large vision-language models (LVLMs), supporting ~100 VLMs and 40+ benchmarks
https://huggingface.co/spaces/opencompass/open_vlm_leaderboard
Apache License 2.0
1.08k stars, 154 forks

Multi-GPU inference issue #224

Closed: ruifengma closed this issue 3 months ago

ruifengma commented 3 months ago

I am trying to run InternVL-Chat-V1.5. Since it is large and needs at least two GPUs, I use the following command:

CUDA_VISIBLE_DEVICES=2,3 python run.py --data MME --model InternVL-Chat-V1-5 --verbose

However, it does not run on the two GPUs. It uses only one and fails with an OOM error. Do I need additional configuration?
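For context, CUDA_VISIBLE_DEVICES=2,3 exposes physical GPUs 2 and 3 to the process and renumbers them as cuda:0 and cuda:1. Exposing two GPUs does not by itself split a model across them; the loading code still has to place layers on both. A minimal sketch of the renumbering, assuming the variable holds a plain comma-separated list of integer indices (the helper name visible_device_map is hypothetical, and UUID or MIG entries are not handled):

```python
import os


def visible_device_map(value=None):
    """Map logical CUDA ordinals to physical GPU indices.

    Hypothetical helper: assumes CUDA_VISIBLE_DEVICES is a comma-separated
    list of integer indices (UUID/MIG entries are not handled).
    """
    if value is None:
        value = os.environ.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return None  # unset: every GPU is visible under its own index
    # int() tolerates surrounding whitespace, e.g. " 3".
    ids = [int(tok) for tok in value.split(",") if tok.strip()]
    # The process sees the listed GPUs renumbered from 0, in the listed order.
    return {logical: physical for logical, physical in enumerate(ids)}
```

With the command above, visible_device_map("2,3") gives {0: 2, 1: 3}, i.e. cuda:0 is physical GPU 2 and cuda:1 is physical GPU 3.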

kennymckormick commented 3 months ago

Hi @ruifengma, sorry, we currently only support evaluation on GPUs with 80 GB of memory (we will adapt to GPUs with less memory very soon). A quick fix is to remove L111 in vlmeval/vlm/internvl_chat.py and add device_map='auto' to the AutoModel.from_pretrained call.
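A minimal sketch of the suggested loading change, written as a standalone function rather than inside VLMEvalKit's InternVLChat class; the names load_internvl_sharded and devices_used are hypothetical. When a model is loaded with device_map='auto', transformers records the layer placement in model.hf_device_map, which devices_used summarizes so you can confirm the model really spans both GPUs.

```python
def load_internvl_sharded(model_path, load_in_8bit=False):
    """Sketch of the suggested change: shard the model over visible GPUs."""
    # Imported lazily so devices_used() below works without torch installed.
    import torch
    from transformers import AutoModel

    model = AutoModel.from_pretrained(
        model_path,
        torch_dtype=torch.bfloat16,
        trust_remote_code=True,
        load_in_8bit=load_in_8bit,
        device_map="auto",  # let Accelerate place layers across all visible GPUs
    ).eval()
    # Deliberately no model.to(device) afterwards: pulling every layer
    # onto one GPU would defeat the sharding.
    return model


def devices_used(hf_device_map):
    """List the distinct devices that appear in model.hf_device_map."""
    return sorted({str(device) for device in hf_device_map.values()})
```

For example, devices_used(model.hf_device_map) returning ['0', '1'] means layers were placed on both logical GPUs.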

ruifengma commented 3 months ago

Thanks for the reply, @kennymckormick. The model can now be loaded onto two GPUs, but I hit a new issue during inference:

Loading checkpoint shards: 100%|██████████| 11/11 [00:16<00:00,  1.52s/it]
/data/mrx/VLMEvalKit/vlmeval/vlm/internvl_chat.py:122: UserWarning: Following kwargs received: {'do_sample': False, 'max_new_tokens': 1024, 'top_p': None, 'num_beams': 1}, will use as generation config.
  warnings.warn(f'Following kwargs received: {self.kwargs}, will use as generation config. ')
  0%|          | 0/2374 [00:00<?, ?it/s]
/data/mrx/VLMEvalKit/vlmeval/vlm/base.py:140: UserWarning: Model InternVLChat does not support interleaved input. Will use the first image and aggregated texts as prompt.
  warnings.warn(
dynamic ViT batch size: 1
  0%|          | 0/2374 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/data/mrx/VLMEvalKit/run.py", line 155, in <module>
    main()
  File "/data/mrx/VLMEvalKit/run.py", line 79, in main
    model = infer_data_job(
            ^^^^^^^^^^^^^^^
  File "/data/mrx/VLMEvalKit/vlmeval/inference.py", line 164, in infer_data_job
    model = infer_data(
            ^^^^^^^^^^^
  File "/data/mrx/VLMEvalKit/vlmeval/inference.py", line 130, in infer_data
    response = model.generate(message=struct, dataset=dataset_name)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/mrx/VLMEvalKit/vlmeval/vlm/base.py", line 135, in generate
    return self.generate_inner(message, dataset)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/mrx/VLMEvalKit/vlmeval/vlm/internvl_chat.py", line 220, in generate_inner
    return self.generate_v1_5(message, dataset)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/mrx/VLMEvalKit/vlmeval/vlm/internvl_chat.py", line 201, in generate_v1_5
    response = self.model.chat(self.tokenizer, pixel_values=pixel_values,
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.cache/huggingface/modules/transformers_modules/InternVL-Chat-V1-5/modeling_internvl_chat.py", line 309, in chat
    generation_output = self.generate(
                        ^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/vlmeval/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/root/.cache/huggingface/modules/transformers_modules/InternVL-Chat-V1-5/modeling_internvl_chat.py", line 353, in generate
    input_embeds[selected] = vit_embeds.reshape(-1, C)
    ~~~~~~~~~~~~^^^^^^^^^^
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!

junming-yang commented 3 months ago

At L197 in vlmeval/vlm/internvl_chat.py, the image data is loaded onto the default GPU. Try checking the GPU id before loading the data.

ruifengma commented 3 months ago

Thanks @junming-yang. The image-loading line I found is pixel_values = load_image(image_path, max_num=self.max_num).cuda().to(torch.bfloat16). Should I check the device id dynamically, or force it onto GPU 0? The model is loaded onto GPU 0 first and then GPU 1 (logical GPUs 0 and 1 correspond to physical GPUs 2 and 3).

junming-yang commented 3 months ago

You can try to dynamically check the model's device.

ruifengma commented 3 months ago

I appended .to(torch.cuda.current_device()) at the end, but it still gives me the same error.

junming-yang commented 3 months ago

Maybe you can try .to(self.model.device).

ruifengma commented 3 months ago

Yes, I did. Still the same error.
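A note on why moving pixel_values may not help: self.model.device reports a single device (the device of the model's first parameters), so with device_map='auto' it cannot tell you where each part of the model lives. The traceback shows the mismatch arises inside the remote generate code, at input_embeds[selected] = vit_embeds.reshape(-1, C), where the vision embeddings meet the language model's input embeddings. One way to align them would be moving vit_embeds onto input_embeds.device before that assignment; whether the updated official files do exactly this is not stated here. A hypothetical helper for checking which device hf_device_map assigns to a given submodule:

```python
def module_device(hf_device_map, module_name):
    """Resolve the device that hf_device_map assigns to a submodule.

    Uses the longest matching key, since entries can be whole submodules
    ("language_model") or finer-grained ("language_model.model.layers.0").
    """
    best_key, best_device = None, None
    for key, device in hf_device_map.items():
        # "" can appear as the key when the whole model sits on one device.
        matches = key == "" or module_name == key or module_name.startswith(key + ".")
        if matches and (best_key is None or len(key) > len(best_key)):
            best_key, best_device = key, device
    if best_key is None:
        raise KeyError(f"{module_name!r} not found in hf_device_map")
    # Integer entries are CUDA ordinals; strings such as "cpu" pass through.
    return f"cuda:{best_device}" if isinstance(best_device, int) else best_device
```

Usage would look like module_device(self.model.hf_device_map, "vision_model"). The submodule names vision_model and language_model are assumptions about InternVL's remote modeling code, not confirmed in this thread.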

junming-yang commented 3 months ago

I tried to reproduce your bug. Here is the revised code I ran (replacing original L107-110):

self.model = AutoModel.from_pretrained(model_path, torch_dtype=torch.bfloat16,
                                       trust_remote_code=True,
                                       load_in_8bit=load_in_8bit, device_map='auto').eval()
# if not load_in_8bit:
#     self.model = self.model.to(device)

Each GPU is allocated about 26 GiB, and no error is reported. Please check your code.
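A back-of-the-envelope check of the ~26 GiB per GPU figure, assuming InternVL-Chat-V1.5 has roughly 26 billion parameters (an assumption, not stated in this thread) stored in bf16 at 2 bytes each. The helper name weight_footprint_gib is hypothetical, and it counts weights only.

```python
def weight_footprint_gib(num_params, bytes_per_param=2, num_gpus=1):
    """Rough per-GPU memory for the weights alone, split evenly across GPUs.

    Ignores activations, the KV cache, and the CUDA context, so observed
    usage will be somewhat higher.
    """
    total_bytes = num_params * bytes_per_param  # bf16 = 2 bytes per parameter
    return total_bytes / num_gpus / 2**30  # bytes -> GiB


# Assumption: roughly 26e9 parameters for InternVL-Chat-V1.5.
print(f"one GPU:  {weight_footprint_gib(26e9):.1f} GiB")
print(f"two GPUs: {weight_footprint_gib(26e9, num_gpus=2):.1f} GiB each")
```

Under that assumption the weights alone come to about 52 GB (48.4 GiB), more than a single 48 GB A40 holds, which is consistent with the OOM on one GPU. Split across two cards, about 24.2 GiB of weights each plus runtime overhead is in line with the ~26 GiB reported.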

ruifengma commented 3 months ago

I did not make any custom changes; I followed the advice exactly. I am using two A40 GPUs for the task, and I checked that I made the same modification you did.

ruifengma commented 3 months ago

It was not a coding issue: updating to the latest version of the official InternVL configuration file solved it.