Hi @ruifengma,
Sorry, we currently only support evaluation on GPUs with 80 GB of memory (we will adapt to lower-memory GPUs very soon). A quick fix is to remove L111 in vlmeval/vlm/internvl_chat.py and add device_map='auto' to the AutoModel.from_pretrained call.
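For reference, a minimal sketch of that change (names follow the loader code in vlmeval/vlm/internvl_chat.py; exact line numbers may differ across versions):

self.model = AutoModel.from_pretrained(model_path, torch_dtype=torch.bfloat16,
                                       trust_remote_code=True,
                                       device_map='auto').eval()  # shard across all visible GPUs
# The removed line (around L111) moved the whole model onto a single device,
# which is what caused the single-GPU OOM.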
Thanks @kennymckormick for the reply. The model can now be loaded onto 2 GPUs, but during inference I got a new issue:
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 11/11 [00:16<00:00, 1.52s/it]
/data/mrx/VLMEvalKit/vlmeval/vlm/internvl_chat.py:122: UserWarning: Following kwargs received: {'do_sample': False, 'max_new_tokens': 1024, 'top_p': None, 'num_beams': 1}, will use as generation config.
warnings.warn(f'Following kwargs received: {self.kwargs}, will use as generation config. ')
0%| | 0/2374 [00:00<?, ?it/s]/data/mrx/VLMEvalKit/vlmeval/vlm/base.py:140: UserWarning: Model InternVLChat does not support interleaved input. Will use the first image and aggregated texts as prompt.
warnings.warn(
dynamic ViT batch size: 1
0%| | 0/2374 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/data/mrx/VLMEvalKit/run.py", line 155, in <module>
main()
File "/data/mrx/VLMEvalKit/run.py", line 79, in main
model = infer_data_job(
^^^^^^^^^^^^^^^
File "/data/mrx/VLMEvalKit/vlmeval/inference.py", line 164, in infer_data_job
model = infer_data(
^^^^^^^^^^^
File "/data/mrx/VLMEvalKit/vlmeval/inference.py", line 130, in infer_data
response = model.generate(message=struct, dataset=dataset_name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data/mrx/VLMEvalKit/vlmeval/vlm/base.py", line 135, in generate
return self.generate_inner(message, dataset)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data/mrx/VLMEvalKit/vlmeval/vlm/internvl_chat.py", line 220, in generate_inner
return self.generate_v1_5(message, dataset)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data/mrx/VLMEvalKit/vlmeval/vlm/internvl_chat.py", line 201, in generate_v1_5
response = self.model.chat(self.tokenizer, pixel_values=pixel_values,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.cache/huggingface/modules/transformers_modules/InternVL-Chat-V1-5/modeling_internvl_chat.py", line 309, in chat
generation_output = self.generate(
^^^^^^^^^^^^^^
File "/root/miniconda3/envs/vlmeval/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/root/.cache/huggingface/modules/transformers_modules/InternVL-Chat-V1-5/modeling_internvl_chat.py", line 353, in generate
input_embeds[selected] = vit_embeds.reshape(-1, C)
~~~~~~~~~~~~^^^^^^^^^^
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!
At L197 in vlmeval/vlm/internvl_chat.py, the image data is loaded onto the default GPU. Try to check the GPU id before loading the data.
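A minimal sketch of what that check could look like, assuming the loading line at L197 (the exact fix depends on how the model is sharded):

# Hypothetical: move the image tensor to whichever device holds the model's
# parameters, instead of relying on the default GPU (cuda:0).
device = next(self.model.parameters()).device
pixel_values = load_image(image_path, max_num=self.max_num).to(device).to(torch.bfloat16)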
Thanks @junming-yang. The image-loading line I found is
pixel_values = load_image(image_path, max_num=self.max_num).cuda().to(torch.bfloat16)
Do I need to check the device id dynamically, or force it to GPU 0? The model loading process fills GPU 0 first and then GPU 1 (0 and 1 are mapped from physical GPUs 2 and 3).
You can try to dynamically check the model's device.
I appended .to(torch.cuda.current_device()) at the end, but it still gives me the same error.
Maybe you can try .to(self.model.device).
Yes, I did. Still the same error.
I have tried to reproduce your bug. This is the revised code for running (replacing the original L107-110):

self.model = AutoModel.from_pretrained(model_path, torch_dtype=torch.bfloat16,
                                       trust_remote_code=True,
                                       load_in_8bit=load_in_8bit, device_map='auto').eval()
# if not load_in_8bit:
#     self.model = self.model.to(device)

Each GPU is allocated about 26 GiB, and no error is reported. Please check your code.
I actually did not change anything on my own; I followed the advice exactly. I am using two A40 GPUs for the task, and I checked that I made the same modification as you did.
It turned out not to be a coding issue: updating to the latest version of the official InternVL configuration files solved it.
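For anyone hitting the same error, a sketch of refreshing the cached remote-code files with huggingface_hub (the repo id is assumed to be the official OpenGVLab release):

from huggingface_hub import snapshot_download

# Assumed repo id; force_download re-fetches configuration_internvl_chat.py
# and modeling_internvl_chat.py so they match the latest official version.
snapshot_download('OpenGVLab/InternVL-Chat-V1-5', force_download=True)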
When I try to run InternVL-Chat-V1.5, it is large and needs at least two GPUs, so I use the following command:
CUDA_VISIBLE_DEVICES=2,3 python run.py --data MME --model InternVL-Chat-V1-5 --verbose
But it only runs on one of the two GPUs and gives an OOM error. Do I need to add more configuration?