haotian-liu / LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
https://llava.hliu.cc
Apache License 2.0
20.04k stars 2.21k forks

[Usage] Is there a way to describe one image and the next? #605

Open min731 opened 1 year ago

min731 commented 1 year ago

Describe the issue

Thanks for your great work!!

I want to load the model once, input multiple images, and save the description for each image. Is there a way to load the model, complete the description for one image, and then have it describe the next one?

----- file 1 inference ------
ASSISTANT: Yes
ASSISTANT: Sunset, beach, ocean, car, man, woman, boy, girl, handbag, backpack, cell phone.
ASSISTANT: A family photo taken in front of a sea of waves.
inference_outputs : ['Yes', 'Sunset, beach, ocean, car, man, woman, boy, girl, handbag, backpack, cell phone.', 'A family photo taken in front of a sea of waves.']
------ file 2 inference ------
[2023-10-18 17:06:24,340] ERROR in app: Exception on /album_registration_pc1/images_analysis_pc1 [POST]
Traceback (most recent call last):
  File "/home/meta-3/anaconda3/envs/llava/lib/python3.10/site-packages/flask/app.py", line 1455, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/meta-3/anaconda3/envs/llava/lib/python3.10/site-packages/flask/app.py", line 869, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/meta-3/anaconda3/envs/llava/lib/python3.10/site-packages/flask/app.py", line 867, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/meta-3/anaconda3/envs/llava/lib/python3.10/site-packages/flask/app.py", line 852, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
  File "/home/meta-3/jm/META_FINAL_PJT/jm_PC1/Family_Album_PC1/views/album_registration_views_pc1.py", line 27, in images_analysis_task_pc1
    inference_outputs = image_processor_pc1.llava_inference_image(
  File "/home/meta-3/jm/META_FINAL_PJT/jm_PC1/Family_Album_PC1/models/image_processor_pc1.py", line 28, in llava_inference_image
    inference_outputs = inference_image(tokenizer,
  File "/home/meta-3/jm/META_FINAL_PJT/jm_PC1/Family_Album_PC1/models/LLaVA/llava/serve/cli.py", line 149, in inference_image
    output_ids = model.generate(
  File "/home/meta-3/anaconda3/envs/llava/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/meta-3/anaconda3/envs/llava/lib/python3.10/site-packages/transformers/generation/utils.py", line 1588, in generate
    return self.sample(
  File "/home/meta-3/anaconda3/envs/llava/lib/python3.10/site-packages/transformers/generation/utils.py", line 2642, in sample
    outputs = self(
  File "/home/meta-3/anaconda3/envs/llava/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/meta-3/anaconda3/envs/llava/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/meta-3/jm/META_FINAL_PJT/jm_PC1/Family_Album_PC1/models/LLaVA/llava/model/language_model/llava_llama.py", line 75, in forward
    input_ids, attention_mask, past_key_values, inputs_embeds, labels = self.prepare_inputs_labels_for_multimodal(input_ids, attention_mask, past_key_values, labels, images)
  File "/home/meta-3/jm/META_FINAL_PJT/jm_PC1/Family_Album_PC1/models/LLaVA/llava/model/llava_arch.py", line 129, in prepare_inputs_labels_for_multimodal
    cur_image_features = image_features[cur_image_idx]
IndexError: index 1 is out of bounds for dimension 0 with size 1

I had the same error as above.
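For context, the failing frame in `prepare_inputs_labels_for_multimodal` can be illustrated with a minimal stand-in: `image_features` holds one entry per image actually passed to `generate()`, while `cur_image_idx` advances once per `<image>` token found in the prompt. The helper below is a simplified sketch (a plain list stands in for the real torch tensor, and `gather_image_features` is a hypothetical name, not the library's API):

```python
# Simplified stand-in for the indexing done in llava_arch.py:
# one feature entry per supplied image, one index step per <image>
# token in the prompt.
def gather_image_features(image_features, num_image_tokens_in_prompt):
    gathered = []
    for cur_image_idx in range(num_image_tokens_in_prompt):
        # Raises IndexError as soon as the prompt contains more
        # <image> tokens than images supplied -- the error above.
        gathered.append(image_features[cur_image_idx])
    return gathered
```

If the conversation history from image 1 is kept, the prompt for image 2 contains two `<image>` tokens but only one image tensor is passed, which reproduces `index 1 is out of bounds for dimension 0 with size 1`.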

wanxingDaze commented 1 year ago

Hello, I also encountered the same problem when running multiple images. I wonder if you have solved it.

File "/home/llmnav/kunyuwangv2/LLaVA/llava/serve/main.py", line 177, in <module>
    main(args)
  File "/home/llmnav/kunyuwangv2/LLaVA/llava/serve/main.py", line 146, in main
    output_ids = model.generate(
  File "/home/llmnav/anaconda3/envs/llava/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/llmnav/anaconda3/envs/llava/lib/python3.10/site-packages/transformers/generation/utils.py", line 1588, in generate
    return self.sample(
  File "/home/llmnav/anaconda3/envs/llava/lib/python3.10/site-packages/transformers/generation/utils.py", line 2642, in sample
    outputs = self(
  File "/home/llmnav/anaconda3/envs/llava/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/llmnav/anaconda3/envs/llava/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/llmnav/kunyuwangv2/LLaVA/llava/model/language_model/llava_llama.py", line 75, in forward
    input_ids, attention_mask, past_key_values, inputs_embeds, labels = self.prepare_inputs_labels_for_multimodal(input_ids, attention_mask, past_key_values, labels, images)
  File "/home/llmnav/kunyuwangv2/LLaVA/llava/model/llava_arch.py", line 141, in prepare_inputs_labels_for_multimodal
    cur_image_features = image_features[cur_image_idx]
IndexError: index 1 is out of bounds for dimension 0 with size 1

min731 commented 1 year ago

Yeah, that's the problem I'm running into. Now I'm trying to use the 'LLaVA/llava/serve/cli.py' file to have the model describe several images one after another, but it's hard to implement.

adrielkuek commented 1 year ago

Not sure if this is still an issue, but the fix is simply to reset the conversation at the start of the loop so that it doesn't keep collecting and appending chat history. Resetting with this line: conv = conv_templates[args.conv_mode].copy() will do the job.
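The reset pattern can be sketched as follows. Note this is a simplified stand-in: the `Conversation` class below only mimics the `.copy()` semantics of `llava.conversation.conv_templates`, and `describe_images` / `run_inference` are hypothetical helpers standing in for the tokenize-and-`model.generate()` step in `cli.py`:

```python
import copy

class Conversation:
    """Stand-in for llava.conversation.Conversation."""
    def __init__(self, messages=None):
        self.messages = messages or []

    def copy(self):
        # Deep-copy so each image starts from a clean prompt template.
        return Conversation(copy.deepcopy(self.messages))

    def append_message(self, role, text):
        self.messages.append((role, text))

# Stand-in for conv_templates[args.conv_mode]
conv_template = Conversation()

def describe_images(image_files, run_inference):
    outputs = []
    for image_file in image_files:
        # Key fix: reset the conversation at the top of the loop, so the
        # prompt contains exactly one <image> token per generate() call.
        conv = conv_template.copy()
        conv.append_message("USER", "<image>\nDescribe this image.")
        outputs.append(run_inference(conv, image_file))
    return outputs
```

Without the `conv = conv_template.copy()` reset, each iteration would append to the same conversation, so the prompt for the second image would carry two `<image>` tokens while only one image tensor is passed, triggering the IndexError above.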

min731 commented 12 months ago

Oh, my God. It was just a matter of resetting the conv. Thank you so much. God bless you...

anas-zafar commented 3 months ago

@adrielkuek where exactly did you place this line? By doing this, won't we lose the contextual information? Thanks