Johere opened this issue 4 weeks ago
Hi @Johere, we are reproducing this issue. We will update here for any progress :)
Hi @Johere, we have updated our llava example for llava-hf/llava-1.5-7b-hf. Please follow the instructions in the latest llava example to see if it works.
If the issue continues, could you please share the scripts you're using to run the multi-turn chat, along with the output from our env-check scripts, to help us gather more details? :)
Hi @JinheTang Thanks for your reply. The problem still exists. To reproduce the problem I met, please modify several lines of the latest llava example:
diff --git a/python/llm/example/GPU/PyTorch-Models/Model/llava/generate.py b/python/llm/example/GPU/PyTorch-Models/Model/llava/generate.py
index b70e22541a..c3b35ee2d8 100644
--- a/python/llm/example/GPU/PyTorch-Models/Model/llava/generate.py
+++ b/python/llm/example/GPU/PyTorch-Models/Model/llava/generate.py
@@ -56,8 +56,23 @@ if __name__ == '__main__':
{"type": "image"},
{"type": "text", "text": prompt}
]
+ },
+ # mimic a multi-round chat
+ {
+ 'role': 'assistant',
+ 'content': [
+ {'type': 'text', 'text': 'The image features a young girl holding a stuffed teddy bear.'}
+ ]
+ },
+ {
+ "role": "user",
+ "content": [
+ {"type": "image"},
+ {"type": "text", "text": "Describe the differences between these two images."}
+ ]
}
]
+
text = processor.apply_chat_template(messages, add_generation_prompt=True)
if os.path.exists(image_path):
@@ -65,7 +80,10 @@ if __name__ == '__main__':
else:
image = Image.open(requests.get(image_path, stream=True).raw)
- inputs = processor(text=text, images=image, return_tensors="pt").to('xpu')
+ # inputs = processor(text=text, images=image, return_tensors="pt").to('xpu')
+ # multi-image chat debug
+ image_2 = Image.open(requests.get("http://farm5.staticflickr.com/4031/4440753665_631134eaa4_z.jpg", stream=True).raw)
+ inputs = processor(text=text, images=[image, image_2], return_tensors="pt").to('xpu')
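For reference, the message list added by the diff can be sketched as plain Python. The key point is that each `{"type": "image"}` placeholder in the conversation must be matched by one image in the `images=` argument of the processor call, so this multi-round conversation needs two images. (This is only an illustrative sketch of the structure from the diff; the assistant text is the mimicked first-round answer.)

```python
# Multi-round message list, mirroring the diff above.
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "What is this?"},
    ]},
    # mimic a multi-round chat: the assistant's first-round answer
    {"role": "assistant", "content": [
        {"type": "text", "text": "The image features a young girl holding a stuffed teddy bear."},
    ]},
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Describe the differences between these two images."},
    ]},
]

# Count the image placeholders: the processor call must receive the
# same number of images, e.g. images=[image, image_2].
num_images = sum(
    1
    for m in messages
    for part in m["content"]
    if part.get("type") == "image"
)
print(num_images)  # 2
```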
Env check output log is attached: env-check.txt
Hi @Johere , thanks for the script, we will try to reproduce it.
Hi @Johere , we have reproduced the issue. If there's any update we will let you know.
Hi @Johere,
Sorry for the late reply. We have fixed this bug in ipex-llm>=2.2.0b20241113. You could try the latest ipex-llm :)
Please let us know of any further problems.
The multi-turn chat looks like:
1st round: http://farm6.staticflickr.com/5268/5602445367_3504763978_z.jpg What is this?
2nd round: http://farm5.staticflickr.com/4031/4440753665_631134eaa4_z.jpg What are the differences between these two images?
Error logs:
The error is located at /usr/lib/python3.10/site-packages/ipex_llm/transformers/low_bit_linear.py:729:
x_2d = x.view(-1, x_shape[-1])
If I modify it to:
x_2d = x.contiguous().view(-1, x_shape[-1])
everything works. I think the issue is related to the LLaVA model's vision_feature_select_strategy (vision_feature_select_strategy=default), which may make the tensor non-contiguous. Can anyone help with this issue? Thanks!
Python packages: ipex-llm 2.2.0b20241011, transformers 4.45.2