microsoft / LLaVA-Med

Large Language-and-Vision Assistant for Biomedicine, built towards multimodal GPT-4 level capabilities.

Response in messy encoding for RAD-VQA evaluation #53

Closed YatingPan closed 3 months ago

YatingPan commented 3 months ago

Hi, I ran the model_vqa_med.py script on my CPU (I changed `.cuda()` to `.to('cpu')` and float16 to float32), using the llama-2-7B weights merged with data_RAD-9epoch_delta.zip. But the response I get for a CXR image is garbled, like:

{"question_id": "1", "prompt": "Can you describe the image for me?\n", "text": "nobody \u043d\u0438schluss\uc5b4 alberga\u0435\u0433\u043e \u043a\u043e\u043c\u0438();` \u0444\u0435\u0432 everybody selects \u0444\u0435\u0432ixel", "answer_id": "chrEpZiAn2zV6XaUMbrxNP", "model_id": "/home/user/yatpan/yatpan/LLaVA-Med/llava-rad-vqa-model", "metadata": {}}

Does anyone know how to solve this? The delta weights application and inference didn't raise any errors.
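For reference, a minimal sketch of the CPU port described in the question (the variable names are illustrative, not the actual model_vqa_med.py code; a small `nn.Linear` stands in for the LLaVA-Med model):

```python
import torch
import torch.nn as nn

# In model_vqa_med.py the model is moved with .cuda() and cast to float16.
# On CPU the equivalent is .to('cpu') with float32, since float16 matmuls
# are poorly supported on CPU.
device = torch.device("cpu")
dtype = torch.float32

model = nn.Linear(4, 2)                       # stand-in for the real model
model = model.to(device=device, dtype=dtype)  # instead of model.cuda().half()

x = torch.randn(1, 4, device=device, dtype=dtype)
with torch.no_grad():
    y = model(x)
```

Note that this change alone cannot produce garbled tokens; it only affects device placement and precision, so the encoding problem above must come from elsewhere (e.g. a base-model mismatch when applying the delta weights).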

prophesierimposing commented 3 months ago

Only llama-1 works.

YatingPan commented 3 months ago

> Only llama-1 works.

Thanks!