microsoft / LLaVA-Med

Large Language-and-Vision Assistant for Biomedicine, built towards multimodal GPT-4 level capabilities.

llava/eval/model_vqa.py returns incorrect results during inference #64

Open veinhao opened 2 months ago

veinhao commented 2 months ago

I am using llava/eval/model_vqa.py for Q&A. As input I take a single record from llava_med_eval_qa50_qa.jsonl (the evaluation set), together with the corresponding image provided with it. For example:

INPUT1: {"question_id": 17, "image": "20406459_F5.jpg", "pair_id": "20406459_F5", "text": "What type of tumor is mentioned in the context?\n", "gpt4_answer": "The tumor mentioned in the context is a renal cell carcinoma (RCC), which is a type of kidney cancer. It is specifically the clear cell type and has a Fuhrman grade of 3. The stage of the tumor is T3bNxM1, indicating that it has advanced and may have spread to other parts of the body.", ...}

INPUT2: 20406459_F5.jpg

OUTPUT (same question, run 3 times):

{"question_id": 17, "prompt": "What type of tumor is mentioned in the context?\n", "text": "The context mentions a giant perineal schwannoma. Schwannomas are benign tumors that arise from Schwann cells, which are the cells that form the myelin sheath around peripheral nerves. In this case, the tumor is located in the perineal region, which is the area between the genitals and the anus.", ...}

{"question_id": 17, "prompt": "What type of tumor is mentioned in the context?\n", "text": "The context mentions a giant cell tumor, which is a type of bone tumor.", ...}

{"question_id": 17, "prompt": "What type of tumor is mentioned in the context?\n", "text": "The context mentions an adrenal tumor, which is a growth or mass in the adrenal gland. The adrenal glands are small, triangular-shaped glands located on top of both kidneys. They produce hormones that help regulate various bodily functions, such as metabolism, immune system, blood pressure, and stress response.", ...}

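So my question is: is this output reasonable? The three runs name three different tumors (a perineal schwannoma, a giant cell tumor, and an adrenal tumor), and none of them matches the renal cell carcinoma given in gpt4_answer.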
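
To make the mismatch easier to check, here is a minimal sketch that groups output records by question_id, counts how many distinct answers the runs produced, and counts how many runs mention a reference term. It is not part of model_vqa.py: the function name `summarize_runs` and the keyword "renal cell carcinoma" (taken from gpt4_answer) are my own choices, and the answer texts are shortened to the first sentence of each output above.

```python
from collections import defaultdict

def summarize_runs(outputs, reference_keyword):
    """Group answers by question_id and report agreement statistics.

    `summarize_runs` is a hypothetical helper, not a LLaVA-Med function.
    """
    by_question = defaultdict(list)
    for record in outputs:
        by_question[record["question_id"]].append(record["text"])

    summary = {}
    for question_id, texts in by_question.items():
        summary[question_id] = {
            "runs": len(texts),
            # Exact-string comparison: paraphrases count as different answers.
            "distinct_answers": len(set(texts)),
            # Case-insensitive substring match against the reference term.
            "runs_mentioning_reference": sum(
                reference_keyword.lower() in text.lower() for text in texts
            ),
        }
    return summary

# First sentence of each of the three outputs reported above.
outputs = [
    {"question_id": 17, "text": "The context mentions a giant perineal schwannoma."},
    {"question_id": 17, "text": "The context mentions a giant cell tumor, which is a type of bone tumor."},
    {"question_id": 17, "text": "The context mentions an adrenal tumor, which is a growth or mass in the adrenal gland."},
]

# Keyword assumed from the gpt4_answer field of the input record.
summary = summarize_runs(outputs, "renal cell carcinoma")
print(summary)
```

For the three runs above this reports three distinct answers and no run mentioning the reference term. A substring match is only a rough check, and it cannot tell whether the variation comes from the decoding settings or from the image not reaching the model.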