haotian-liu / LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
https://llava.hliu.cc
Apache License 2.0

LLaVA-v1.6-34B outputs garbled sentences #1284

Closed yanbai1993 closed 6 months ago

yanbai1993 commented 6 months ago

Question

I tested the 34B model on TextVQA, but the results are chaotic. Could there be a configuration error?

```
python -m llava.eval.model_vqa_loader \
    --model-path liuhaotian/llava-v1.6-34b \
    --question-file ./playground/data/eval/textvqa/llava_textvqa_val_v051_ocr.jsonl \
    --image-folder ./playground/data/eval/textvqa/train_images \
    --answers-file ./playground/data/eval/textvqa/answers/llava-v1.6-34b.jsonl \
    --temperature 0 \
    --conv-mode chatml_direct

python -m llava.eval.eval_textvqa \
    --annotation-file ./playground/data/eval/textvqa/TextVQA_0.5.1_val.json \
    --result-file ./playground/data/eval/textvqa/answers/llava-v1.6-34b.jsonl
```

Result (one JSONL record per line; note the repeated-token outputs):

```
{"question_id": "b9dc400eb20bad64", "prompt": "What does the small white text spell?\nReference OCR token: D, RUPALCON, PALCON, COPENHAGEN\nAnswer the question using a single word or phrase.", "text": "Drupalcon", "answer_id": "DhmWWncmmMSJZxcYp58cMC", "model_id": "llava-v1.6-34b", "metadata": {}}
{"question_id": "2b538a43dd933fc1", "prompt": "What kind of beer is this?\nReference OCR token: NINK, NK, BOWING, CC, STON, SUE, ED, Sublimely, SELF, ELF-RICHEE, swAaVd, KGy, ALE\nAnswer the question using a single word or phrase.", "text": "SelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelf", "answer_id": "cMbXZuYAuqZHyhyebAPEVr", "model_id": "llava-v1.6-34b", "metadata": {}}
{"question_id": "831bcec304a17054", "prompt": "What brand liquor is on the right?\nReference OCR token: IGNET, N, MORANGIE, MORANGIE, MORANGIE, CHLANDS, ISLAY, OWMOR, OR, SINGL, MALT, WHISKY, SINGLE, MALT, HISKY, SCOTCH, WHITS, LASANTA, NON, CHILL-F, MATURED, RARE, SHERRY, CASKS, BatckK, Reliasen, CAKTYPE, STFILLBOURSON, BATCHNS, TOUE, CISTILLER, 55.6%ac, LONGHO, ATOL, BLAY, SCORLAND, AGED, 10\nAnswer the question using a single word or phrase.", "text": "Bowmore Islay Morangie Morangie Morangie Morangie Morangie Morangie Morangie Morangie Morangie Morangie Morangie Morangie Morangie Morangie Morangie Morangie Morangie Morangie Morangie Morangie Morangie Morangie Morangie Morangie Morangie Morangie Morangie Morangie Morangie Morangie Morangie Morangie Morangie Morangie Morangie Morangie Morangie Morangie Morangie Morangie Morangie", "answer_id": "4gtX3Xz9wwuaGn3PkaFKxS", "model_id": "llava-v1.6-34b", "metadata": {}}
{"question_id": "831bcec304a17054", "prompt": "How long has the drink on the right been aged?\nReference OCR token: IGNET, N, MORANGIE, MORANGIE, MORANGIE, CHLANDS, ISLAY, OWMOR, OR, SINGL, MALT, WHISKY, SINGLE, MALT, HISKY, SCOTCH, WHITS, LASANTA, NON, CHILL-F, MATURED, RARE, SHERRY, CASKS, BatckK, Reliasen, CAKTYPE, STFILLBOURSON, BATCHNS, TOUE, CISTILLER, 55.6%ac, LONGHO, ATOL, BLAY, SCORLAND, AGED, 10\nAnswer the question using a single word or phrase.", "text": "10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000", "answer_id": "FPeDSeeENBSMtLWAMBbMTN", "model_id": "llava-v1.6-34b", "metadata": {}}
{"question_id": "94ad4aad01e27a32", "prompt": "What number is on the player's jersey?\nReference OCR token: 22\nAnswer the question using a single word or phrase.", "text": "22222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222", "answer_id": "XiSeaMejABN5tbNeTWpiwa", "model_id": "llava-v1.6-34b", "metadata": {}}
{"question_id": "181f00d3ee2b2076", "prompt": "What is the time?\nReference OCR token: N, u, g0\nAnswer the question using a single word or phrase.", "text": "The time is 1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111", "answer_id": "7GUvvBADuw2NKYXyPXwrpR", "model_id": "llava-v1.6-34b", "metadata": {}}
```

Switching the conversation mode from chatml_direct to vicuna_v1 also produces garbled output.

yanbai1993 commented 6 months ago

I have solved the issue by updating transformers to 4.36.2.
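For anyone hitting the same repeated-token output, a minimal sketch of a version guard that could run before the eval script loads the model. The helper names (`parse_version`, `assert_transformers_version`) are hypothetical and not part of the LLaVA codebase; the only fact taken from this thread is that transformers 4.36.2 resolved the garbling.

```python
from importlib.metadata import version, PackageNotFoundError

# 4.36.2 is the transformers release the issue author reports as fixing
# the repeated-token outputs from llava-v1.6-34b.
MINIMUM_TRANSFORMERS = "4.36.2"


def parse_version(v: str) -> tuple:
    """Turn '4.36.2' into (4, 36, 2); pre-release suffixes are ignored in this sketch."""
    parts = []
    for piece in v.split(".")[:3]:
        digits = "".join(ch for ch in piece if ch.isdigit())
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def transformers_ok(installed: str, minimum: str = MINIMUM_TRANSFORMERS) -> bool:
    """Return True when the installed transformers version meets the minimum."""
    return parse_version(installed) >= parse_version(minimum)


def assert_transformers_version() -> None:
    """Fail fast with a pointer to the fix instead of silently emitting garbled answers."""
    try:
        installed = version("transformers")
    except PackageNotFoundError:
        raise RuntimeError("transformers is not installed")
    if not transformers_ok(installed):
        raise RuntimeError(
            f"transformers {installed} predates {MINIMUM_TRANSFORMERS}; "
            f"try `pip install transformers=={MINIMUM_TRANSFORMERS}`"
        )
```

Calling `assert_transformers_version()` at startup turns a confusing evaluation failure into an explicit error with the suggested upgrade.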