I tested the 34B model on TextVQA, but the outputs are garbled — the model repeats a single token endlessly. Could there be a configuration error?
```shell
python -m llava.eval.model_vqa_loader \
    --model-path liuhaotian/llava-v1.6-34b \
    --question-file ./playground/data/eval/textvqa/llava_textvqa_val_v051_ocr.jsonl \
    --image-folder ./playground/data/eval/textvqa/train_images \
    --answers-file ./playground/data/eval/textvqa/answers/llava-v1.6-34b.jsonl \
    --temperature 0 \
    --conv-mode chatml_direct

python -m llava.eval.eval_textvqa \
    --annotation-file ./playground/data/eval/textvqa/TextVQA_0.5.1_val.json \
    --result-file ./playground/data/eval/textvqa/answers/llava-v1.6-34b.jsonl
```
Result:
```jsonl
{"question_id": "b9dc400eb20bad64", "prompt": "What does the small white text spell?\nReference OCR token: D, RUPALCON, PALCON, COPENHAGEN\nAnswer the question using a single word or phrase.", "text": "Drupalcon", "answer_id": "DhmWWncmmMSJZxcYp58cMC", "model_id": "llava-v1.6-34b", "metadata": {}}
{"question_id": "2b538a43dd933fc1", "prompt": "What kind of beer is this?\nReference OCR token: NINK, NK, BOWING, CC, STON, SUE, ED, Sublimely, SELF, ELF-RICHEE, swAaVd, KGy, ALE\nAnswer the question using a single word or phrase.", "text": "SelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelfSelf", "answer_id": "cMbXZuYAuqZHyhyebAPEVr", "model_id": "llava-v1.6-34b", "metadata": {}}
{"question_id": "831bcec304a17054", "prompt": "What brand liquor is on the right?\nReference OCR token: IGNET, N, MORANGIE, MORANGIE, MORANGIE, CHLANDS, ISLAY, OWMOR, OR, SINGL, MALT, WHISKY, SINGLE, MALT, HISKY, SCOTCH, WHITS, LASANTA, NON, CHILL-F, MATURED, RARE, SHERRY, CASKS, BatckK, Reliasen, CAKTYPE, STFILLBOURSON, BATCHNS, TOUE, CISTILLER, 55.6%ac, LONGHO, ATOL, BLAY, SCORLAND, AGED, 10\nAnswer the question using a single word or phrase.", "text": "Bowmore Islay Morangie Morangie Morangie Morangie Morangie Morangie Morangie Morangie Morangie Morangie Morangie Morangie Morangie Morangie Morangie Morangie Morangie Morangie Morangie Morangie Morangie Morangie Morangie Morangie Morangie Morangie Morangie Morangie Morangie Morangie Morangie Morangie Morangie Morangie Morangie Morangie Morangie Morangie Morangie Morangie Morangie", "answer_id": "4gtX3Xz9wwuaGn3PkaFKxS", "model_id": "llava-v1.6-34b", "metadata": {}}
{"question_id": "831bcec304a17054", "prompt": "How long has the drink on the right been aged?\nReference OCR token: IGNET, N, MORANGIE, MORANGIE, MORANGIE, CHLANDS, ISLAY, OWMOR, OR, SINGL, MALT, WHISKY, SINGLE, MALT, HISKY, SCOTCH, WHITS, LASANTA, NON, CHILL-F, MATURED, RARE, SHERRY, CASKS, BatckK, Reliasen, CAKTYPE, STFILLBOURSON, BATCHNS, TOUE, CISTILLER, 55.6%ac, LONGHO, ATOL, BLAY, SCORLAND, AGED, 10\nAnswer the question using a single word or phrase.", "text": "10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000", "answer_id": "FPeDSeeENBSMtLWAMBbMTN", "model_id": "llava-v1.6-34b", "metadata": {}}
{"question_id": "94ad4aad01e27a32", "prompt": "What number is on the player's jersey?\nReference OCR token: 22\nAnswer the question using a single word or phrase.", "text": "22222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222", "answer_id": "XiSeaMejABN5tbNeTWpiwa", "model_id": "llava-v1.6-34b", "metadata": {}}
{"question_id": "181f00d3ee2b2076", "prompt": "What is the time?\nReference OCR token: N, u, g0\nAnswer the question using a single word or phrase.", "text": "The time is 1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111", "answer_id": "7GUvvBADuw2NKYXyPXwrpR", "model_id": "llava-v1.6-34b", "metadata": {}}
```
Switching `--conv-mode` from `chatml_direct` to `vicuna_v1` produces the same kind of broken output.
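To see how widespread the degeneration is before digging into the config, the answers file can be scanned for outputs that collapse into one repeated unit (e.g. `SelfSelfSelf…` or `2222…`). This is a quick sketch I wrote for this issue; the `is_degenerate` helper, its regex heuristic, and the 10-repeat threshold are my own choices, not part of the LLaVA codebase:

```python
import json
import re

def is_degenerate(text: str, min_repeats: int = 10) -> bool:
    """Heuristically flag text that collapses into one repeated unit,
    e.g. 'SelfSelfSelf...' or '22222222...'.

    Looks for a short unit (1-12 chars) repeated back to back at least
    `min_repeats` times anywhere in the string.
    """
    pattern = r"(.{1,12}?)\1{%d,}" % (min_repeats - 1)
    return re.search(pattern, text) is not None

def scan_answers(path: str) -> list[str]:
    """Return the question_ids in a model_vqa_loader answers JSONL file
    whose 'text' field looks degenerate."""
    flagged = []
    with open(path) as f:
        for line in f:
            record = json.loads(line)
            if is_degenerate(record["text"]):
                flagged.append(record["question_id"])
    return flagged
```

Running `scan_answers` over the answers file above flags every record except the first (`Drupalcon`), which suggests the problem is systematic (stop-token/template handling) rather than occasional sampling noise.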