Closed: findalexli closed this issue 1 year ago
Hello @findalexli ,
For the evaluation on MMHal-Bench, please check the script Eval/eval_scripts/eval_mmhal.sh, which calls model_vqa_mmhal.py and builds the dataset directly from HuggingFace datasets. You may find other useful examples in Eval/eval_scripts.
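
For reference, the loading step looks roughly like this (a minimal sketch; the dataset id and column names below are assumptions, so please defer to model_vqa_mmhal.py for the exact values):

```python
from datasets import load_dataset

# Minimal sketch: pull the benchmark straight from the HuggingFace Hub.
# The dataset id, split, and column names are assumptions -- check
# model_vqa_mmhal.py for the exact identifiers the eval script uses.
mmhal = load_dataset("Shengcao1006/MMHal-Bench", split="test")

for example in mmhal:
    question = example["question"]  # question text posed to the model (assumed column name)
    image = example["image"]        # associated image (assumed column name)
    # ... run your model on (image, question) and save the responses for scoring
```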
I don't quite follow your second question. Would you mind elaborating on it?
Best, Shengcao
Hello there, I was trying to reproduce the LLaVA-Bench and MMHal-Bench results. I saw under eval/eval_scripts that an eval_image folder is passed in, but it doesn't exist in the repo. After a little digging in the LLaVA repo, it looks like there is only the llava-bench-in-the-wild dataset, not the 90 question-answer pairs.
On the same topic of reproducing the results, I saw that Q-LoRA weights were used in the evaluation. If I want to evaluate the official RLHF/SFT weights, which are plain LoRA, am I right in assuming I can just swap them out?
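
Just to make the question concrete, this is roughly what I have in mind (a sketch with placeholder paths, assuming the adapters follow the usual PEFT layout; the actual eval scripts may wrap model loading differently):

```python
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Placeholder paths: point these at the base LLaVA checkpoint and the
# RLHF/SFT LoRA adapter directory to be evaluated.
base = AutoModelForCausalLM.from_pretrained(
    "path/to/base-llava-checkpoint",
    torch_dtype=torch.float16,
)

# PeftModel reads adapter_config.json from the adapter directory, so the same
# call should work whether the adapter was trained with LoRA or Q-LoRA.
model = PeftModel.from_pretrained(base, "path/to/rlhf-or-sft-lora-adapter")

# Optionally merge the adapter into the base weights before running the eval.
model = model.merge_and_unload()
```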