Hi,
I found the error: I had set the wrong model_dir. To double-check with you: in eval.sh, I should set model_dir to the folder of the pretrained llava-v1.6 and weight_dir to my own fine-tuned folder (the LoRA weights), am I right?
Another question: the evaluation gives the warning "UserWarning: do_sample is set to False. However, top_p is set to 0.9 -- this flag is only used in sample-based generation modes. You should set do_sample=True or unset top_p."
Is this normal, and should it still reproduce your results?
Thank you!
Hi, for the first question, directly passing in MODELS/pllava-7b should be fine, as long as you downloaded it from Hugging Face. The demo and evaluation share a loading function here: https://github.com/magic-research/PLLaVA/blob/fd9194ae55750c2e1ac677056f6286c126eda580/tasks/eval/model_utils.py#L39-L125
So the weights can be loaded from two sources:
I think as long as the weights are loaded from one of the above, it should be fine. Loading from the downloaded weights, you should see "
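For illustration only, here is a minimal sketch of the two loading paths described above. It uses generic Hugging Face transformers/peft calls rather than the repo's actual load_pllava function; AutoModelForCausalLM is a stand-in for the repo's model class, and the paths are placeholders.

```python
# Minimal sketch of the two loading paths, NOT the repo's actual code.
from transformers import AutoModelForCausalLM
from peft import PeftModel

model_dir = "MODELS/pllava-7b"            # checkpoint downloaded from Hugging Face
weight_dir = "path/to/your/lora_weights"  # your own fine-tuned LoRA folder (optional)

# Source 1: load everything directly from the downloaded checkpoint.
model = AutoModelForCausalLM.from_pretrained(model_dir)

# Source 2: load the base model first, then attach your fine-tuned LoRA weights.
model = PeftModel.from_pretrained(model, weight_dir)
```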
For the second question, the UserWarning also appears in our evaluation, so it is safe to ignore for now.
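To illustrate why the warning is harmless, here is a small sketch using gpt2 as a stand-in model (it is not PLLaVA, but transformers raises the same warning the same way): with do_sample=False, generation is greedy and top_p is simply ignored.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# gpt2 is only a small stand-in to reproduce the warning.
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tok("Describe the video:", return_tensors="pt")

# do_sample=False means greedy decoding, so top_p has no effect; transformers
# only warns that the flag is unused. The generated tokens are unchanged.
out = model.generate(**inputs, do_sample=False, top_p=0.9, max_new_tokens=20)

# Dropping top_p (or setting do_sample=True) silences the warning.
out = model.generate(**inputs, do_sample=False, max_new_tokens=20)
```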
BTW, I just fixed a response post-processing bug in https://github.com/magic-research/PLLaVA/commit/fd9194ae55750c2e1ac677056f6286c126eda580, so you might consider pulling the latest code. The former code could leave a leading space in the answer, and the ChatGPT evaluation seems sensitive to that leading space (VCG score 3.10 vs. 3.03 for pllava-7b with LoRA alpha set to 4).
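For clarity, a hypothetical illustration of the issue that commit addresses (the exact fix may differ): the decoded answer used to carry a leading space, and stripping it before ChatGPT-based scoring removes the sensitivity.

```python
# Hypothetical illustration of the leading-space issue, not the actual diff.
raw_answer = " The man is playing basketball."   # decoded text with a leading space
clean_answer = raw_answer.strip()                # postprocess before ChatGPT scoring
assert clean_answer == "The man is playing basketball."
```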
Thank you very much for this clarification! I have another question about LoRA alpha. I found that the default training config uses lora_alpha = 32, but evaluation uses 4 instead. After I changed the evaluation's lora_alpha to 32, the performance dropped a lot (e.g., MSVD ~77% -> ~73%). Did you observe the same thing?
I noticed the same thing: the lora_alpha is not consistent between the training and inference stages. As shown in Fig. 9, the authors claim that using a lower alpha at test time achieves better performance.
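As a side note on why the alpha value matters at load time: in peft, the LoRA update is scaled by lora_alpha / r, so loading the same adapter with alpha 4 instead of 32 scales its contribution down by a factor of 8. A minimal sketch is below; r = 128 is an assumed placeholder rank, not necessarily the value used in this repo.

```python
from peft import LoraConfig

r = 128  # assumed rank for illustration; check the repo's config for the real value

train_cfg = LoraConfig(r=r, lora_alpha=32)  # scaling = 32 / 128 = 0.25
eval_cfg = LoraConfig(r=r, lora_alpha=4)    # scaling = 4  / 128 = 0.03125

print("train-time LoRA scaling:", train_cfg.lora_alpha / train_cfg.r)
print("eval-time LoRA scaling: ", eval_cfg.lora_alpha / eval_cfg.r)
```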
Hi,
When I evaluate the model on videoqabench, it doesn't generate any answer, only the prompt, here. The bug is very similar to this closed issue, but I still have the problem after adding your new commits. Can you please take a look at it? Thanks!