pipilurj / bootstrapped-preference-optimization-BPO

code for "Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization"
Apache License 2.0

Questions about the results on MMVet #10

Closed by Ivesfu 2 months ago

Ivesfu commented 2 months ago

Hi!

I found the 7B checkpoint file you provided in a previous issue. After running my tests, I obtained the results below. Could you help me figure out why there is such a discrepancy between these results and the ones reported in the paper? I used the evaluation code from the official LLaVA scripts.

Thank you for your assistance!


| rec | ocr | know | gen | spat | math | total | std | runs |
| -- | -- | -- | -- | -- | -- | -- | -- | -- |
| 37.3 | 26.9 | 20 | 23.7 | 29.6 | 9.2 | 33.3 | 0 | [np.float64(33.3)] |
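For context, this is roughly the pipeline I ran, following the stock LLaVA MM-Vet evaluation script (`scripts/v1_5/eval/mmvet.sh`); the model path and output names below are placeholders for my local setup:

```bash
# Generate answers with the BPO checkpoint; paths are placeholders.
python -m llava.eval.model_vqa \
    --model-path /path/to/bpo-7b-checkpoint \
    --question-file ./playground/data/eval/mm-vet/llava-mm-vet.jsonl \
    --image-folder ./playground/data/eval/mm-vet/images \
    --answers-file ./playground/data/eval/mm-vet/answers/bpo-7b.jsonl \
    --temperature 0 \
    --conv-mode vicuna_v1

# Convert the answers into the format expected by the official MM-Vet grader.
mkdir -p ./playground/data/eval/mm-vet/results
python scripts/convert_mmvet_for_eval.py \
    --src ./playground/data/eval/mm-vet/answers/bpo-7b.jsonl \
    --dst ./playground/data/eval/mm-vet/results/bpo-7b.json
```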

pipilurj commented 2 months ago

Hi, thanks a lot for your interest. Please try reinstalling the packages specified in the current repo. The score achieved by the Hugging Face checkpoint should be around 36.0; a slight difference can come from different batch sizes and numbers of GPUs. However, 33.3 may indicate that either something is wrong with the inference code or the package versions are incorrect.
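For example, a clean reinstall in a fresh environment along these lines should work (a minimal sketch, assuming the repo supports a LLaVA-style editable install; the environment name is arbitrary):

```bash
# A clean reinstall from scratch; assumes a LLaVA-style editable install.
git clone https://github.com/pipilurj/bootstrapped-preference-optimization-BPO.git
cd bootstrapped-preference-optimization-BPO
conda create -n bpo python=3.10 -y
conda activate bpo
pip install -e .
```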

Ivesfu commented 2 months ago

OK, I will try it again. Thanks for your reply; I will post an update ASAP.