ys-zong / VL-ICL

Code for paper: VL-ICL Bench: The Devil in the Details of Benchmarking Multimodal In-Context Learning
https://ys-zong.github.io/VL-ICL/

llava requirements #4

Open · icrto opened this issue 7 months ago

icrto commented 7 months ago

Could you please provide the requirements.txt for llava?

Thanks!

ys-zong commented 7 months ago

You can install LLaVA from its original repo: https://github.com/haotian-liu/LLaVA. Does that work for you?
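
If it helps, a quick way to sanity-check the editable install (`pip install -e .`) is to confirm the model loader imports cleanly. A minimal sketch; `load_pretrained_model` is the loader defined in that repo's `llava/model/builder.py`:

```python
# Sanity check (sketch): verify the LLaVA package and its model loader
# are importable after installing from haotian-liu/LLaVA.
from llava.model.builder import load_pretrained_model

print("LLaVA import OK:", load_pretrained_model.__module__)
```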

icrto commented 7 months ago

Yes, that works for me. However, I have not been able to reproduce the results in the paper, and since you mentioned here that you were using another version of LLaVA, I thought the discrepancy might be due to that, hence my request for the specific package versions you used.

As an example, on Fast Open-Ended MiniImageNet with LLaVA-Next-7B, 2 shots, and a detailed task description, you report an accuracy of 33.67 ± 2.25 (Table 47), while I obtain 14.0. On Operator Induction with:

(This is after I remove the truncation, as mentioned in the linked issue.)
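
In case it helps to compare environments, this is the kind of snippet I would use to dump package versions (a sketch; the package list is my guess at what matters for reproduction, not an official requirements.txt):

```python
# Print pinned versions of packages that plausibly affect reproduction.
from importlib.metadata import version, PackageNotFoundError

for pkg in ["llava", "torch", "torchvision", "transformers", "tokenizers", "accelerate"]:
    try:
        print(f"{pkg}=={version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")
```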

ys-zong commented 6 months ago

Sorry for the late reply. I just re-ran LLaVA from their latest code, and I can reproduce the reported accuracies with only marginal differences. I don't have a clear idea of why you see such a huge difference. I'll aim to refactor LLaVA-Next to the Hugging Face implementation soon for more stable reproduction.
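
For reference, the Hugging Face path would look roughly like this (a sketch, not the refactored benchmark code; the checkpoint id and prompt template are assumptions based on the llava-hf releases):

```python
# Minimal LLaVA-Next inference via transformers (sketch).
# Assumes a transformers version that includes the LlavaNext classes (>= 4.39).
import torch
from PIL import Image
from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration

model_id = "llava-hf/llava-v1.6-vicuna-7b-hf"  # assumed 7B checkpoint
processor = LlavaNextProcessor.from_pretrained(model_id)
model = LlavaNextForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

image = Image.open("query.png")  # placeholder image path
prompt = "USER: <image>\nWhat is shown in this image? ASSISTANT:"  # vicuna-style template
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(processor.decode(out[0], skip_special_tokens=True))
```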