open-compass / VLMEvalKit

Open-source evaluation toolkit for large vision-language models (LVLMs), supporting 160+ VLMs and 50+ benchmarks
https://huggingface.co/spaces/opencompass/open_vlm_leaderboard
Apache License 2.0

[Model] RBDash Added #429

Closed · anzhao920 closed 2 months ago

anzhao920 commented 2 months ago

Adding RBDash model

Details:

RBDash GitHub repository: https://github.com/RBDash-Team/RBDash
RBDash model: https://huggingface.co/RBDash-Team/RBDash-v1.2-72b

FangXinyu-0913 commented 2 months ago

@anzhao920 Thank you for your contribution. I was wondering whether you forgot to use the `options` variable in the build_mme and build_hallusionbench functions in rbdash.py, and whether it needs to be added to the prompt. If not, I will consider removing this variable.
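A hypothetical sketch of the pattern under discussion, assuming rbdash.py defines per-benchmark prompt builders as the comment suggests; the bodies below are illustrative only, not the PR's actual code:

```python
# Hypothetical sketch of the cleanup discussed above: if `options` is never
# used inside a builder, it can simply be dropped from the signature instead
# of being threaded into the prompt.

class RBDash:
    def build_mme(self, line):
        # previously: def build_mme(self, line, options) with `options` unused
        return line['question']

    def build_hallusionbench(self, line):
        # previously also carried an unused `options` argument
        return line['question'] + ' Please answer yes or no.'
```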

anzhao920 commented 2 months ago

@FangXinyu-0913 Thank you so much for your reply. I've removed these unused variables. PTAL.

FangXinyu-0913 commented 2 months ago

https://huggingface.co/RBDash-Team/RBDash-v1.2-72b returns a 404. Where can the model weights be downloaded?

anzhao920 commented 2 months ago

@FangXinyu-0913 My apologies for not making the Hugging Face model public earlier. It is now available for download. Please try the link again: https://huggingface.co/RBDash-Team/RBDash-v1.2-72b.

FangXinyu-0913 commented 2 months ago

[screenshot of the loading error] I encountered this problem when loading, but the repository you provided (https://huggingface.co/laion/CLIP-convnext_large_d_320.laion2B-s29B-b131K-ft-soup) does not contain the file preprocessor_config.json. Could you tell me which repository I should download it from?

anzhao920 commented 2 months ago

@FangXinyu-0913 The preprocessing configuration for the model (OpenCLIP-ConvNeXt-L) we are using is consistent with clip-vit-large-patch14-336. Please use the configuration available at: https://huggingface.co/openai/clip-vit-large-patch14-336/blob/main/preprocessor_config.json.

Apologies for the previous issues in our official RBDash README. The vision encoder used for this model is actually InternViT-6B-448px-V1-5 rather than clip-vit-large-patch14-336. The README has now been updated accordingly, and you can find the details here: https://github.com/RBDash-Team/RBDash?tab=readme-ov-file#pretrained-weights.
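A minimal sketch of the workaround above, assuming the standard transformers CLIPImageProcessor API; the repository IDs are the ones named in this thread, and the local save path is a placeholder:

```python
from transformers import CLIPImageProcessor

# The ConvNeXt-L CLIP repo above ships no preprocessor_config.json, but its
# preprocessing matches clip-vit-large-patch14-336, so the processor can be
# borrowed from that repository instead.
processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-large-patch14-336")

# Optionally drop the config next to the locally cloned ConvNeXt weights
# (placeholder path below).
processor.save_pretrained("./CLIP-convnext_large_d_320.laion2B-s29B-b131K-ft-soup")
```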

FangXinyu-0913 commented 2 months ago

[screenshot of the CUDA out-of-memory error] @anzhao920 Thank you for your answer. I'm running into this problem when loading the model. It looks like a lack of memory makes it difficult to load the model fully onto the GPU. Is there a way to spread the model weights evenly across the GPUs (like AutoModel.from_pretrained with the device_map parameter) so that I can make sure the model loads correctly?

anzhao920 commented 2 months ago

@FangXinyu-0913 Setting 'device_map' to 'auto' when loading the model should be sufficient.
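A minimal sketch of that suggestion, assuming the weights load through transformers' from_pretrained; the model ID is the one from this thread, and trust_remote_code is an assumption for any custom model classes:

```python
import torch
from transformers import AutoModelForCausalLM

# device_map="auto" lets accelerate shard the checkpoint across all visible
# GPUs instead of trying to fit everything on a single device.
model = AutoModelForCausalLM.from_pretrained(
    "RBDash-Team/RBDash-v1.2-72b",
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,  # assumption: may be needed for custom model code
)
```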

FangXinyu-0913 commented 2 months ago

I added device_map='auto' but still encountered OOM, so I also added device='cpu'. This allowed the model to load successfully, but it was slow, and it then ran into this problem. What do you think could be done to fix it? [screenshot of the error]
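For reference, a sketch of how the same from_pretrained call can cap per-GPU usage and offload the remainder to CPU RAM or disk, using the standard max_memory/offload_folder options from transformers/accelerate; the limits are illustrative:

```python
from transformers import AutoModelForCausalLM

# max_memory bounds what device_map="auto" may place on each GPU; whatever
# does not fit is offloaded to CPU RAM (and, failing that, to offload_folder
# on disk). This loads reliably but runs much slower than an all-GPU layout.
model = AutoModelForCausalLM.from_pretrained(
    "RBDash-Team/RBDash-v1.2-72b",
    device_map="auto",
    max_memory={0: "75GiB", 1: "75GiB", "cpu": "200GiB"},  # illustrative limits
    offload_folder="offload",
)
```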

anzhao920 commented 2 months ago

@FangXinyu-0913 Could you please share the GPU type and the number of GPUs you are testing with? We plan to replicate the setup locally and will submit a fix once we've confirmed it works.

FangXinyu-0913 commented 2 months ago

NVIDIA A800-SXM4-80GB * 2. Thanks for your help!

anzhao920 commented 2 months ago

@FangXinyu-0913 No problem! We've replicated the setup locally and found that at least three 80 GB A800 GPUs are necessary. The 72B language model weights occupy around 150 GB, InternViT-6B takes up approximately 15 GB, and CLIP requires about 2 GB. Including the activations during inference, the total slightly exceeds the capacity of two 80 GB A800 GPUs.
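A back-of-the-envelope check of those numbers, assuming fp16/bf16 weights at 2 bytes per parameter; the ~2 GB CLIP figure is taken directly from the comment above:

```python
GiB = 2**30
bytes_per_param = 2  # fp16 / bf16

llm = 72e9 * bytes_per_param / GiB   # ~134 GiB for the 72B language model
vit = 6e9 * bytes_per_param / GiB    # ~11 GiB for InternViT-6B
clip = 2.0                           # ~2 GiB for the CLIP tower (per the thread)

total = llm + vit + clip             # ~147 GiB of weights alone
print(f"{total:.0f} GiB of weights vs. 2 x 80 GB A800 = 160 GB")
# Activations and the KV cache come on top, which is why two cards OOM
# and at least three are recommended.
```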

FangXinyu-0913 commented 2 months ago

Thank you for your reply! I have successfully loaded the model on 4 A800 GPUs and will merge this PR within the day. If you have any comments for further changes, feel free to submit a commit or comment.

FangXinyu-0913 commented 2 months ago

Eval result on MMBench_EN: [screenshot of the score table]

anzhao920 commented 2 months ago

@FangXinyu-0913 Hi Xinyu,

I have a question regarding the performance of our model on the OpenCompass multi-modal leaderboard. According to the leaderboard, our model’s score on MMBench is 80.2. However, the figure you provided shows a score of 83.1.

Could you please help clarify the reason for this discrepancy? Thank you!

[leaderboard screenshot]

kennymckormick commented 2 months ago

Hi @anzhao920, on the leaderboard we present the average score of MMBench_V1.1_EN_TEST and MMBench_V1.1_CN_TEST, while the screenshot @FangXinyu-0913 provided is for MMBench_V1.0_EN_TEST.
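In other words, the leaderboard number is the mean of the two V1.1 test splits. A trivial illustration with hypothetical per-split scores, since the individual EN/CN values are not given in this thread:

```python
# Hypothetical split scores; only the averaging rule comes from the thread.
mmbench_v11_en_test = 81.0
mmbench_v11_cn_test = 79.4

leaderboard_score = (mmbench_v11_en_test + mmbench_v11_cn_test) / 2
print(leaderboard_score)  # 80.2, the figure shown on the leaderboard
```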

anzhao920 commented 2 months ago

Hi @kennymckormick, thank you very much for the clarification!