open-compass / VLMEvalKit

Open-source evaluation toolkit for large vision-language models (LVLMs), supporting ~100 VLMs and 40+ benchmarks
https://huggingface.co/spaces/opencompass/open_vlm_leaderboard
Apache License 2.0
1.24k stars, 177 forks

Are images compressed in tsv files? #517

Open: ywwynm opened this issue 2 weeks ago

ywwynm commented 2 weeks ago

Thanks for providing a collection of VLM datasets and models. I'm wondering why the tsv versions of the datasets in this repository are smaller than the official versions. For example, the RealWorldQA dataset downloaded from the official website RealWorldQA is 677MB, while the tsv version in this repo, RealWorldQA_tsv, is only 175MB. You are using base64 to encode images as text and store them directly in tsv columns, which should be lossless. So why has the data size been reduced so significantly? Other datasets seem to be in a similar situation.
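The premise about base64 can be checked directly. Below is a minimal Python sketch (standard library only, with random bytes standing in for an image file) showing that a base64 round trip returns identical bytes and that the encoded text is about 4/3 the size of its input. In other words, base64 alone would make a tsv larger than the raw images, so any shrinkage has to come from a step applied before encoding.

```python
import base64
import os


def roundtrip(data: bytes) -> bytes:
    """Encode raw bytes as base64 text (as stored in a tsv cell) and decode back."""
    text = base64.b64encode(data).decode("ascii")
    return base64.b64decode(text)


def encoded_length(n_bytes: int) -> int:
    """Length of padded base64 text: 4 characters per (started) 3-byte group."""
    return 4 * ((n_bytes + 2) // 3)


if __name__ == "__main__":
    payload = os.urandom(10_000)  # stand-in for the bytes of an image file
    assert roundtrip(payload) == payload  # lossless: the exact same bytes come back
    n_text = len(base64.b64encode(payload))
    assert n_text == encoded_length(len(payload))
    print(f"{len(payload)} bytes -> {n_text} chars ({n_text / len(payload):.3f}x)")
```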

SYuan03 commented 2 weeks ago

Hello @ywwynm, the images in the dataset on the official RealWorldQA website are in WebP format, whereas when we converted the original dataset to tsv format, we uniformly converted them to JPEG format during encoding. You can refer to the code here in our repo.
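To see which case applies to a given dataset (WebP sources as with RealWorldQA, or sources that are already JPEG), one option is to inspect each file's leading magic bytes. The sketch below is a generic Python check, not VLMEvalKit's conversion code; it only recognizes JPEG, WebP and PNG signatures, and the script name in the usage comment is hypothetical.

```python
import sys


def sniff_image_format(header: bytes) -> str:
    """Guess an image format from the first 12 bytes of a file."""
    if header.startswith(b"\xff\xd8\xff"):  # JPEG: SOI marker followed by another marker
        return "jpeg"
    if len(header) >= 12 and header[:4] == b"RIFF" and header[8:12] == b"WEBP":
        return "webp"  # RIFF container whose form type is WEBP
    if header.startswith(b"\x89PNG\r\n\x1a\n"):  # fixed 8-byte PNG signature
        return "png"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read only the header bytes of a file and classify them."""
    with open(path, "rb") as fh:
        return sniff_image_format(fh.read(12))


if __name__ == "__main__":
    # Usage (hypothetical script name): python sniff.py img1.webp img2.jpg ...
    for path in sys.argv[1:]:
        print(path, sniff_file(path))
```

Reading just 12 bytes keeps the check cheap even over thousands of images.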

ywwynm commented 2 weeks ago

@SYuan03 Thanks for your explanation. Is the same processing also performed for other datasets like SeedBench or MMTBench? If the original images are already in JPEG format, will you compress them again using the same code?

SYuan03 commented 2 weeks ago

Hello @ywwynm, in fact, if the original images are in JPEG format, there will not be such a significant change in data size even after our processing. We just convert the data to the tsv format we need so that all datasets can be processed in a unified way.
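To compare a tsv against the original download, one can decode the base64 image column, sum the decoded bytes, and check whether each stored image is JPEG. This Python sketch assumes a tab-separated file with a header row and columns named `image` and `index`; those column names are assumptions for illustration, so check them against the actual file. It raises the `csv` module's field-size limit because a base64-encoded image easily exceeds the default of 131072 characters.

```python
import base64
import csv
import sys
from typing import Iterable, Iterator, Optional, Tuple

# A base64 image cell easily exceeds csv's default 131072-character limit.
# 2**31 - 1 fits in a C long on every platform (sys.maxsize can overflow on Windows).
csv.field_size_limit(2**31 - 1)


def decoded_image_stats(
    lines: Iterable[str], image_col: str = "image", index_col: str = "index"
) -> Iterator[Tuple[Optional[str], int, bool]]:
    """Yield (index, decoded_bytes, is_jpeg) for every row of tsv text lines."""
    for row in csv.DictReader(lines, delimiter="\t"):
        raw = base64.b64decode(row[image_col])  # image_col is an assumed column name
        yield row.get(index_col), len(raw), raw.startswith(b"\xff\xd8\xff")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script and file names): python tsv_stats.py dataset.tsv
    total = 0
    with open(sys.argv[1], newline="", encoding="utf-8") as fh:
        for idx, n_bytes, is_jpeg in decoded_image_stats(fh):
            total += n_bytes
            if not is_jpeg:
                print(f"row {idx}: image is not JPEG")
    print(f"total decoded image bytes: {total}")
```

Summing decoded bytes (rather than comparing file sizes on disk) removes the roughly 4/3 base64 overhead from the comparison.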