Question
Great job! I found that there are two versions of the pretraining dataset: blip_laion_cc_sbu_558k and LLaVA-CC3M-Pretrain-595K. What are the differences between them, and which one is better? Did you analyze the quality of these datasets and why they lead to different performance? I look forward to your reply. Thanks a lot! @haotian-liu