Closed: rohit-gupta closed this issue 1 year ago.
Hi @rohit-gupta, thanks for your interest in our work. Because of computational constraints, we trained only on the medium scale of DataComp. If it helps, we can release the IDs of the filtered subset, which is the format DataComp generally requires.
Thanks, Sachin
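Why an ID list is enough: every DataComp participant filters the same fixed candidate pool, so anyone who already has the pool can rebuild a filtered subset from the released IDs alone. Below is a minimal sketch of that step in Python. It assumes each pool record is a dict with a "uid" field. The `select_subset` function and the record layout are hypothetical illustrations and are not DataComp's own tooling.

```python
from typing import Dict, Iterable, Iterator

# Hypothetical record layout: each pool sample is a dict keyed by field name,
# with a "uid" field identifying the sample (other fields are illustrative).
Record = Dict[str, str]


def select_subset(pool: Iterable[Record], released_uids: Iterable[str]) -> Iterator[Record]:
    """Yield, in pool order, the samples whose uid appears in the released ID list."""
    # Normalize IDs (strip whitespace, lowercase) so that hex case differences
    # in the released list do not cause misses.
    keep = {uid.strip().lower() for uid in released_uids}
    for record in pool:
        if record["uid"].lower() in keep:
            yield record


if __name__ == "__main__":
    pool = [
        {"uid": "a1", "url": "http://example.com/1.jpg", "text": "a cat"},
        {"uid": "b2", "url": "http://example.com/2.jpg", "text": "a dog"},
        {"uid": "c3", "url": "http://example.com/3.jpg", "text": "a car"},
    ]
    subset = list(select_subset(pool, ["C3", "a1"]))
    print([r["url"] for r in subset])  # URLs of the two selected samples, in pool order
```

Using a set for the released IDs keeps each lookup constant-time, which matters when the pool holds millions of samples.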
That would be great, thanks!
Hi,
Thanks very much for releasing this useful method and the code to implement it. Your filtered dataset seems likely to become the new state-of-the-art (SotA) dataset for training vision-language models.
Are there any plans to release the filtered dataset as a list of URLs or something similar?