locuslab / T-MARS

Code for T-MARS data filtering
https://tmars-clip.github.io
MIT License

Any plans to release the filtered dataset? #2

Closed: rohit-gupta closed this issue 1 year ago

rohit-gupta commented 1 year ago

Hi,

Thanks very much for releasing this useful method and the code to implement it. It seems likely that your filtered dataset could become the new state-of-the-art dataset for training vision-language models.

Are there any plans to release the filtered dataset as a list of URLs or something similar?

SachinG007 commented 1 year ago

Hi @rohit-gupta, thanks for your interest in our work. Due to computational constraints, we have only trained on the medium scale of DataComp. If it helps, we can release the IDs of the filtered subset, as is generally required in DataComp.

Thanks, Sachin

rohit-gupta commented 1 year ago

That would be great, thanks!
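
For readers wondering what "releasing the IDs of the filtered subset" might look like in practice, here is a minimal sketch. It assumes, as DataComp's tooling is generally understood to do, that each sample is identified by a 32-character hexadecimal `uid` and that a subset is described by the sorted list of those uids, each split into two unsigned 64-bit integers. The function names `uid_to_pair` and `build_subset` and the example uids are hypothetical and are not taken from the T-MARS code.

```python
from typing import Iterable, List, Tuple


def uid_to_pair(uid: str) -> Tuple[int, int]:
    """Split a 32-character hex uid into two unsigned 64-bit integers."""
    if len(uid) != 32:
        raise ValueError(f"expected a 32-character hex uid, got {uid!r}")
    # int(..., 16) raises ValueError if the string is not valid hex.
    return int(uid[:16], 16), int(uid[16:], 16)


def build_subset(uids: Iterable[str]) -> List[Tuple[int, int]]:
    """Return the de-duplicated, sorted (high, low) pairs for the kept samples."""
    return sorted({uid_to_pair(u) for u in uids})


# Made-up uids standing in for the samples that survive filtering.
kept_uids = [
    "000000000000000000000000000000ff",
    "0000000000000001000000000000000a",
    "000000000000000000000000000000ff",  # duplicate, dropped by build_subset
]
subset = build_subset(kept_uids)

# Assumption: with NumPy installed, the pairs could then be stored as a
# structured array, for example
#   np.save("subset.npy", np.array(subset, dtype=np.dtype("u8,u8")))
```

Sorting and de-duplicating up front keeps the released file deterministic, so two people filtering the same pool can compare their subsets directly.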