rom1504 / laion-prepro

Get hundreds of millions of image+url from the crawling at home dataset and preprocess them

Consider ways to distribute the dataset #9

Open rom1504 opened 3 years ago

rom1504 commented 3 years ago

options:

Those should make it easy to distribute the dataset of url+captions

Distributing the images themselves is also worth investigating

rom1504 commented 3 years ago

https://www.kaggle.com/romainbeaumont/laion400m but let's also do it for other solutions

rom1504 commented 3 years ago

https://paperswithcode.com/contribute/dataset/new

rom1504 commented 2 years ago

update https://paperswithcode.com/paper/laion-400m-open-dataset-of-clip-filtered-400

same for laion5B

also put it on Kaggle, like https://www.kaggle.com/romainbeaumont/laion400m