rom1504/laion-prepro
Get hundreds of millions of image+url pairs from the crawling at home dataset and preprocess them
207 stars, 20 forks
issues
Link to download annotated data
#22
shubhamagarwal92
opened
1 year ago
0
Update preparing_data_for_training.md
#21
vanpersie32
closed
1 year ago
0
Define process and load_clip in data loader
#20
EmilyWebber
opened
2 years ago
1
laion400m/download_csv is not available.
#19
yj-yu
opened
2 years ago
1
add description of the schema of all laion datasets
#18
rom1504
opened
2 years ago
0
Does https://github.com/rom1504/laion-prepro/blob/main/laion5B/safety/join.py work for non-en langs?
#17
PranshuBansalDev
closed
2 years ago
13
laion5B scripts
#16
rom1504
closed
2 years ago
0
How to download the newest version of dataset without duplicate files?
#15
qiaogh97
closed
3 years ago
2
md5 check for `.parquet` files
#14
vtddggg
opened
3 years ago
4
How many about the dataset?
#13
qiaogh97
closed
3 years ago
3
How to set '--url_list' parameter in download_images.sh?
#12
qiaogh97
closed
3 years ago
2
add command line calls for clip retrieval for cah
#11
rom1504
closed
3 years ago
1
Add more information to make this a good home page for the dataset
#10
rom1504
closed
3 years ago
2
Consider ways to distribute the dataset
#9
rom1504
opened
3 years ago
3
make listing and download of csv files faster and cleaner
#8
rom1504
opened
3 years ago
0
make this more user friendly
#7
rom1504
opened
3 years ago
0
add how to get interesting subsets
#6
rom1504
opened
3 years ago
0
add what to train using that (clip, dalle)
#5
rom1504
opened
3 years ago
0
add how to make knn indices
#4
rom1504
closed
3 years ago
1
add how to make clip embeddings out of it
#3
rom1504
closed
3 years ago
1
consider shuffling the dataset
#2
rom1504
closed
3 years ago
0
add content
#1
rom1504
closed
3 years ago
1