mlfoundations/datacomp
DataComp: In search of the next generation of multimodal datasets
http://datacomp.ai/
Other
587 stars
49 forks
issues
Invalid files for Datacomp1B
#84
borisdayma
opened
1 month ago
0
ImageNet 21k based filtered dataset
#83
isidentical
opened
1 month ago
1
Downloading Commonpool XLarge
#82
zanussbaum
opened
2 months ago
0
Average caption length for CommonPool
#81
BIGBALLON
closed
2 months ago
1
Availability of npy indices for large pool
#80
88cf6a5ff
opened
3 months ago
0
ModuleNotFoundError: No module named 'training'
#79
adymaharana
closed
3 months ago
2
About update metadata with the corresponding image sample in shards
#77
ypwang61
opened
5 months ago
2
Frequency of Leaderboard Updates
#76
brunnedu
closed
5 months ago
1
Training log
#75
mactavish91
closed
5 months ago
1
metadata readme
#74
sagadre
opened
6 months ago
0
Pretraining dataset
#73
mactavish91
opened
6 months ago
1
Metadata for datacomp-large text-based filter
#72
aknvictor
closed
5 months ago
1
Remove CSAM, if present
#71
ahundt
opened
6 months ago
2
Deduplication against evaluation sets
#70
nopperl
closed
7 months ago
1
`zeroshot_templates` split error for FairFace / UTKFace
#69
EIFY
closed
4 months ago
9
the normal success rate and downloading speed?
#68
Tycho-Xue
opened
8 months ago
1
14% of SHA256 hashes not matching
#67
pfischer-nvidia
opened
8 months ago
32
Conda environment build issue
#66
brunnedu
closed
9 months ago
3
Dataset Size on Leaderboard
#62
brunnedu
closed
9 months ago
1
FMoW dataset and results variance
#61
teasgen
closed
9 months ago
1
Add AWS S3 dependencies to environment.yml
#60
0x2b3bfa0
closed
5 months ago
2
Usage with AWS S3 and Ray
#59
0x2b3bfa0
opened
9 months ago
5
Expose img2dataset distributor
#58
0x2b3bfa0
opened
9 months ago
3
Tried evaluate the model on a local network only machine
#57
zwsjink
opened
9 months ago
4
Not able to push data to google cloud storage
#56
krmayankb
opened
9 months ago
1
Will the training framework do upsampling when train-num-samples is far more than the amount of actual data
#55
zwsjink
closed
9 months ago
8
Appendix in the workshop paper submission
#54
zihengh1
closed
9 months ago
2
Workshop submission deadline
#53
mamdouhJ
closed
9 months ago
3
How to precompute and save model-based metric during download?
#52
bram-w
closed
9 months ago
2
Get 400 error when submitting jsonl to firebase, but successfully submit Slack notification
#51
C-1995-O1-HALE-POPP
closed
10 months ago
2
In image-based filtering, only imagenet L14 is supported, making extracting B32 embeddings useless
#50
zwsjink
closed
9 months ago
6
https://huggingface.co/datasets/djghosh/wds_flickr_1k_test_image_text_retrieval_test doesn't exist?
#49
mingdachen
closed
10 months ago
3
How to achieve exact same # of samples seen?
#48
zwsjink
closed
10 months ago
12
default to amp_bfloat16 for xlarge
#47
sagadre
closed
10 months ago
0
Text search over CommonPool
#46
sedol1339
closed
9 months ago
1
Connection error while half-downloading metadata
#45
sedol1339
opened
10 months ago
3
[download_upstream] Add flags
#44
NielsRogge
closed
10 months ago
1
[Challenge rules] Is it allowed to modify the texts of the data in the filtering track?
#43
mingtan2
closed
10 months ago
4
Result submission deadline
#42
bluer555
closed
11 months ago
1
Downloading DataComp-1B
#41
linzhiqiu
opened
11 months ago
1
Workshop paper submission
#40
mingtan2
closed
11 months ago
1
download data, why success is always 0.14, but the file size is 80% of the total file (total file size is in README)
#39
KylinC
opened
11 months ago
7
download data
#38
KylinC
closed
11 months ago
1
train/test splits for downstream tasks
#37
bluer555
closed
11 months ago
1
Update environment_osx.yml
#36
zwcolin
closed
11 months ago
0
Style fix
#35
Vaishaal
closed
11 months ago
0
--output_dir does not do correct thing if --output_dir is a cloud path
#34
Vaishaal
opened
11 months ago
1
Metadata download error - OSError: Consistency check failed
#33
ch-shin
closed
10 months ago
7
Missing training file?
#32
meghbhalerao
closed
11 months ago
1
Deduplication of eval datasets
#30
borisdayma
opened
11 months ago
1