rom1504/img2dataset
Easily turn large sets of image urls to an image dataset. Can download, resize and package 100M urls in 20h on one machine.
MIT License, 3.71k stars, 338 forks
issues
Number | Title | Author | Status | Comments
#339 | Refactor as a (self hosted) service | rom1504 | opened 1 year ago | 2
#338 | High Initial RAM Usage Leads to Crashes | Sypherd | opened 1 year ago | 2
#337 | Support all compression types via fsspec | Skylion007 | opened 1 year ago | 1
#336 | Package knot resolver as a python package and use it | rom1504 | opened 1 year ago | 1
#335 | minor fix laion-high-resolution.md | ShoufaChen | closed 1 year ago | 0
#334 | Cloudflare R2 Compatibility | zanussbaum | closed 10 months ago | 6
#333 | is there a way to retry failed urls after the job has completed? | yxchng | opened 1 year ago | 8
#332 | Do not retry 404 links | Skylion007 | opened 1 year ago | 1
#331 | Implement Exponential Backoff | Skylion007 | opened 1 year ago | 7
#330 | distributed downloader freeze under 'cluster mode' | zwsjink | opened 1 year ago | 0
#329 | Any examples on how to pass in a url_list stored on OSS (s3 like) | zwsjink | opened 1 year ago | 3
#328 | Adding DataComp-1B info | gabrielilharco | closed 1 year ago | 0
#327 | many write error while using oss (s3-like) remote bucket storage | ldfandian | opened 1 year ago | 3
#326 | support more intput formats (txt.gz, csv.gz, json.gz, jsonl, jsonl.gz) and add test cases for it | ldfandian | closed 1 year ago | 1
#325 | support more intput formats (txt.gz, csv.gz, json.gz, jsonl, jsonl.gz) and add test cases for it | ldfandian | closed 1 year ago | 0
#324 | does img2dataset support jsonl as input format? | ldfandian | closed 1 year ago | 5
#323 | Add instructions to get datacomp1B | rom1504 | opened 1 year ago | 0
#322 | The link to commonpool.md is dead | wwfcnu | closed 1 year ago | 1
#321 | pyarrow.lib.ArrowInvalid: CSV parse error: Expected 17 columns, got 1 | jelech | opened 1 year ago | 0
#320 | Remove tmp_dir only if the output dir is not in s3 | erezzarum | closed 1 year ago | 3
#319 | Temp dir removal - FileNotFoundError: ['mybucket/data/tests/test_1000_parquet/5/_tmp'] | erezzarum | closed 1 year ago | 2
#318 | error: pyarrow.lib.ArrowInvalid: CSV parse error: Expected 1 columns, g | tom666tom666 | opened 1 year ago | 6
#317 | pyarrow.lib.ArrowInvalid: No match for FieldRef.Name(URL) in NSFW: string | qnyan | closed 1 year ago | 1
#316 | use a proxy when downloading images | jelech | opened 1 year ago | 1
#315 | Support mosaic streaming | rom1504 | opened 1 year ago | 0
#314 | Consider switching to fiddle config | rom1504 | opened 1 year ago | 0
#313 | [laion high resolution] How to only extract a certain number of images? | jS5t3r | closed 1 year ago | 4
#312 | Replace specific opt-out support with datadiligence package for more general opt-out support | Padge91 | opened 1 year ago | 0
#311 | MacOS hidden files cause logger process exit | FlyHighest | opened 1 year ago | 1
#310 | some meta data are missing | zhangvia | closed 1 year ago | 1
#309 | Add code highlighting to the README | bryant1410 | closed 1 year ago | 2
#308 | Implement the W3C TDM Reservation Protocol and enable a more standard opt-out mechanism | llemeurfr | opened 1 year ago | 7
#307 | Switch to requests to check headers before streaming content | raincoastchris | opened 1 year ago | 1
#306 | Adding CommonPool instructions | gabrielilharco | closed 1 year ago | 0
#304 | Documentation enhancement on robots.txt and scraping (PR issue) | maathieu | opened 1 year ago | 4
#303 | Noncompliance by default with the General Data Protection Regulation (GDPR/RGPD) | jmaris | closed 1 year ago | 11
#302 | Read and cache robots.txt files for each host using thread-local storage | ephphatha | opened 1 year ago | 6
#301 | Include none directive if either of noindex,nofollow are specified | ephphatha | opened 1 year ago | 0
#300 | Set a user agent string that matches convention used by libraries/tools | ephphatha | opened 1 year ago | 0
#299 | Implement HEAD followed by GET to reduce traffic when headers are present | rom1504 | opened 1 year ago | 1
#298 | img2dataset ignores X-Robots-Tag | Catbuttes | closed 1 year ago | 8
#297 | Correct webp lossless statement | jonasricker | closed 1 year ago | 1
#295 | Test pex in ci | rom1504 | closed 1 year ago | 0
#294 | Support recording image license | tomchiverton | closed 1 year ago | 7
#293 | Please make this tool "opt-in" by default | edent | closed 1 year ago | 29
#292 | Unable to use img2dataset to download laion-high-resolution without install chardet | Shamik-07 | opened 1 year ago | 0
#291 | Unable to use img2dataset to download laion-high-resolution without install chardet | Shamik-07 | closed 1 year ago | 5
#290 | Widen pyarrow dependency range | malcolmgreaves | closed 1 year ago | 1
#289 | Process hanging forever before the end | HugoLaurencon | opened 1 year ago | 13
#288 | Fix README regarding lossless webp | jonasricker | closed 1 year ago | 1