rom1504/img2dataset
Easily turn large sets of image urls to an image dataset. Can download, resize and package 100M urls in 20h on one machine.
MIT License
3.71k stars
338 forks
issues
Are there plans to support WebP?
#287
CS123n
closed
1 year ago
1
Official .pex File Does not Support output_format="tfrecord"
#286
zw615
closed
1 year ago
4
Can I try downloading LAION400M with multiple PC?
#285
sunggukcha
opened
1 year ago
1
Respect robots.txt
#284
slavakurilyak
closed
1 year ago
1
Add option to allow newlines in captions
#283
achalddave
opened
1 year ago
5
When use parquet as output format decode the bytes in jpg, the image result color seems wrong.
#282
svjack
closed
1 year ago
1
Multiple Alerts of Malicious URLs
#281
vatsalmoradiya
closed
1 year ago
6
ERROR: Could not build wheels for pyarrow, scikit-image, which is required to install pyproject.toml-based projects
#280
precurcor
closed
1 year ago
1
Lower Success Rate when output_format=files
#279
zw615
opened
1 year ago
0
Downloading cc3m with some wrong
#278
knight4u13
opened
1 year ago
1
Failed to download all of LAION-400M
#277
zw615
closed
10 months ago
10
Bbox crop implementation
#276
vanga
opened
1 year ago
3
Support for bounding box cropping
#275
vanga
opened
1 year ago
4
white borders when downloading image
#274
youngfly11
opened
1 year ago
1
add databricks notebook
#273
smellslikeml
closed
1 year ago
0
Adding ray as a distributor
#272
Vaishaal
closed
1 year ago
10
Citation
#271
dpaleka
closed
1 year ago
1
Citing this repo
#270
dpaleka
closed
1 year ago
2
[wip] Duration + head
#269
rom1504
closed
1 year ago
1
consider reverting breaking change md5 -> sha256
#268
rom1504
opened
1 year ago
0
Add ocifs for object-storage users.
#267
kuno989
closed
1 year ago
2
consider adding option to use only head and compute stats instead of actually downloading
#266
rom1504
opened
1 year ago
2
use a pipeline concept to refactor downloader.py
#265
rom1504
opened
1 year ago
0
pycurl
#263
rom1504
closed
1 year ago
1
wip good bad pool
#262
rom1504
closed
1 year ago
3
Figure out how to timeout
#261
rom1504
opened
1 year ago
19
Add asyncio implementation of downloader
#260
KohakuBlueleaf
closed
1 year ago
4
opencv-python => opencv-python-headless
#259
shionhonda
closed
1 year ago
2
Verify hashes during download.
#258
GeorgiosSmyrnis
closed
1 year ago
6
Release 1.40.0
#257
gabrielilharco
closed
1 year ago
0
investigate async
#256
rom1504
opened
1 year ago
2
Add support for extra hashes
#255
GeorgiosSmyrnis
closed
1 year ago
0
Bump fsspec version to 2022.11
#254
gabrielilharco
closed
1 year ago
0
Proper blurring when padding/cropping.
#253
GeorgiosSmyrnis
closed
1 year ago
3
Very low speed on subcaptions and mscoco dataset
#252
KohakuBlueleaf
opened
1 year ago
13
Release 1.38.0
#251
rom1504
closed
1 year ago
0
Add information in the readme to promote the rights of AI artists and AI trainers
#250
rom1504
closed
1 year ago
4
Respect x robots tag by default
#249
Stealcase
closed
1 year ago
19
Respect x-robots-tag directives by default
#248
Stealcase
closed
1 year ago
1
Does an 'noai' opt-out directive even work?
#247
milotheirself
closed
1 year ago
5
Release 1.37.0
#246
rom1504
closed
1 year ago
0
Respect x-robots-directives by default
#245
Stealcase
closed
1 year ago
5
Duplicate images in ms coco
#244
tungdop2
opened
1 year ago
3
Consider to not recommend skip_reencode
#243
rom1504
opened
1 year ago
3
Failed to download all of CC12M
#242
AlaaKhaddaj
closed
1 year ago
4
Incorporate face blurring with bounding boxes.
#241
GeorgiosSmyrnis
closed
1 year ago
16
Add support for resizing with fixed aspect ratio while fixing the largest image dimension
#240
gabrielilharco
closed
1 year ago
3
update fsspec
#239
rom1504
closed
10 months ago
6
Move resizer arg check in resizer.
#238
rom1504
closed
1 year ago
0
Better logger
#237
rom1504
opened
1 year ago
1