Status: Open. pabl0 opened this pull request 9 months ago.
Try it out; if the benchmark results look good, it could be a good option.
On Thu, Dec 14, 2023, 21:45, Henrik Ahlgren wrote:
This is an attempt to fix #332 (https://github.com/rom1504/img2dataset/issues/332) in a simple manner, without anything fancy like urllib3.Retry. A 404 will not succeed on a retry, so retrying only on certain status codes should significantly improve download performance on datasets with a large number of missing (404) images. That said, I have not done much benchmarking.
I haven't found any established best practices (such as an RFC) on which HTTP status codes to retry, but the following list should be reasonable:
- 408 Request Timeout
- 429 Too Many Requests (honor the Retry-After header if it is given in seconds and is less than 10)
- 500 Internal Server Error
- 502 Bad Gateway
- 503 Service Unavailable
- 504 Gateway Timeout
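The retry policy above could be sketched roughly as follows. This is a minimal, stdlib-only illustration and not the code in the PR. The names RETRYABLE_STATUS_CODES, retry_after_delay, should_retry and fetch_with_retries are hypothetical. The network call is abstracted as a `fetch` callable that returns `(status, headers, body)`, so the logic can be exercised without a server. The sketch also makes an assumption the PR text leaves open: a 429 whose Retry-After header is missing, is given as an HTTP date, or is 10 seconds or more gets retried immediately.

```python
import time
from typing import Callable, Mapping, Optional, Tuple

# Status codes from the list above that are considered transient (hypothetical name).
RETRYABLE_STATUS_CODES = frozenset({408, 429, 500, 502, 503, 504})
# Only Retry-After values strictly below this many seconds are honored.
MAX_RETRY_AFTER_SECONDS = 10


def retry_after_delay(headers: Mapping[str, str]) -> Optional[float]:
    """Return the Retry-After delay if it is in seconds and < 10, else None.

    Assumes a plain dict with the exact key "Retry-After"; a real
    http.client.HTTPMessage would also match other capitalizations.
    """
    value = headers.get("Retry-After")
    if value is None:
        return None
    try:
        seconds = int(value.strip())
    except ValueError:
        # HTTP-date form (or garbage) is not handled in this sketch.
        return None
    if 0 <= seconds < MAX_RETRY_AFTER_SECONDS:
        return float(seconds)
    return None


def should_retry(status: int) -> bool:
    """Only transient errors are retried; a 404, for example, fails immediately."""
    return status in RETRYABLE_STATUS_CODES


def fetch_with_retries(
    fetch: Callable[[str], Tuple[int, Mapping[str, str], bytes]],
    url: str,
    retries: int = 2,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, Optional[bytes]]:
    """Call fetch(url) up to 1 + retries times, retrying only on retryable codes.

    Returns (status, body) on a 2xx response, otherwise (last_status, None).
    """
    status = 0
    for attempt in range(retries + 1):
        status, headers, body = fetch(url)
        if 200 <= status < 300:
            return status, body
        if not should_retry(status) or attempt == retries:
            break  # permanent error, or out of attempts
        if status == 429:
            delay = retry_after_delay(headers)
            if delay is not None:
                sleep(delay)  # wait as the server asked before the next attempt
    return status, None
```

With this policy a 404 returns after a single request instead of consuming every retry, which is where the expected speedup on 404-heavy datasets would come from. Injecting `sleep` keeps the Retry-After handling testable without actually waiting.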
Pull request: https://github.com/rom1504/img2dataset/pull/368
Commit Summary
- 9387c7a Retry only on certain HTTP codes (https://github.com/rom1504/img2dataset/pull/368/commits/9387c7a0e9517d26b56b375892b63326d8d358f4)
File Changes (1 file: https://github.com/rom1504/img2dataset/pull/368/files)
- M img2dataset/downloader.py (12): https://github.com/rom1504/img2dataset/pull/368/files#diff-2ded925514a1a2d2eebace43140502f4b37b16c8bd36a5a801360def648088b1
Patch Links:
- https://github.com/rom1504/img2dataset/pull/368.patch
- https://github.com/rom1504/img2dataset/pull/368.diff