joedockrill / jmd_imagescraper

Image scraping library for creating deep learning datasets
https://joedockrill.github.io/jmd_imagescraper/
Apache License 2.0
33 stars, 15 forks

Added timeout options #11

Open ancientjpeg opened 2 years ago

ancientjpeg commented 2 years ago

Adds a timeout option to every function that makes HTTPS requests, since the entire program can hang indefinitely when a single image server does not respond. Also adds a second parameter, scrape_timeout, to the duckduckgo_search function, so the user can set separate timeouts for URL scraping and for downloading: DuckDuckGo always responds quickly, while many image servers do not. The new options are typed to match the timeout option of the requests library, i.e. they are annotated as either a float or a tuple.
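For readers unfamiliar with the float-or-tuple convention, here is a minimal sketch of how such a timeout can be typed and passed through to requests. It is not the code from this PR: `fetch_image_bytes`, the `Timeout` alias, and the injectable `get` argument are hypothetical names used only for illustration. In requests, a single number applies to both the connect and read phases, while a tuple is read as (connect, read).

```python
from typing import Callable, Optional, Tuple, Union

# Same shape the requests library accepts for its `timeout` argument:
# a single number, or a (connect, read) pair. (Alias name is hypothetical.)
Timeout = Union[float, Tuple[float, float]]


def fetch_image_bytes(
    url: str,
    timeout: Optional[Timeout] = None,
    get: Optional[Callable] = None,
) -> Optional[bytes]:
    """Hypothetical helper: download one image, giving up after `timeout`.

    `get` defaults to requests.get; it is injectable only so the sketch
    can be exercised without network access.
    """
    if get is None:
        import requests  # imported lazily so the sketch loads without it

        get = requests.get
    try:
        # timeout=None would wait forever, which is the hang described above.
        response = get(url, timeout=timeout)
        response.raise_for_status()
        return response.content
    except Exception:
        # Broad catch for brevity: a slow or dead server makes this image
        # get skipped instead of halting the whole run.
        return None
```

A call such as `fetch_image_bytes(url, timeout=(3.05, 10.0))` would allow about three seconds to connect and ten seconds between bytes received, whereas `timeout=5.0` applies five seconds to both phases.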

ancientjpeg commented 2 years ago

I should also add that I saw several unexpected changes in core.html when I built the docs (make docs), so I kept the old core.html file and only changed the tags that seemed to hold the code itself, as I was concerned that some of the current HTML has been edited after the build process.