use the mimetype returned by the server, since we cannot deduce it from the URI alone; this avoids mimetype confusion in the resulting epub/zip file
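A minimal sketch of the idea, assuming the downloader has the response's `Content-Type` header available as a string. The helper name `choose_mimetype` is hypothetical, not taken from the codebase. It prefers the server's media type and only falls back to guessing from the URL's extension, which matters for URLs such as `.jpg` paths served as WebP, or tracking pixels with no extension at all:

```python
import mimetypes
from typing import Optional


def choose_mimetype(content_type: Optional[str], url: str) -> str:
    """Prefer the server's Content-Type; fall back to guessing from the URL."""
    if content_type:
        # Drop parameters such as "; charset=utf-8" and normalise case.
        media_type = content_type.split(";", 1)[0].strip().lower()
        if media_type:
            return media_type
    # No usable header: guess from the URL's extension, if it has one.
    guessed, _encoding = mimetypes.guess_type(url)
    return guessed or "application/octet-stream"
```

With this, `choose_mimetype("image/webp", "https://example.com/photo.jpg")` yields `"image/webp"`, so the file stored in the epub is labelled with what the server actually sent rather than what the URL suggests.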
add logging to better understand what's happening
We can be aggressive with the connect timeout: the user will have just (successfully?) opened the page they want converted, so the site can't be horribly broken, and giving up sooner on stale images is better. The browser doesn't have this issue because it loads images lazily/asynchronously (or not at all), but for the epub we need to wait for all downloads to finish.
This helps with blogs that have affiliate Amazon links with tracking pixels pointing to https://ir-na.amazon-adsystem.com, which no longer responds, so the timeouts pile up one after another.
Future changes should use a thread pool to run more of these downloads in parallel.
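A rough sketch of what such a future change might look like, using `concurrent.futures.ThreadPoolExecutor`. The names `download_all` and `fetch` are hypothetical: `fetch(url, connect_timeout)` stands in for whatever HTTP call the project makes and is assumed to raise an `OSError` subclass (e.g. `TimeoutError`) when a host such as the ad server does not respond. If the project used `requests`, for instance, `fetch` could call `requests.get(url, timeout=(connect_timeout, read_timeout))`, which sets separate connect and read timeouts.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Optional

# Hypothetical fetch signature: fetch(url, connect_timeout) -> bytes,
# raising OSError (TimeoutError, URLError, ...) on failure.
Fetch = Callable[[str, float], bytes]


def download_all(urls: Iterable[str], fetch: Fetch,
                 connect_timeout: float = 3.0,
                 max_workers: int = 8) -> Dict[str, Optional[bytes]]:
    """Download every URL in parallel; a failed download maps to None."""

    def safe_fetch(url: str) -> Optional[bytes]:
        try:
            return fetch(url, connect_timeout)
        except OSError:
            # Give up on this image instead of failing the whole epub.
            return None

    unique = list(dict.fromkeys(urls))  # de-duplicate while keeping order
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as `unique`.
        results = pool.map(safe_fetch, unique)
        return dict(zip(unique, results))
```

Because the downloads overlap, one unresponsive host costs roughly a single connect timeout in total rather than one timeout per image, and the short connect timeout keeps that cost small.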
This helps with issues #344 and #316