Scrapers with no console output can take an unnecessarily long time to finish running

The Docker client has a read timeout for console output, which we set to 5 minutes. This means that if a scraper doesn't output anything for 5 minutes, the background worker throws an exception: Docker::Error::TimeoutError: read timeout reached.
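As a rough illustration of the mechanism (not the Docker client's actual code), the sketch below uses only the Ruby standard library to show how a read timeout fires on a stream that goes silent. The names `ReadTimeoutReached` and `read_with_timeout` are hypothetical, and the timeout is shortened to a fraction of a second so the example runs quickly.

```ruby
# Hypothetical stand-in for Docker::Error::TimeoutError.
class ReadTimeoutReached < StandardError; end

# Wait up to `timeout` seconds for output on `io`.
# Raises if nothing arrives within that window.
def read_with_timeout(io, timeout)
  ready = IO.select([io], nil, nil, timeout) # returns nil on timeout
  raise ReadTimeoutReached, "read timeout reached" if ready.nil?

  io.read_nonblock(1024)
end

reader, writer = IO.pipe

# A scraper that prints something is read without trouble.
writer.write("scraping page 1\n")
first_chunk = read_with_timeout(reader, 0.1)

# A scraper that then goes silent trips the timeout,
# even though it may still be working (or already finished).
timed_out = begin
  read_with_timeout(reader, 0.1)
  false
rescue ReadTimeoutReached
  true
end

puts "first chunk: #{first_chunk.inspect}"
puts "timed out while silent: #{timed_out}"
```

The point of the sketch is that the timeout measures silence on the stream, not whether the process behind it is still doing useful work.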

Normally this isn't a big deal: Sidekiq retries the job, and it finishes up as usual in one of those retries. However, it can be a problem for a long-running scraper, because Sidekiq backs off its retries. The scraper's Docker run may have finished and its container stopped, but the background job is backed off and won't retry for another few hours.
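To get a feel for how quickly the backoff grows, here is a small arithmetic sketch. It assumes Sidekiq's documented default retry delay of roughly `(count ** 4) + 15` seconds plus random jitter, with `count` starting at 0; the jitter term is left out, the exact formula may differ between Sidekiq versions, and `base_retry_delay` is a hypothetical helper name.

```ruby
# Deterministic part of the delay (in seconds) before a retry, assuming
# Sidekiq's default backoff of count**4 + 15 plus jitter (jitter omitted).
# `count` is the number of retries already attempted, starting at 0.
def base_retry_delay(count)
  (count**4) + 15
end

(0..10).each do |count|
  delay = base_retry_delay(count)
  puts format("after %2d failed retries: wait at least %6d s (%.1f h)",
              count, delay, delay / 3600.0)
end
```

Under that assumption the wait after ten failed retries is at least 10,015 seconds, or about 2.8 hours, which matches the "few hours" gap described above: a scraper that stays silent through many 5-minute timeouts pushes the job far down the backoff curve.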

The effect is that the job takes much longer to finish than it needs to, and it unnecessarily occupies a queue slot while it waits to retry and finish up.

openaustralia/morph issue #1123