openaustralia / morph

Take the hassle out of web scraping
https://morph.io
GNU Affero General Public License v3.0

[Morph/production] Excon::Error::Socket: no block given (yield) (LocalJumpError) #1121

Open henare opened 7 years ago

henare commented 7 years ago

Backtrace

line 52 of [PROJECT_ROOT]/lib/morph/docker_utils.rb: block in get_or_pull_image
line 50 of [PROJECT_ROOT]/lib/morph/docker_utils.rb: rescue in get_or_pull_image
line 48 of [PROJECT_ROOT]/lib/morph/docker_utils.rb: get_or_pull_image

View full backtrace and more info at honeybadger.io
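
For context, `no block given (yield)` is the message Ruby attaches to a `LocalJumpError` when a method calls `yield` but the caller passed no block. The actual contents of `lib/morph/docker_utils.rb` are not shown in this thread, so the sketch below uses hypothetical helper names (`pull_with_progress`, `pull_with_optional_progress`) purely to illustrate the failure mode and one common guard. It is not the morph code.

```ruby
# Hypothetical helpers for illustration only; these are NOT the methods
# in lib/morph/docker_utils.rb.

# Calls yield unconditionally, so it breaks when no block is passed.
def pull_with_progress(name)
  yield "pulling #{name}"
  name
end

# Guards the yield, so the block becomes optional.
def pull_with_optional_progress(name)
  yield "pulling #{name}" if block_given?
  name
end

# Calling the unguarded version without a block raises LocalJumpError.
begin
  pull_with_progress("example/image")
rescue LocalJumpError => e
  error_message = e.message
end

# The guarded version works with or without a block.
progress = []
without_block = pull_with_optional_progress("example/image")
with_block = pull_with_optional_progress("example/image") { |msg| progress << msg }
```

If the method at line 52 yields inside a `rescue` path that is only reached when the image is missing, the error would only surface when the cached image has gone away, which would fit it appearing only recently. That is a guess from the backtrace, not something confirmed here.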

henare commented 7 years ago

This only started happening recently. @auxesis, do you have any clues about what might have changed to cause this?

auxesis commented 7 years ago

Things I can think of that are different now:

Whatever's happening, I'm pretty sure this is a legitimate bug in the code.

henare commented 7 years ago

I bet it's a race condition. This code was only ever intended to run once when you first set up morph and forget to run update_docker_image.
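
If it is a race, one way to picture it is several scraper runs all finding the image missing at the same moment and all trying to pull it. A minimal sketch of serialising the get-or-pull step follows. Everything in it is an assumption for illustration: `FakeImageStore` is a stand-in for a Docker client (not the docker-api gem), and `get_or_pull_image_once` is a hypothetical name. A `Mutex` only protects threads inside one process; separate worker processes would need something like a file lock instead.

```ruby
# Hypothetical stand-in for a Docker client; NOT the docker-api gem.
class FakeImageStore
  attr_reader :pull_count

  def initialize
    @images = {}
    @pull_count = 0
  end

  # Raises KeyError when the image is not cached.
  def get(name)
    @images.fetch(name) { raise KeyError, "image not found: #{name}" }
  end

  # Simulates a slow download and records that a pull happened.
  def pull(name)
    sleep 0.01
    @pull_count += 1
    @images[name] = "image:#{name}"
  end
end

# One lock shared by every caller in this process.
PULL_LOCK = Mutex.new

# Hypothetical helper: look the image up, and pull it only if it is
# missing. Holding the lock across both steps means a second caller
# waits and then finds the image already cached.
def get_or_pull_image_once(store, name)
  PULL_LOCK.synchronize do
    begin
      store.get(name)
    rescue KeyError
      store.pull(name)
    end
  end
end

store = FakeImageStore.new
threads = 5.times.map do
  Thread.new { get_or_pull_image_once(store, "example/image") }
end
results = threads.map(&:value)
```

With the lock held across the lookup and the pull, five concurrent callers trigger a single pull. This only addresses the symptom; it does not explain why the image is missing in the first place.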

However, the actual problem here is that...

The Docker image cache has been wiped

...should not wipe this image. If this is the case, it means the image is being re-downloaded every hour. Not only does this use unnecessary bandwidth, but blowing away this much of the cache will make scraper runs very slow.

If you don't have a cached container to use, you may have to wait for this container to download, for your scraper language to be downloaded and compiled, and for your libraries to be downloaded and installed. This could easily take several minutes. If this reset is happening every hour, it would be very frustrating for people running scrapers.

@auxesis, have I understood this correctly? Do we think it's having this impact?