openaustralia / morph

Take the hassle out of web scraping
https://morph.io
GNU Affero General Public License v3.0
461 stars 74 forks source link

Scraper failing with status code 255 #1086

Open henare opened 8 years ago

henare commented 8 years ago

@tmtmtmtm reported that:

https://morph.io/tmtmtmtm/argentina-diputados-wikidata keeps failing "with status code 255". It's not obvious at all what's going on here, but whatever it is seems to leave the database in a broken status.

henare commented 8 years ago

I've forked it and get the same result. As far as I know status code 255 only happens on a failed compile or when you try to stop a scraper and the container has disappeared so we definitely shouldn't be seeing this.

henare commented 8 years ago

In a local morph instance, this runs successfully but appears to run 4 times, which seems odd?

Injecting configuration and compiling...
       -----> Ruby app detected
-----> Compiling Ruby
-----> Using Ruby version: ruby-2.0.0
-----> Installing dependencies using bundler 1.11.2
       Running: bundle install --without development:test --path vendor/bundle --binstubs vendor/bundle/bin -j4 --deployment
       Fetching gem metadata from https://rubygems.org/.........
       Fetching version metadata from https://rubygems.org/...
       Fetching dependency metadata from https://rubygems.org/..
       Fetching https://github.com/openc/sqlite_magic.git
       Fetching https://github.com/openaustralia/scraperwiki-ruby.git
       Fetching https://github.com/everypolitician/wikidata-fetcher.git
       Fetching https://github.com/everypolitician/wikisnakker
       Rubygems 2.0.14.1 is not threadsafe, so your gems will be installed one at a time. Upgrade to Rubygems 2.1.0 or higher to enable parallel gem installation.
       Installing coderay 1.1.0
       Installing colorize 0.7.7
       Installing diskcached 1.1.2
       Installing unf_ext 0.0.7.2 with native extensions
       Installing excon 0.49.0
       Installing multipart-post 2.0.0
       Installing hashie 3.4.4
       Installing httpclient 2.7.1
       Installing json 1.8.3 with native extensions
       Installing method_source 0.8.2
       Installing mime-types 2.99.1
       Installing mini_portile2 2.0.0
       Installing netrc 0.11.0
       Installing open-uri-cached 0.0.5
       Installing slop 3.6.0
       Installing sqlite3 1.3.11 with native extensions
       Using wikisnakker 0.0.1 from https://github.com/everypolitician/wikisnakker (at master@a9d6711)
       Using bundler 1.11.2
       Installing unf 0.1.4
       Installing faraday 0.9.2
       Installing nokogiri 1.6.7.2 with native extensions
       Installing pry 0.10.3
       Using sqlite_magic 0.0.6 from https://github.com/openc/sqlite_magic.git (at master@4df975e)
       Installing domain_name 0.5.20160310
       Installing faraday_middleware 0.10.0
       Using scraperwiki 3.0.1 from https://github.com/openaustralia/scraperwiki-ruby.git (at morph_defaults@fc50176)
       Installing http-cookie 1.0.2
       Installing wikidata-client 0.0.10
       Installing faraday-cookie_jar 0.0.6
       Installing rest-client 1.8.0
       Installing mediawiki_api 0.5.0
       Using wikidata-fetcher 0.15.0 from https://github.com/everypolitician/wikidata-fetcher.git (at master@601deba)
       Bundle complete! 10 Gemfile dependencies, 32 gems now installed.
       Gems in the groups development and test were not installed.
       Bundled gems are installed into ./vendor/bundle.
       Bundle completed (125.53s)
       Cleaning up the bundler cache.
       
       -----> Discovering process types
       Procfile declares types -> scraper
Injecting scraper and running...
2013 election
 = 64 people
Wikidata lookup
  Fetching 800 items
2013 election
 = 64 people
Wikidata lookup
  Fetching 800 items
    = 237 recent
2013 election
 = 64 people
Wikidata lookup
  Fetching 800 items
    = 237 recent
2013 election
 = 64 people
Wikidata lookup
  Fetching 800 items
    = 237 recent
DONE!
⁇ Unknown claim: https://www.wikidata.org/wiki/Property:P2218 for Q15719117
henare commented 8 years ago

When I first ran the forked scraper it got 100 items. Running it a second time resulted in a empty database.

selection_002

henare commented 8 years ago

Running locally for a second time it seems to loop but differently (!?!):

Injecting configuration and compiling...
Injecting scraper and running...
2013 election
 = 64 people
Wikidata lookup
  Fetching 800 items
2013 election
 = 64 people
Wikidata lookup
  Fetching 800 items
    = 237 recent
2013 election
 = 64 people
Wikidata lookup
  Fetching 800 items
    = 237 recent
⁇ Unknown claim: https://www.wikidata.org/wiki/Property:P2218 for Q15719117
DONE!

It's still completing successfully. We also got a copy of the problematic production database and ran the scraper over that locally and it worked OK, saving 237 records into the database OK.

henare commented 8 years ago

@tmtmtmtm this is really weird. As you can see above, when we run this in a local instance of morph the scraper appears to loop. This seems to be a problem with the scraper, not morph. Are you able to get to the bottom of that?

Admittedly this doesn't resolve the real problems we're seeing with this scraper on production morph but we want to understand that first before moving onto more troubleshooting.

LoveMyData commented 8 years ago

Another example if that helps. https://morph.io/planningalerts-scrapers/melbourne

henare commented 8 years ago

@LoveMyData that's a big help, thanks!

It seems like that scraper stopped running successfully on the 19th of July and hasn't finished successfully since. Interestingly there's been runs of @tmtmtmtm's that have been successful since that date.

Running https://morph.io/planningalerts-scrapers/melbourne on a local instance of morph works fine.

Another thing worth noting is that the PlanningAlerts scraper stops logging output some of the way through. In the most recent run it only logs output for 25 development application detail pages, whereas locally it outputs all of them, which is 62 detail pages. (This means a total of 57 lines on morph and 70 locally).

henare commented 8 years ago

I just ran the PlanningAlerts scraper and the run finished OK but there was an error with the scraper:

Injecting configuration and compiling...
       -----> Ruby app detected
-----> Compiling Ruby
-----> Using Ruby version: ruby-2.2.4
-----> Installing dependencies using bundler 1.11.2
       Running: bundle install --without development:test --path vendor/bundle --binstubs vendor/bundle/bin -j4 --deployment
       Fetching gem metadata from https://rubygems.org/.........
       Fetching version metadata from https://rubygems.org/..
       Fetching https://github.com/openaustralia/scraperwiki-ruby.git
       Installing mime-types 2.99.1
       Installing unf_ext 0.0.7.2 with native extensions
       Installing httpclient 2.7.2
       Installing net-http-digest_auth 1.4
       Installing net-http-persistent 2.9.4
       Installing mini_portile2 2.0.0
       Installing ntlm-http 0.1.1
       Installing webrobots 0.1.2
       Using bundler 1.11.2
       Installing sqlite3 1.3.11 with native extensions
       Installing nokogiri 1.6.7.2 with native extensions
       Installing unf 0.1.4
       Installing domain_name 0.5.20160310
       Installing http-cookie 1.0.2
       Installing sqlite_magic 0.0.6
       Using scraperwiki 3.0.1 from https://github.com/openaustralia/scraperwiki-ruby.git (at morph_defaults@fc50176)
       Installing mechanize 2.7.4
       Bundle complete! 2 Gemfile dependencies, 17 gems now installed.
       Gems in the groups development and test were not installed.
       Bundled gems are installed into ./vendor/bundle.
       Bundle completed (25.02s)
       Cleaning up the bundler cache.
       
       ###### WARNING:
       You have not declared a Ruby version in your Gemfile.
       To set your Ruby version add this line to your Gemfile:
       ruby '2.2.4'
       # See https://devcenter.heroku.com/articles/ruby-versions for more information.
       
       -----> Discovering process types
       Procfile declares types -> scraper
Injecting scraper and running...
Fetching http://www.melbourne.vic.gov.au/building-and-development/property-information/planning-building-registers/Pages/town-planning-permits-register-search-results.aspx?std=29/08/2016&end=12/09/2016&page=1
Fetching http://www.melbourne.vic.gov.au/building-and-development/property-information/planning-building-registers/Pages/town-planning-permits-register-search-results.aspx?std=29/08/2016&end=12/09/2016&page=2
Fetching http://www.melbourne.vic.gov.au/building-and-development/property-information/planning-building-registers/Pages/town-planning-permits-register-search-results.aspx?std=29/08/2016&end=12/09/2016&page=3
Fetching http://www.melbourne.vic.gov.au/building-and-development/property-information/planning-building-registers/Pages/town-planning-permits-register-search-results.aspx?std=29/08/2016&end=12/09/2016&page=4
Fetching http://www.melbourne.vic.gov.au/building-and-development/property-information/planning-building-registers/Pages/town-planning-permits-register-search-results.aspx?std=29/08/2016&end=12/09/2016&page=5
Fetching http://www.melbourne.vic.gov.au/building-and-development/property-information/planning-building-registers/Pages/town-planning-permits-register-search-results.aspx?std=29/08/2016&end=12/09/2016&page=6
Fetching http://www.melbourne.vic.gov.au/building-and-development/property-information/planning-building-registers/Pages/town-planning-permits-register-search-results.aspx?std=29/08/2016&end=12/09/2016&page=7
Fetching http://www.melbourne.vic.gov.au/building-and-development/property-information/planning-building-registers/Pages/town-planning-permits-register-search-results.aspx?std=29/08/2016&end=12/09/2016&page=8
Fetching http://www.melbourne.vic.gov.au/building-and-development/property-information/planning-building-registers/Pages/town-planning-permits-register-search-results.aspx?appid=261516
Skipping already saved record TP-2005-811/B
Fetching http://www.melbourne.vic.gov.au/building-and-development/property-information/planning-building-registers/Pages/town-planning-permits-register-search-results.aspx?appid=261515
Skipping already saved record TP-2005-815/B
Fetching http://www.melbourne.vic.gov.au/building-and-development/property-information/planning-building-registers/Pages/town-planning-permits-register-search-results.aspx?appid=261490
Skipping already saved record TP-2016-757
Fetching http://www.melbourne.vic.gov.au/building-and-development/property-information/planning-building-registers/Pages/town-planning-permits-register-search-results.aspx?appid=261493
Skipping already saved record TP-2016-758
Fetching http://www.melbourne.vic.gov.au/building-and-development/property-information/planning-building-registers/Pages/town-planning-permits-register-search-results.aspx?appid=261496
Skipping already saved record TP-2016-759
Fetching http://www.melbourne.vic.gov.au/building-and-development/property-information/planning-building-registers/Pages/town-planning-permits-register-search-results.aspx?appid=261498
Skipping already saved record TP-2016-760
Fetching http://www.melbourne.vic.gov.au/building-and-development/property-information/planning-building-registers/Pages/town-planning-permits-register-search-results.aspx?appid=238088
Skipping already saved record TP-2016-755
Fetching http://www.melbourne.vic.gov.au/building-and-development/property-information/planning-building-registers/Pages/town-planning-permits-register-search-results.aspx?appid=238090
Skipping already saved record TP-2016-756
Fetching http://www.melbourne.vic.gov.au/building-and-development/property-information/planning-building-registers/Pages/town-planning-permits-register-search-results.aspx?appid=237816
Skipping already saved record TP-1999-1133/A
Fetching http://www.melbourne.vic.gov.au/building-and-development/property-information/planning-building-registers/Pages/town-planning-permits-register-search-results.aspx?appid=237809
Skipping already saved record TP-2016-743
Fetching http://www.melbourne.vic.gov.au/building-and-development/property-information/planning-building-registers/Pages/town-planning-permits-register-search-results.aspx?appid=237812
Skipping already saved record TP-2016-744
Fetching http://www.melbourne.vic.gov.au/building-and-development/property-information/planning-building-registers/Pages/town-planning-permits-register-search-results.aspx?appid=237814
Skipping already saved record TP-2016-745
Fetching http://www.melbourne.vic.gov.au/building-and-development/property-information/planning-building-registers/Pages/town-planning-permits-register-search-results.aspx?appid=237828
Skipping already saved record TP-2016-746
Fetching http://www.melbourne.vic.gov.au/building-and-development/property-information/planning-building-registers/Pages/town-planning-permits-register-search-results.aspx?appid=237832
Skipping already saved record TP-2016-747
Fetching http://www.melbourne.vic.gov.au/building-and-development/property-information/planning-building-registers/Pages/town-planning-permits-register-search-results.aspx?appid=237834
Skipping already saved record TP-2016-748
Fetching http://www.melbourne.vic.gov.au/building-and-development/property-information/planning-building-registers/Pages/town-planning-permits-register-search-results.aspx?appid=237984
Skipping already saved record TP-2016-749
Fetching http://www.melbourne.vic.gov.au/building-and-development/property-information/planning-building-registers/Pages/town-planning-permits-register-search-results.aspx?appid=238043
Skipping already saved record TP-2016-750
Fetching http://www.melbourne.vic.gov.au/building-and-development/property-information/planning-building-registers/Pages/town-planning-permits-register-search-results.aspx?appid=238052
Skipping already saved record TP-2016-751
Fetching http://www.melbourne.vic.gov.au/building-and-development/property-information/planning-building-registers/Pages/town-planning-permits-register-search-results.aspx?appid=238053
Skipping already saved record TP-2016-752
Fetching http://www.melbourne.vic.gov.au/building-and-development/property-information/planning-building-registers/Pages/town-planning-permits-register-search-results.aspx?appid=238055
Skipping already saved record TP-2016-753
Fetching http://www.melbourne.vic.gov.au/building-and-development/property-information/planning-building-registers/Pages/town-planning-permits-register-search-results.aspx?appid=238056
Skipping already saved record TP-2016-754
Fetching http://www.melbourne.vic.gov.au/building-and-development/property-information/planning-building-registers/Pages/town-planning-permits-register-search-results.aspx?appid=237404
Skipping already saved record TP-2016-740
Fetching http://www.melbourne.vic.gov.au/building-and-development/property-information/planning-building-registers/Pages/town-planning-permits-register-search-results.aspx?appid=237410
Skipping already saved record TP-2016-741
Fetching http://www.melbourne.vic.gov.au/building-and-development/property-information/planning-building-registers/Pages/town-planning-permits-register-search-results.aspx?appid=237707
Skipping already saved record TP-2016-742
Fetching http://www.melbourne.vic.gov.au/building-and-development/property-information/planning-building-registers/Pages/town-planning-permits-register-search-results.aspx?appid=316
/app/vendor/bundle/ruby/2.2.0/gems/mechanize-2.7.4/lib/mechanize/http/agent.rb:1166:in `auto_io': Cannot allocate memory @ io_write - /tmp/mechanize-gunzip20160912-18-zk96as (Errno::ENOMEM)
    from /app/vendor/bundle/ruby/2.2.0/gems/mechanize-2.7.4/lib/mechanize/http/agent.rb:434:in `content_encoding_gunzip'
    from /app/vendor/bundle/ruby/2.2.0/gems/mechanize-2.7.4/lib/mechanize/http/agent.rb:818:in `response_content_encoding'
    from /app/vendor/bundle/ruby/2.2.0/gems/mechanize-2.7.4/lib/mechanize/http/agent.rb:283:in `fetch'
    from /app/vendor/bundle/ruby/2.2.0/gems/mechanize-2.7.4/lib/mechanize.rb:464:in `get'
    from scraper.rb:30:in `block in <main>'
    from scraper.rb:28:in `each'
    from scraper.rb:28:in `<main>'
henare commented 8 years ago

Cannot allocate memory @ io_write

There's plenty of disk and ram on the server so I don't know what this is about.

henare commented 8 years ago

Here's an idea - could these scrapers be exhausting their container's RAM and that's what's causing the weird error message? We might be getting odd error messages because of the log limiter wrapper that's relatively recently been implemented.

henare commented 8 years ago

Ah ha! I just deliberately created a scraper that would exhaust memory and, what do you know, the output oddly stops and the run fails with status code 255:

https://morph.io/morph-test-scrapers/morph-hansard

@tmtmtmtm do you think that could be the problem you're seeing?

tmtmtmtm commented 8 years ago

@henare it's certainly possible that this is a memory issue (though in other scrapers with memory issues, morph simply kills the process). I could try lowering the batch_size argument to see if it makes any difference, but I need morph to come alive again to do that…

Running it a second time resulted in a empty database.

The scraper explicitly clears out the database before each run, so that part is to be expected if the scraper fails.

when we run this in a local instance of morph the scraper appears to loop. This seems to be a problem with the scraper, not morph. Are you able to get to the bottom of that?

Nope. That's definitely not something the scraper is doing. As you can see from the scraper code the warn "2013 election", for example, is at the very top of the file: https://github.com/everypolitician-scrapers/argentina-diputados-wikidata/blob/master/scraper.rb#L6, not within any sort of loop. Something about the way morph is running it must be triggering a restart before it errors out.

henare commented 8 years ago

I need morph to come alive again to do that

morph is... not alive?! Oh brother.

It looks like the CPUs have been pegged for the last 12+ hours:

morph_io

Console:

selection_001

I've issued a reboot from the Linode dashboard and the site's back. I guess it will take ages to clear the scraper queue though :disappointed:

henare commented 8 years ago

I guess it will take ages to clear the scraper queue though

I just ran the test scraper and it's working.

walinchus commented 7 years ago

Is there any way around this issue? I keep having a scraper give me this code and my guess is I'm overloading the RAM. Can I add breaks or something? Thanks.

henare commented 7 years ago

Hi @walinchus :wave: please ask questions related to scrapers in the forum :point_right: https://help.morph.io/

walinchus commented 7 years ago

Ah no problem thanks @henare!