openaustralia / morph

Take the hassle out of web scraping
https://morph.io
GNU Affero General Public License v3.0

Capybara with selenium fails on ruby 2.7.6 #1337

Open ianheggie-oaf opened 1 month ago

ianheggie-oaf commented 1 month ago

After reverting Ruby to 2.7.6 so that the environment builds on morph.io, the scraper now fails at:

require 'capybara'
require 'selenium-webdriver'

REGISTER_FORM_URL = 'https://online.whittlesea.vic.gov.au/s/publicregister'
capybara = Capybara::Session.new(:selenium_chrome_headless)
# The error is raised on the following line:
capybara.visit(REGISTER_FORM_URL)

This happens when the main branch of the ianheggie-oaf/city_of_whittlesea_aura_planning_register scraper is reset to its ruby_2_7_6 branch. Note: Ruby 2.7.6 appears to be the latest Ruby that can be built on morph.io; see https://github.com/openaustralia/morph/issues/1336 for the Ruby 3.1.6 issue.
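
One way to check whether the failure depends on how Chrome is launched, rather than on the page being visited, is to replace the built-in :selenium_chrome_headless driver with a custom-registered one that passes explicit Chrome flags. The sketch below is an untested assumption, not a confirmed fix for morph.io: the flag list is a set commonly suggested for running Chrome inside containers, and the names CONTAINER_CHROME_ARGS, register_container_chrome and :container_chrome_headless are hypothetical.

```ruby
# Flags often suggested for Chrome in containers (assumption: not verified on morph.io).
CONTAINER_CHROME_ARGS = %w[
  --headless
  --no-sandbox
  --disable-dev-shm-usage
  --disable-gpu
].freeze

# Registers a Capybara driver that starts Chrome with the given flags.
# The gems are required lazily so the flag list can be inspected without them.
def register_container_chrome(name = :container_chrome_headless, args = CONTAINER_CHROME_ARGS)
  require 'capybara'
  require 'selenium-webdriver'

  Capybara.register_driver(name) do |app|
    options = Selenium::WebDriver::Chrome::Options.new
    args.each { |arg| options.add_argument(arg) }
    Capybara::Selenium::Driver.new(app, browser: :chrome, options: options)
  end
  name
end
```

In the scraper this would be used as Capybara::Session.new(register_container_chrome) in place of Capybara::Session.new(:selenium_chrome_headless). If the visit then succeeds, the problem is likely in Chrome's startup environment rather than in Capybara or the Ruby version.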

Error log:

Injecting configuration and compiling...
          -----> Ruby app detected
   -----> Compiling Ruby/Rack
   -----> Using Ruby version: ruby-2.7.6
   -----> Installing dependencies using bundler version 1.15.2
          Running: bundle install --without development:test --path vendor/bundle --binstubs vendor/bundle/bin -j4 --deployment
          Warning: the running version of Bundler (1.15.2) is older than the version that created the lockfile (1.17.3). We suggest you upgrade to the latest version of Bundler by running `gem install bundler`.
          Fetching gem metadata from https://rubygems.org/........
          Fetching version metadata from https://rubygems.org/.
          Fetching https://github.com/openaustralia/scraperwiki-ruby.git
          Fetching public_suffix 5.1.1
          Using bundler 1.15.2
          Fetching matrix 0.4.2
          Fetching mini_mime 1.1.5
          Installing matrix 0.4.2
          Installing public_suffix 5.1.1
          Fetching mini_portile2 2.8.7
          Fetching racc 1.8.1
          Installing mini_mime 1.1.5
          Fetching rack 3.1.7
          Installing racc 1.8.1 with native extensions
          Installing mini_portile2 2.8.7
          Installing rack 3.1.7
          Fetching regexp_parser 2.9.2
          Installing regexp_parser 2.9.2
          Fetching httpclient 2.8.3
          Installing httpclient 2.8.3
          Fetching rexml 3.3.8
          Installing rexml 3.3.8
          Fetching rubyzip 2.3.2
          Installing rubyzip 2.3.2
          Fetching websocket 1.2.11
          Fetching addressable 2.8.7
          Installing websocket 1.2.11
          Installing addressable 2.8.7
          Fetching sqlite3 1.6.9 (x86_64-linux)
          Fetching rack-test 2.1.0
          Installing rack-test 2.1.0
          Fetching nokogiri 1.15.6 (x86_64-linux)
          Fetching selenium-webdriver 4.9.0
          Installing sqlite3 1.6.9 (x86_64-linux)
          Installing nokogiri 1.15.6 (x86_64-linux)
          Installing selenium-webdriver 4.9.0
          Fetching sqlite_magic 0.0.6
          Installing sqlite_magic 0.0.6
          Using scraperwiki 3.0.1 from https://github.com/openaustralia/scraperwiki-ruby.git (at morph_defaults@fc50176)
          Fetching xpath 3.2.0
          Installing xpath 3.2.0
          Fetching capybara 3.39.2
          Installing capybara 3.39.2
          Fetching capybara-shadowdom 0.3.0
          Installing capybara-shadowdom 0.3.0
          Bundle complete! 5 Gemfile dependencies, 22 gems now installed.
          Gems in the groups development and test were not installed.
          Bundled gems are installed into ./vendor/bundle.
          Post-install message from rubyzip:
          RubyZip 3.0 is coming!
          **********************
          The public API of some Rubyzip classes has been modernized to use named
          parameters for optional arguments. Please check your usage of the
          following classes:
          * `Zip::File`
          * `Zip::Entry`
          * `Zip::InputStream`
          * `Zip::OutputStream`
          Please ensure that your Gemfiles and .gemspecs are suitably restrictive
          to avoid an unexpected breakage when 3.0 is released (e.g. ~> 2.3.0).
          See https://github.com/rubyzip/rubyzip for details. The Changelog also
          lists other enhancements and bugfixes that have been implemented since
          version 2.3.0.
          Bundle completed (5.22s)
          Cleaning up the bundler cache.
          Warning: the running version of Bundler (1.15.2) is older than the version that created the lockfile (1.17.3). We suggest you upgrade to the latest version of Bundler by running `gem install bundler`.
   -----> Detecting rake tasks

          -----> Discovering process types
          Procfile declares types -> scraper
Injecting scraper and running...
WARN: data_count: Ignoring: no such table: data [returning 0]
INFO: DB has 0 records.
INFO: Visiting website using capybara ...
INFO: Quitting capybara
#0 0x56f5758deb13 <unknown>: unknown error: Chrome failed to start: exited abnormally. (Selenium::WebDriver::Error::UnknownError)
  (unknown error: DevToolsActivePort file doesn't exist)
  (The process started from chrome location /usr/bin/google-chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)
    from #1 0x56f5756e5688 <unknown>
    from #2 0x56f575709f1f <unknown>
    from #3 0x56f5757055aa <unknown>
    from #4 0x56f57574064a <unknown>
    from #5 0x56f57573a7a3 <unknown>
    from #6 0x56f5757100ea <unknown>
    from #7 0x56f575711225 <unknown>
    from #8 0x56f5759262dd <unknown>
    from #9 0x56f57592a2c7 <unknown>
    from #10 0x56f57591022e <unknown>
    from #11 0x56f57592b0a8 <unknown>
    from #12 0x56f575904bc0 <unknown>
    from #13 0x56f5759476c8 <unknown>
    from #14 0x56f575947848 <unknown>
    from #15 0x56f575961c0d <unknown>
    from #16 0x7bd190c08184 <unknown>
    from /app/vendor/bundle/ruby/2.7.0/gems/selenium-webdriver-4.9.0/lib/selenium/webdriver/remote/response.rb:55:in `assert_ok'
    from /app/vendor/bundle/ruby/2.7.0/gems/selenium-webdriver-4.9.0/lib/selenium/webdriver/remote/response.rb:34:in `initialize'
    from /app/vendor/bundle/ruby/2.7.0/gems/selenium-webdriver-4.9.0/lib/selenium/webdriver/remote/http/common.rb:83:in `new'
    from /app/vendor/bundle/ruby/2.7.0/gems/selenium-webdriver-4.9.0/lib/selenium/webdriver/remote/http/common.rb:83:in `create_response'
    from /app/vendor/bundle/ruby/2.7.0/gems/selenium-webdriver-4.9.0/lib/selenium/webdriver/remote/http/default.rb:104:in `request'
    from /app/vendor/bundle/ruby/2.7.0/gems/selenium-webdriver-4.9.0/lib/selenium/webdriver/remote/http/common.rb:59:in `call'
    from /app/vendor/bundle/ruby/2.7.0/gems/selenium-webdriver-4.9.0/lib/selenium/webdriver/remote/bridge.rb:619:in `execute'
    from /app/vendor/bundle/ruby/2.7.0/gems/selenium-webdriver-4.9.0/lib/selenium/webdriver/remote/bridge.rb:53:in `create_session'
    from /app/vendor/bundle/ruby/2.7.0/gems/selenium-webdriver-4.9.0/lib/selenium/webdriver/common/driver.rb:317:in `block in create_bridge'
    from /app/vendor/bundle/ruby/2.7.0/gems/selenium-webdriver-4.9.0/lib/selenium/webdriver/common/driver.rb:316:in `tap'
    from /app/vendor/bundle/ruby/2.7.0/gems/selenium-webdriver-4.9.0/lib/selenium/webdriver/common/driver.rb:316:in `create_bridge'
    from /app/vendor/bundle/ruby/2.7.0/gems/selenium-webdriver-4.9.0/lib/selenium/webdriver/common/driver.rb:74:in `initialize'
    from /app/vendor/bundle/ruby/2.7.0/gems/selenium-webdriver-4.9.0/lib/selenium/webdriver/chrome/driver.rb:35:in `initialize'
    from /app/vendor/bundle/ruby/2.7.0/gems/selenium-webdriver-4.9.0/lib/selenium/webdriver/common/driver.rb:47:in `new'
    from /app/vendor/bundle/ruby/2.7.0/gems/selenium-webdriver-4.9.0/lib/selenium/webdriver/common/driver.rb:47:in `for'
    from /app/vendor/bundle/ruby/2.7.0/gems/selenium-webdriver-4.9.0/lib/selenium/webdriver.rb:88:in `for'
    from /app/vendor/bundle/ruby/2.7.0/gems/capybara-3.39.2/lib/capybara/selenium/driver.rb:83:in `browser'
    from /app/vendor/bundle/ruby/2.7.0/gems/capybara-3.39.2/lib/capybara/selenium/driver.rb:104:in `visit'
    from /app/vendor/bundle/ruby/2.7.0/gems/capybara-3.39.2/lib/capybara/session.rb:280:in `visit'
    from scraper.rb:174:in `main'
    from scraper.rb:217:in `<main>'
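
The trace above shows ChromeDriver reporting that the Chrome process at /usr/bin/google-chrome exited before a session could be created. A small check run before the Capybara session is created could help separate "Chrome is missing or cannot run at all" from "Chrome runs but fails under ChromeDriver". This is only a diagnostic sketch using the Ruby standard library; chrome_diagnostics and CHROME_PATH are hypothetical names, and the path is the one quoted in the error message.

```ruby
require 'open3'

# Path taken from the ChromeDriver error message in the log.
CHROME_PATH = '/usr/bin/google-chrome'

# Returns a hash saying whether the binary is an executable file and,
# if so, what running it with --version printed and whether it exited cleanly.
def chrome_diagnostics(path = CHROME_PATH)
  unless File.file?(path) && File.executable?(path)
    return { path: path, executable: false, output: nil, exit_ok: false }
  end

  # capture2e merges stdout and stderr, so crash messages are kept too.
  output, status = Open3.capture2e(path, '--version')
  { path: path, executable: true, output: output.strip, exit_ok: status.success? }
rescue SystemCallError => e
  # Raised if the file exists but cannot actually be executed.
  { path: path, executable: true, output: e.message, exit_ok: false }
end
```

Adding a line such as warn chrome_diagnostics.inspect near the start of scraper.rb would put the result in the morph.io run log next to the existing INFO lines.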