Cannot get Cuprite working on Heroku CI

matpowel commented 4 years ago

Great project, I love the idea of using this project instead of the old Selenium way.

In short, we've got everything working on local dev machines (except drag/drop tests, but that's documented and hopefully will be implemented at some point) but it fails hard on Heroku CI with "dead browser" type errors. Here is some info:

An example error I get is:

Ferrum::StatusError: Request to http://127.0.0.1:43864/ reached server, but there are still pending connections: http://127.0.0.1:43864/assets/application-.... etc, lots of assets and some Google fonts in the list

Then below that:

Ferrum::DeadBrowserError: Browser is dead or given window is closed

We've tried without the no-sandbox option, then we get a different error about the web socket URL returning nothing. We've also tried toying with various different timeout settings. Current settings are:

Capybara.register_driver(:cuprite) do |app|
  browser_options = {}.tap do |opts|
    opts['no-sandbox'] = nil if on_heroku_ci?
  end

  options = {}.tap do |opts|
    opts[:browser_path] = ENV['GOOGLE_CHROME_BIN'] if ENV['GOOGLE_CHROME_BIN'].present?
    opts[:browser_options] = browser_options # { 'no-sandbox': nil },
    opts[:window_size] = [1600, 1280]
    # Increase Chrome startup wait time (required for stable CI builds)
    opts[:process_timeout] = 10
    # Enable debugging capabilities
    opts[:inspector] = !on_heroku_ci?
    # Allow running Chrome in a headful mode by setting HEADLESS env
    # var to a falsey value
    opts[:headless] = !ENV['HEADLESS'].in?(%w[n 0 no false])
    opts[:timeout] = 30
  end

  Capybara::Cuprite::Driver.new(app, options)
end

Any ideas on what the cause could be? This is super hard to debug on CI.

route commented 4 years ago

The machine you use on CI is slower than your local machine, and that is the reason the tests fail. Try https://github.com/rubycdp/cuprite#url-blacklisting--whitelisting to block all external network requests, because they make your tests slower. Also try compiling assets in advance, with debugging turned off to make them smaller, and bundle everything into one file instead of n. These are rules of thumb for all feature tests, and you should see your tests passing.
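As a rough sketch of the first suggestion, the driver options could be assembled by a small helper that restricts requests to the local test server. The `url_whitelist` key and its values are the ones that appear later in this thread; the helper name `cuprite_options` and the check on a `CI` environment variable are assumptions made for illustration, and newer Cuprite releases may name the option differently.

```ruby
# Hypothetical helper (not part of Cuprite): builds the options hash that
# would be passed to Capybara::Cuprite::Driver.new(app, options).
def cuprite_options(env = ENV)
  options = {
    window_size: [1600, 1280],
    process_timeout: 10,
    timeout: 30
  }
  # Assumption: on CI we only want requests to the local Capybara server,
  # so fonts and icons hosted on external CDNs are never fetched.
  options[:url_whitelist] = ['localhost', '127.0.0.1'] if env['CI']
  options
end
```

Passing a plain hash instead of `ENV` (for example `cuprite_options('CI' => 'true')`) makes it easy to check both branches without touching the real environment.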

matpowel commented 4 years ago

Hi, thanks for the response.

As you can see from my settings, I have a timeout of 30 seconds, and I've also tried 120 seconds. Our entire test suite (hundreds of tests) runs in about 250 seconds on the same Heroku instance with Selenium/Chromedriver, so I really don't think the timeout could possibly be an issue. Is it the process_timeout or the timeout that affects how long it waits to load related links on a given page? Why is this an issue for Cuprite/Ferrum and not Selenium/Chromedriver?

Regarding pre-compiling, I think pre-compiling all your assets in tests is a bad habit in Rails tests. Sometimes you only want to run one test, or only unit tests etc.

Regarding whitelisting, the point of system tests / features is to test the "real world end user" behavior of the app, excluding some CSS and JS seems a little unhelpful in achieving this goal, especially when it causes zero problems outside Cuprite/Ferrum.

Are you sure enough that there is nothing else going on here that you've closed the issue? I can't imagine any Ruby on Rails project would ever work with Cuprite on default settings, wouldn't this be concerning for adoption of this project?

Matt

route commented 4 years ago

I highlighted the points for you to check. If you see a lot of pending connections to assets as well as external network requests don't you think that this is the reason why your test cannot move further? Do you want to wait forever until google responds or go ahead and start clicking around without styles and js properly loaded? I don't think so.

As for other projects, I myself use it on a project with a huge number of tests, cuprite itself has tests, and many people around here use it just fine. This is a project-dependent problem; don't blame the library for all the issues. I'm sorry, but if you can't help yourself with the suggestions above (at least by trying them), I can't either.

matpowel commented 4 years ago

I've tried all of the suggestions except asset pre-compilation and nothing helps. I've set both process_timeout and timeout to 200 seconds and it doesn't help, same error. There is no timing issue here. With the timeouts set to 200 it actually took close to 7 or 8 minutes to fail, I'm guessing maybe 200 + 200 seconds.

The fact that it works for so many people makes me think it's a setup issue, most likely something to do with trying to contact the wrong host/port or something but I can't see anything obvious.

Given so many people have this working, do you know if any of them are using Heroku CI? Or other cloud CI services?

route commented 4 years ago

process_timeout is how long we wait for the Chrome process to start before raising an error. timeout is what you need in general, but anything that takes longer than 10-30 seconds is fishy.

Could you run the test with the CUPRITE_DEBUG=true env variable and attach the log? On Circle CI we use something like this to store the artifact with debug output for a failing test.

CUPRITE_LOGGER = StringIO.new
Capybara.server = :webrick
Capybara.javascript_driver = :cuprite
Capybara.register_driver :cuprite do |app|
  options = {
    window_size: [1200, 800],
    process_timeout: 5,
    timeout: 10
  }
  options.merge!(logger: CUPRITE_LOGGER) if ENV['CI']

  driver = Capybara::Cuprite::Driver.new(app, options)
  process = driver.browser.process
  puts ""
  puts "Browser: #{process.browser_version}"
  puts "Protocol: #{process.protocol_version}"
  puts "V8: #{process.v8_version}"
  puts "Webkit: #{process.webkit_version}"
  driver
end

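RSpec.configure do |config|
  # Start every example with an empty log so the output belongs to one test
  config.before(:each) do
    if ENV['CI']
      CUPRITE_LOGGER.truncate(0)
      CUPRITE_LOGGER.rewind
    end
  end
end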

Please remove opts[:inspector] = !on_heroku_ci?; this should be off on CI. Check whether your code has .debug or .pause calls, as these will cause Chrome to wait indefinitely until we connect. Anyway, I need to see the log from your CI for at least one failing test. Maybe comment out all of them and leave only one.
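The setup above collects the debug output in a StringIO but does not show how it ends up as a CI artifact. One possible way, sketched here under assumptions, is a small helper that writes the buffer to a file when a test fails. The helper name `dump_cuprite_log` and the directory layout are hypothetical and not part of Cuprite.

```ruby
require 'fileutils'
require 'stringio'

# Hypothetical helper: writes the captured debug output to a file so the CI
# service can store it as an artifact. Returns the file path, or nil when
# nothing was logged.
def dump_cuprite_log(logger, dir, name)
  content = logger.string
  return nil if content.empty?

  FileUtils.mkdir_p(dir)
  # Replace characters that are awkward in file names with underscores.
  path = File.join(dir, "#{name.gsub(/[^\w.-]+/, '_')}.log")
  File.write(path, content)
  path
end
```

In RSpec this might be called from an `after(:each)` hook when `example.exception` is set, for instance with `example.full_description` as the name and a directory that the CI service is configured to keep.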

matpowel commented 4 years ago

Ok I spent a long time on this and I think we got it mostly worked out.

The primary culprit seems to be the attempt to load anything over https, in our case material design icons and some fonts from Google CDN.
First we tried adding ignore-certificate-errors but that didn't seem to help.
Next we added whitelisting and it seemed to help but certain tests failed, rather confusingly one of them was a test that adds a cookie to set user timezone info from the browser and checks that the server renders the correct result. There should be nothing it does that needs to call out beyond localhost but switching on the whitelist to localhost and 127.0.0.1 only broke that test. That one will have to be filed in the X-files unless it ever comes up again.
Finally we went all out and switched our app around to serve them up through the asset pipeline and that has fixed the tests, hoorah.
We did have to comment out a couple of tests that were testing drag/drop using .drag_to.
See our final config below. We set the proc timeout to 10 and timeout to 60 for Heroku CI and it seems to work comfortably.
Note that in regards to your comment on inspector, I think you're misreading it. "on_heroku_ci?" returns true when on Heroku CI, so inspector will be false whenever we run in CI. Anyway I commented it out just to be safe.

Thanks for your help, adding CUPRITE_DEBUG=true allowed us to track down the https which we then realized was probably the call that was getting stuck permanently.

Capybara.register_driver(:cuprite) do |app|
  browser_options = {}.tap do |opts|
    opts['no-sandbox'] = nil if on_heroku_ci?
    opts['ignore-certificate-errors'] = nil
  end

  options = {}.tap do |opts|
    opts[:browser_path] = ENV['GOOGLE_CHROME_BIN'] if ENV['GOOGLE_CHROME_BIN'].present?
    opts[:browser_options] = browser_options # { 'no-sandbox': nil },
    opts[:window_size] = [1600, 1280]
    # Increase Chrome startup wait time (required for stable CI builds)
    opts[:process_timeout] = ENV.fetch('CUPRITE_PROCESS_TIMEOUT', 5).to_f
    # Enable debugging capabilities
    #opts[:inspector] = !on_heroku_ci?
    # Allow running Chrome in a headful mode by setting HEADLESS env
    # var to a falsey value
    opts[:headless] = !ENV['HEADLESS'].in?(%w[n 0 no false])
    opts[:timeout] = ENV.fetch('CUPRITE_TIMEOUT', 30).to_f
    #opts[:url_whitelist] = ['localhost', '127.0.0.1']
  end

  Capybara::Cuprite::Driver.new(app, options)
end

Matt

route commented 4 years ago

Wooohoo I'm glad you solved it!

1) Regarding drag & drop, it's on my list to implement; it should be relatively easy, I just need to find the time.
2) A timeout of 60s still seems too high in my opinion. If it helps to get rid of intermittent failures, then I think there's one more underlying issue. I can't imagine waiting for a response locally on the dev machine for more than 30s, and even 30 is sometimes too much.
3) Yes, you are right, I misread on_heroku_ci?

matpowel commented 4 years ago

I'll see if we can allocate some time to helping, but it looks like it's still TBD over at Ferrum? I tried the method the guy suggested on issue #67 but it doesn't work. Does CDP support drag natively or would we need to simulate the mouse controls?
Dropped to 30s and it still works, not sure why it didn't previously. Having said that, this is just a timeout, right? Why does it matter if it's too high? It shouldn't ever hit the timeout unless loading assets genuinely takes that long, right? Maybe I'm misunderstanding.

route commented 4 years ago

Not that I'm aware of, we can simulate it. There was also a bug https://bugs.chromium.org/p/chromium/issues/detail?id=850071 not sure if it's solved.
It is a timeout that's used not only for visit but for all commands you send to Chrome. We start the browser and send a command; Chrome can crash or something else may happen, so there must be a limit on how long we wait for the response. If it's set to 30 and everything is fine you won't notice it, but if something's wrong your build is super slow. Thus I recommend setting it to between 10 and 15 seconds and, in case of any issues, investigating those issues rather than increasing the timeout.
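To make that advice visible in a build, the timeout lookup could warn whenever the configured value exceeds the suggested range. This is only a sketch: the `CUPRITE_TIMEOUT` variable name comes from the config earlier in the thread, while the helper `cuprite_timeout`, the constant, and the warning text are assumptions.

```ruby
# Hypothetical helper: reads the command timeout from the environment and
# flags values above the range suggested in this thread.
RECOMMENDED_MAX_TIMEOUT = 15.0 # seconds; upper end of the "10 to 15" advice

def cuprite_timeout(env = ENV, warn_io: $stderr)
  timeout = Float(env.fetch('CUPRITE_TIMEOUT', RECOMMENDED_MAX_TIMEOUT))
  if timeout > RECOMMENDED_MAX_TIMEOUT
    warn_io.puts "CUPRITE_TIMEOUT=#{timeout}s is above #{RECOMMENDED_MAX_TIMEOUT}s; " \
                 'a failing command will stall the build for that long.'
  end
  timeout
end
```

The returned value would then be assigned to `opts[:timeout]` in the driver registration; the warning does not change behavior, it only surfaces a setting that may be hiding an underlying problem.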

rubycdp / cuprite

Cannot get Cuprite working on Heroku CI #126