twalpole / apparition

Capybara driver for Chrome using CDP
MIT License
363 stars 46 forks source link

Page#wait_for_loaded not reliable? #51

Open PandaWhisperer opened 4 years ago

PandaWhisperer commented 4 years ago

Background

I use Apparition to automatically detect the presence of certain JavaScript frameworks on different websites. Basically, I load the site (using #visit), and then I run a script (using #evaluate_script) that tries to obtain the version number (for instance, jQuery.fn.jquery would return the jQuery version on a page that has jQuery installed).

Issue

I've noticed that this method is not always reliable. Specifically, I can sometimes re-run the same script (on a site where jQuery is present) up to three times and get different results each time (either a JavascriptError, nil, or the expected result).

Discussion

I've looked around the source code and noticed that Apparition tries to detect page load by periodically attempting to run JavaScript (in Page#wait_for_loaded), and when it succeed, it assumes the page is loaded.

Before using Apparition, I've used the ChromeRemote gem to interface with Chrome, but for many reasons, Capybara with Apparition seems preferable. However, using the other gem, I was able to detect page loads more reliably by listening for the Page.loadEventFired event.

I'm curious as to why Apparition does not use this approach, especially since a comment in Page#wait_for_loaded seems to indicate that it is known that the current method is not reliable?

twalpole commented 4 years ago

Because the Page.loadEventFired event hasn't been reliable either. PRs with a fully reliable method (that doesn't also hang if something in the page can't be loaded) would be great. One of the other things to consider is what exactly does page "loaded" mean??

PandaWhisperer commented 4 years ago

Ah, I see. I'm assuming that Page.domContentEventFired' corresponds to JavaScript'sDOMContentLoadedevent, andPage.loadEventFiredcorresponds to JavaScript'swindow.onload` event, so probably the same distinction applies here, i.e. the former is fired as soon as the DOM has finished parsing, and the latter fires when all content has finished loading.

I've tried replacing Page.domContentEventFired on page.rb:501 with Page.loadEventFired and changed Page#wait_for_loaded to simply block until the main frame has been marked as loaded (or the timeout is hit), and I was able to run the integration tests successfully, but when I turned on debugging, I noticed that in most cases, that handler isn't even hit, it's Page.frameStoppedLoading on page:482 that marks the frame as being ready. So from my limited understanding of the code, this change does not seem to make a huge difference.

For right now, my solution is using evaluate_async_script to set a up a window.onload handler that resolves the promise with the desired value, for instance:

evaluate_async_script(<<-JS)
    window.addEventListener('load', () => {
        arguments[0]('jQuery' in window ? jQuery.fn.jquery : null);
    });
JS

However, this does not appear to be 100% reliable in all situations either.

andrewr224 commented 4 years ago

Similar situation here. Mostly consistent is the best I can get, here's my solution:

def wait_for_page_load
  Timeout.timeout(Capybara.default_max_wait_time) do
    loop until page.evaluate_script("window.jQuery != undefined")
  end
end
twalpole commented 4 years ago

@andrewr224 using Timeout.timeout (the most dangerous method in Ruby) is asking for issues - Because of the way it works (second thread aborting the main thread) it can leave the network stack in an unpredictable state causing all further tests to fail too. This may be acceptable in your use case, but should definitely not be recommended as a general solution.