rubycdp / ferrum

Headless Chrome Ruby API
https://ferrum.rubycdp.com
MIT License

Disable rescue & retry in `Ferrum::Frame::Runtime#call` on existing nodes #360

Open stevschmid opened 1 year ago

stevschmid commented 1 year ago

Cuprite/Ferrum is an amazing project, thank you for all the hard work! Our migration from Selenium has been painless so far. One spec gave us trouble, though. The spec tests a report page containing a table. The spec clicks on filters, which causes the page to re-fetch the table and replace it. After each click, we test for the new table content.

```ruby
click_on 'Filter'
expect(page).to have_selector(:td, text: 'Filtered Foo')
```

This heavy spec generally takes around 15 seconds to execute. Yet with Cuprite it fails about half the time ("could not find td with text: 'Filtered Foo'"). Increasing Capybara.default_max_wait_time to 30 seconds makes the spec more robust, but it also occasionally pushes the runtime to over a minute. Since the AJAX request that fetches the new table takes only a few milliseconds, the network does not appear to be the cause of this sporadic slowdown.
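For context, the mitigation we tried is nothing more than raising the global Capybara wait. A minimal sketch, assuming a standard Capybara/RSpec setup; the 30-second value is the one mentioned above:

```ruby
# spec/spec_helper.rb (illustrative placement; could also live in rails_helper.rb)
require "capybara/rspec"

# Raising the global wait makes the spec pass more often, but any genuine failure
# (or a pile-up of per-node retries) now takes up to 30 seconds to surface.
Capybara.default_max_wait_time = 30
```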

I tracked the cause down to the retry in Ferrum::Frame::Runtime#call https://github.com/rubycdp/ferrum/blob/ec6d9e5eee66a30b93509ef6a6bee359b84dc295/lib/ferrum/frame/runtime.rb#L103-L105
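For readers who don't want to open the link: those lines boil down to a rescue & retry around the CDP call. Below is a simplified sketch of that pattern, not Ferrum's literal source; the constant names and values are assumptions chosen so that a stale node burns roughly 0.5 s, as observed above:

```ruby
# Illustrative constants, not Ferrum's actual configuration.
MAX_INTERMITTENT_ATTEMPTS = 6
INTERMITTENT_SLEEP = 0.1 # seconds

# Sketch of the rescue & retry behaviour in Ferrum::Frame::Runtime#call.
def call_with_retry
  attempts = 0
  begin
    yield # evaluate the expression against the node via CDP
  rescue Ferrum::NodeNotFoundError
    attempts += 1
    raise if attempts >= MAX_INTERMITTENT_ATTEMPTS
    sleep(INTERMITTENT_SLEEP)
    retry # retried even though Chrome will never bring this node id back
  end
end
```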

After Capybara collects all the td nodes, it queries each of them for a potential text match. If these nodes were replaced in the meantime (i.e. after the collection but before the match query runs), the query on each of these stale, no-longer-existing tds causes Ferrum::Frame::Runtime#call to retry and eventually time out (roughly 0.5 s per node). Depending on the size of the collection, the matching can therefore take multiple seconds (an accumulation of timeouts), causing the spec to fail, or heavily increasing its runtime given a high enough Capybara.default_max_wait_time.
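To make the accumulation concrete, some back-of-the-envelope arithmetic; the node count is made up, and the ~0.5 s figure is the per-node retry window mentioned above:

```ruby
# Both numbers are assumptions, not measurements from the spec.
stale_tds       = 25   # td nodes Capybara collected just before the table was swapped
per_node_wait_s = 0.5  # retry window burned on each stale node

puts stale_tds * per_node_wait_s # => 12.5 -- seconds spent on nodes that can never match
```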

In my limited understanding, the retry upon encountering NodeNotFoundError on a known node seems unnecessary. Chrome tells us that the node with the given id is not around anymore (code 32000, "Could not find node with given id"), so a retry will never fix this.

This PR disables the rescue & retry when querying on an existing node. With this change, our troublesome spec passes consistently and with a stable runtime.
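In spirit the change is simply "don't retry when the call targets an already-known node". A minimal sketch of that behaviour, reusing the illustrative names from the sketch above rather than the literal diff:

```ruby
# Only retry intermittent errors when the call is NOT bound to a known node.
def call_with_retry(on: nil)
  attempts = 0
  begin
    yield
  rescue Ferrum::NodeNotFoundError
    raise if on # the node id is gone for good; fail fast so Capybara can re-query the page
    attempts += 1
    raise if attempts >= MAX_INTERMITTENT_ATTEMPTS
    sleep(INTERMITTENT_SLEEP)
    retry
  end
end
```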

Please let me know if I've overlooked any potential side effects or if there are additional changes needed, such as more tests.

stevschmid commented 1 year ago

A side note: with the change above, I encountered some flakiness in a few specs using an older version of Turbo. I found some reports of intermittently failing Capybara specs with Turbo. It turns out the with_retry had the unexpected benefit of masking that issue (since it introduces a small "sleep" after a node has been removed). Luckily the issue has since been resolved: upgrading Turbo from 7.1.0 to the latest version, 7.3.0, fixed the flakiness.