rubycdp / ferrum

Headless Chrome Ruby API
https://ferrum.rubycdp.com
MIT License

`wait_for_network_idle` failing due to pending requests on previous page #420

Open cbliard opened 7 months ago

cbliard commented 7 months ago

I experienced random Ferrum::TimeoutError failures when calling wait_for_idle. For the same navigation, it would sometimes fail and sometimes pass.

After adding debug output for the requestWillBeSent, responseReceived, loadingFinished and loadingFailed events, it looks like some requests can be sent while navigating to another page and then never receive a responseReceived, loadingFinished, or loadingFailed event.
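For reference, debug output of this kind can be produced with a small formatter along these lines. This is only a sketch: `format_network_event` is a hypothetical helper name, and it assumes the event payload is a Hash with string keys (`requestId`, `loaderId`, `timestamp`) as sent by CDP. Note that loadingFinished and loadingFailed payloads do not include a loaderId.

```ruby
# Hypothetical helper (not part of Ferrum): formats one CDP Network event
# payload as a single debug line.
def format_network_event(name, params)
  parts = ["#{name} for requestId=#{params['requestId']}"]
  # loadingFinished and loadingFailed carry no loaderId, so print it only when present.
  parts << "loaderId=#{params['loaderId']}" if params.key?("loaderId")
  parts << "at #{params['timestamp']}"
  parts.join(" ")
end

# Assumption, not verified here: with Ferrum the helper could be attached through
# an event subscription, for example
#   page.on("Network.requestWillBeSent") { |params| puts format_network_event("requestWillBeSent", params) }

puts format_network_event(
  "requestWillBeSent",
  { "requestId" => "429934.40", "loaderId" => "8FCB3DE497DC83193F1D76075900D0D4", "timestamp" => 73687.362944 }
)
```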

For instance:

requestWillBeSent for requestId=349E42C73C6810F5DBFA790494E57AE0 loaderId=349E42C73C6810F5DBFA790494E57AE0 at 73687.356628
requestWillBeSent for requestId=429934.40 loaderId=8FCB3DE497DC83193F1D76075900D0D4 at 73687.362944
requestWillBeSent for requestId=429934.41 loaderId=8FCB3DE497DC83193F1D76075900D0D4 at 73687.452702
responseReceived for requestId=349E42C73C6810F5DBFA790494E57AE0 loaderId=349E42C73C6810F5DBFA790494E57AE0 at 73687.54549

The two requests 429934.40 and 429934.41 are in the @traffic array, but no responseReceived, loadingFinished, or loadingFailed event is ever received for them, so they are still considered pending. They have a different loaderId because they were issued by the previous page, which was making asynchronous XHR requests for some reason.

When wait_for_idle is called, it waits for all pending requests to finish, even those from pages that are no longer displayed, so it fails with Ferrum::TimeoutError.

I modified the #pending_connections method slightly so that it only considers pending connections for the current page. It looks something like this:

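    def pending_connections
      # Frame of the first recorded exchange, taken here to be the main frame.
      frame_id = @traffic.first&.request&.frame_id
      # The loader of the most recent navigation request identifies the current page
      # (nil when no navigation request has been recorded yet).
      current_page_loader_id = @traffic.select { |conn| conn.navigation_request?(frame_id) }.last&.request&.loader_id
      # Ignore exchanges started by previous pages.
      current_page_traffic = @traffic.filter { |exchange| exchange.request.loader_id == current_page_loader_id }
      current_page_traffic.count(&:pending?)
    end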

This fixes my issue.
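To illustrate why filtering by loader helps, the trace above can be replayed against a toy model. This is a simplified sketch, not Ferrum's implementation: `Exchange`, `ToyTraffic` and their methods are hypothetical names, and the navigation request is assumed to complete normally.

```ruby
# Toy stand-in for a network exchange; not Ferrum's actual class.
Exchange = Struct.new(:request_id, :loader_id, :finished) do
  def pending?
    !finished
  end
end

# Minimal model of traffic tracking, for illustration only.
class ToyTraffic
  def initialize
    @exchanges = {}
    @current_loader_id = nil
  end

  # Records a new request; a navigation request switches the current loader.
  def request_will_be_sent(request_id:, loader_id:, navigation: false)
    @exchanges[request_id] = Exchange.new(request_id, loader_id, false)
    @current_loader_id = loader_id if navigation
  end

  # Stands in for a loadingFinished or loadingFailed event.
  def finish(request_id)
    @exchanges.fetch(request_id).finished = true
  end

  # Counts every pending exchange, including those from previous pages.
  def pending_total
    @exchanges.values.count(&:pending?)
  end

  # Counts only pending exchanges started by the current page's loader.
  def pending_for_current_page
    @exchanges.values.count { |e| e.loader_id == @current_loader_id && e.pending? }
  end
end

traffic = ToyTraffic.new
traffic.request_will_be_sent(request_id: "349E42C73C6810F5DBFA790494E57AE0",
                             loader_id: "349E42C73C6810F5DBFA790494E57AE0",
                             navigation: true)
# Late XHR requests from the previous page: they never receive a completion event.
traffic.request_will_be_sent(request_id: "429934.40", loader_id: "8FCB3DE497DC83193F1D76075900D0D4")
traffic.request_will_be_sent(request_id: "429934.41", loader_id: "8FCB3DE497DC83193F1D76075900D0D4")
traffic.finish("349E42C73C6810F5DBFA790494E57AE0")

puts traffic.pending_total             # prints 2: the global count never reaches zero
puts traffic.pending_for_current_page  # prints 0: the current page is idle
```

In this model a global count stays at 2 indefinitely, which is the situation that would make a wait on it time out, while the per-loader count drops to 0 as soon as the new page's own requests are done.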

I am no expert in CDP. Maybe there are some other events that I missed. Is this approach correct?

Chrome version is 119.0.6045.105 on Linux amd64.

route commented 6 months ago

@cbliard do you have an example of such a page on the public internet? I just want to check the CDP logs to better understand what's going on.