rubycdp / ferrum

Headless Chrome Ruby API
https://ferrum.rubycdp.com
MIT License

wait_for_idle causes Ferrum::TimeoutError when visiting the same URL again in a Ruby retry clause #226

Closed: zw963 closed this issue 2 years ago

zw963 commented 2 years ago

I have code like this:

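require 'timeout'

# Runs the proc under a timeout and retries (forever) when the timeout fires.
# Assumes a `logger` is in scope (e.g. Rails.logger).
def retry_timeout(seconds, waiting_for_if:, description: nil)
  raise ArgumentError, 'waiting_for_if must be a Proc object' unless waiting_for_if.is_a? Proc

  tries = 0
  begin
    tries += 1
    Timeout.timeout(seconds) do
      # Keep calling the proc while it returns a truthy value.
      while waiting_for_if.call
        sleep 1
      end
    end
  rescue Timeout::Error # the old top-level TimeoutError alias is deprecated
    logger.info description if description
    logger.info "Timeout after waiting #{seconds} seconds, retried #{tries} times."
    retry
  end
end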

waiting_for_if_proc = proc do
  browser.goto 'some_url'
  browser.network.wait_for_idle
  # ... do a lot of other work ...
  sleep 100 # simulate work that runs past the timeout
end

Then I call it like this:

begin
  retry_timeout 55, waiting_for_if: waiting_for_if_proc
ensure
  browser.quit
end

The first scrape works, but when a timeout happens and the retry calls goto 'some_url' again, wait_for_idle keeps blocking on the network requests until Ferrum::TimeoutError is raised with an error like this:

gems/ferrum-0.11/lib/ferrum/network.rb:30:in `wait_for_idle': Timed out waiting for response. It's possible that this happened because something took a very long time (for example a page load was slow). If so, setting the :timeout option to a higher value might help. (Ferrum::TimeoutError)

After removing every browser.network.wait_for_idle call, everything works well.

Thank you.
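One detail of the helper above worth keeping in mind: `Timeout.timeout` raises `Timeout::Error` inside the running block, so whatever the proc was doing at that moment (for example a `goto` or `wait_for_idle` still in progress) is abandoned rather than finished. A minimal pure-Ruby sketch of that standard-library behavior, with `sleep` standing in for a slow page load; `interrupted_step` is a hypothetical name and no Ferrum calls are involved:

```ruby
require 'timeout'

# Hypothetical helper: records how far a block gets before Timeout.timeout
# interrupts it. `sleep 0.5` stands in for a slow page load.
def interrupted_step(seconds)
  log = []
  begin
    Timeout.timeout(seconds) do
      log << :started
      sleep 0.5
      log << :finished # skipped when the timeout fires first
    end
  rescue Timeout::Error
    log << :timed_out
  end
  log
end

p interrupted_step(0.1) # => [:started, :timed_out]
p interrupted_step(2)   # => [:started, :finished]
```

The sketch only shows that the interrupted step never reaches its end; whether that leaves Ferrum's network tracking in a state that affects the next `wait_for_idle` call is not established here.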

zw963 commented 2 years ago

BTW, another question: I saw methods such as browser.reset and browser.network.clear(:traffic). Should either of them be invoked to release memory before calling goto 'some_url' again in the retry clause?
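One way to act on that question is to make the retry loop bounded and give it a hook that runs between attempts, which is where a cleanup call such as `browser.network.clear(:traffic)` or `browser.reset` could go. A minimal sketch, assuming a hypothetical `retry_with_limit` helper (not part of Ferrum); the usage exercises it with plain Ruby blocks instead of a browser:

```ruby
require 'timeout'

# Hypothetical variant of retry_timeout: gives up after max_tries and runs an
# optional before_retry hook between attempts (e.g. to reset browser state).
def retry_with_limit(seconds, max_tries:, before_retry: nil)
  tries = 0
  begin
    tries += 1
    Timeout.timeout(seconds) { yield tries }
  rescue Timeout::Error
    raise if tries >= max_tries # out of attempts: re-raise the timeout
    before_retry&.call(tries)
    retry
  end
end

# Usage: the first two attempts time out, the third succeeds.
resets = 0
result = retry_with_limit(0.2, max_tries: 3, before_retry: ->(_n) { resets += 1 }) do |attempt|
  sleep 1 if attempt < 3
  :done
end
p result # => :done
p resets # => 2
```

With Ferrum the hook might be `->(_) { browser.network.clear(:traffic) }`; whether that avoids the `wait_for_idle` timeout is the open question in this thread, not something the sketch demonstrates.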

Mifrill commented 2 years ago

@zw963 you can try tuning the options of the wait_for_idle method:

wait_for_idle(connections: 0, duration: 0.05, timeout: @page.browser.timeout)
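As a rough mental model of what those three options control (an assumption for illustration, not taken from Ferrum's source): the wait repeatedly checks how many requests have started but not finished, returns once that count is at most `connections`, sleeps `duration` seconds between checks, and raises after `timeout` seconds. The pure-Ruby sketch below mimics that loop with a hypothetical `pending` lambda standing in for the browser's traffic:

```ruby
# Hypothetical model of an idle wait; NOT Ferrum's implementation.
class IdleTimeout < StandardError; end

def wait_until_idle(pending:, connections: 0, duration: 0.05, timeout: 1.0)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    count = pending.call # requests started but not yet finished
    return true if count <= connections # "idle" tolerates up to `connections`
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise IdleTimeout, "still #{count} pending after #{timeout}s"
    end
    sleep duration # poll interval between checks
  end
end

# Requests that finish: the count drops to zero after a few polls.
in_flight = 3
p wait_until_idle(pending: -> { in_flight -= 1 if in_flight.positive?; in_flight }) # => true

# One request that never finishes times out with connections: 0 ...
begin
  wait_until_idle(pending: -> { 1 }, timeout: 0.2)
rescue IdleTimeout => e
  puts e.message # => still 1 pending after 0.2s
end

# ... but counts as idle once connections: 1 tolerates it.
p wait_until_idle(pending: -> { 1 }, connections: 1) # => true
```

Under this model, a request that never completes keeps the wait from reaching idle with `connections: 0`, so loosening `connections` or raising `timeout` are the knobs worth trying.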

Regarding your question ("browser.reset, browser.network.clear(:traffic), if any of those methods should be invoked to release memory"):

Sort of, they could help; however, I don't think your case is related to a lack of memory.

browser.reset closes the browser's tabs, so if you have many open tabs (possibly opened automatically by a script on the page), they could be the reason for the many open connections.

browser.network.clear clears the browser's cache or the collected traffic, depending on the argument, but in your case the error happens right on the next visit to the same URL, so it shouldn't be caused by memory pressure from accumulated traffic.

Anyway, we'd need to try it against the website to take a closer look and analyze the cause. It would be great if you could provide more details and the source as well; let's convert this into a discussion and continue there.