rubycdp / ferrum

Headless Chrome Ruby API
https://ferrum.rubycdp.com
MIT License

Frequently getting "Could not find node with given id (Ferrum::NodeNotFoundError)" from the .css method #202

Closed: zw963 closed this 2 years ago

zw963 commented 2 years ago

I use a one-liner like this:

ipo_question = page.css('h3.question').find {|x| x.text.match?(/When did .* IPO/i) }

It frequently (but not always) raises the following error:

/home/deployer/apps/marketbet_crawler_production/shared/bundle/ruby/3.0.0/gems/ferrum-0.11/lib/ferrum/browser/client.rb:87:in `raise_browser_error': Could not find node with given id (Ferrum::NodeNotFoundError)
        from /home/deployer/apps/marketbet_crawler_production/shared/bundle/ruby/3.0.0/gems/ferrum-0.11/lib/ferrum/browser/client.rb:48:in `command'
        from /home/deployer/apps/marketbet_crawler_production/shared/bundle/ruby/3.0.0/gems/ferrum-0.11/lib/ferrum/page.rb:160:in `command'
        from /home/deployer/apps/marketbet_crawler_production/shared/bundle/ruby/3.0.0/gems/ferrum-0.11/lib/ferrum/frame/runtime.rb:179:in `handle_response'
        from /home/deployer/apps/marketbet_crawler_production/shared/bundle/ruby/3.0.0/gems/ferrum-0.11/lib/ferrum/frame/runtime.rb:184:in `block in handle_response'
        from /home/deployer/apps/marketbet_crawler_production/shared/bundle/ruby/3.0.0/gems/ferrum-0.11/lib/ferrum/frame/runtime.rb:220:in `block in reduce_props'
        from /home/deployer/apps/marketbet_crawler_production/shared/bundle/ruby/3.0.0/gems/ferrum-0.11/lib/ferrum/frame/runtime.rb:218:in `each'
        from /home/deployer/apps/marketbet_crawler_production/shared/bundle/ruby/3.0.0/gems/ferrum-0.11/lib/ferrum/frame/runtime.rb:218:in `reduce'
        from /home/deployer/apps/marketbet_crawler_production/shared/bundle/ruby/3.0.0/gems/ferrum-0.11/lib/ferrum/frame/runtime.rb:218:in `reduce_props'
        from /home/deployer/apps/marketbet_crawler_production/shared/bundle/ruby/3.0.0/gems/ferrum-0.11/lib/ferrum/frame/runtime.rb:182:in `handle_response'
        from /home/deployer/apps/marketbet_crawler_production/shared/bundle/ruby/3.0.0/gems/ferrum-0.11/lib/ferrum/frame/runtime.rb:144:in `block in call'
        from /home/deployer/apps/marketbet_crawler_production/shared/bundle/ruby/3.0.0/gems/ferrum-0.11/lib/ferrum.rb:145:in `with_attempts'
        from /home/deployer/apps/marketbet_crawler_production/shared/bundle/ruby/3.0.0/gems/ferrum-0.11/lib/ferrum/frame/runtime.rb:124:in `call'
        from /home/deployer/apps/marketbet_crawler_production/shared/bundle/ruby/3.0.0/gems/ferrum-0.11/lib/ferrum/frame/runtime.rb:80:in `evaluate_func'
        from /home/deployer/apps/marketbet_crawler_production/shared/bundle/ruby/3.0.0/gems/ferrum-0.11/lib/ferrum/frame/dom.rb:76:in `css'
        from /home/deployer/.rvm/rubies/ruby-3.0.2/lib/ruby/3.0.0/forwardable.rb:238:in `css'

I never use an id in the #css selector anyway. When I try to reproduce the issue with some scratch code, e.g. page.css('aaaaaaaaaaa'), it never raises this error; it just returns an empty array.

[5] pry(#<IpoParser>)> page.css('aaaaaaaaaaaaaaaaaaa')
[]

So it is a little weird, and hard to investigate this issue further.

Here is my code sample:


require "ferrum"
require "logger"

# MyLogger is an application-defined logger wrapper (not shown)
instance = Ferrum::Browser.new(
  logger: MyLogger.new(Logger.new('chrome_headless.log', 10, 1024000)),
  headless: true,
  pending_connection_errors: false,
  window_size: [1024, 768],
  browser_options: { 'no-sandbox': nil, 'blink-settings' => 'imagesEnabled=false' }
)

crawler = Thread.new(instance) do |browser|
  context = browser.contexts.create
  page = context.create_page
  page.goto(url) # url is set elsewhere
  page.network.wait_for_idle(timeout: 30)
  page.css('h3.question').find { |x| x.text.match?(/When did .* IPO/i) }
end

ipo_question = crawler.value # wait for the thread and take the block's result

Mifrill commented 2 years ago

@zw963 try experimenting with the environment variable FERRUM_INTERMITTENT_SLEEP: https://github.com/rubycdp/ferrum/blob/006c5fc385596c218402de8b0d24dbe6739f1c72/lib/ferrum/frame/runtime.rb#L17
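
Judging by the linked line, Ferrum reads this variable into a constant when its runtime module is loaded, so it presumably has to be set before `require "ferrum"` (or exported in the shell that starts the process). A minimal sketch under that assumption; the 0.1 fallback mirrors the linked source and may differ in other Ferrum versions:

```ruby
# Set the retry delay before Ferrum is loaded; ||= lets a value already
# exported by the shell take precedence over this default.
ENV["FERRUM_INTERMITTENT_SLEEP"] ||= "0.5" # seconds between intermittent retries

# require "ferrum"  # in a real script, load Ferrum only after this point

# Roughly how the linked runtime.rb turns the variable into a number
# (assumed from that source; check the version you actually run):
intermittent_sleep = ENV.fetch("FERRUM_INTERMITTENT_SLEEP", 0.1).to_f
puts "intermittent retry sleep: #{intermittent_sleep}s"
```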

ttilberg commented 2 years ago

Could not find node with given id (Ferrum::NodeNotFoundError)

I never use id in #css method anyway

Note that this is not referring to a CSS #id, but to an ID for the object in CDP (the Chrome DevTools Protocol). I'll probably misspeak some terminology here, but basically each reference to something in CDP has an ID that Ferrum passes back and forth. Sometimes the object in the browser changes (for example, on a DOM change) and that ID is no longer valid. You have page.network.wait_for_idle, which should help with that, but there might be more to it.

Check out the comments in this file and this area here
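
The explanation above also suggests a workaround in user code: since node IDs obtained from `page.css` can go stale, re-running the whole lookup fetches fresh ones. A minimal sketch; `with_retries` is a hypothetical helper, not part of Ferrum, and the attempt count and delay are arbitrary choices:

```ruby
# Hypothetical helper (not part of Ferrum): re-run the block when it raises
# one of the listed transient errors, sleeping between attempts.
def with_retries(errors:, attempts: 3, wait: 0.5)
  tries = 0
  begin
    tries += 1
    yield
  rescue *errors
    raise if tries >= attempts # give up and surface the last error
    sleep(wait)
    retry                      # re-runs the begin block from the top
  end
end

# In the crawler this would wrap the whole lookup, so a stale node ID
# triggers a fresh page.css query instead of a crash:
#
#   ipo_question = with_retries(errors: [Ferrum::NodeNotFoundError]) do
#     page.css("h3.question").find { |x| x.text.match?(/When did .* IPO/i) }
#   end

# Self-contained demo with a stand-in error that fails twice, then succeeds.
class FlakyError < StandardError; end

calls = 0
result = with_retries(errors: [FlakyError], attempts: 3, wait: 0) do
  calls += 1
  raise FlakyError, "stale node" if calls < 3
  "found"
end
puts result # prints "found"
```

Wrapping the whole block rather than only `x.text` is the point of the design: a node reference that has already gone stale will likely keep failing, whereas a new `page.css` call queries the current DOM.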

zw963 commented 2 years ago

I set FERRUM_INTERMITTENT_SLEEP to 0.5 in production; I'll test it for a while to see if it works.

zw963 commented 2 years ago

It seems to work, thank you.