Closed: zw963 closed this issue 2 years ago
Oops, I saw some logs; my process is broken because of some Capybara::Cuprite::ObsoleteNode
or Ferrum::TimeoutError
exceptions, and it keeps retrying. Anyway, I'd like to hear your advice. Thank you.
It seems like the main reason headless Chrome doesn't work correctly is that it fails to establish its WebSocket connection:
ruby/3.0.0/gems/ferrum-0.11/lib/ferrum/browser/process.rb:149:in `parse_ws_url': Browser did not produce websocket url within 10 seconds, try to increase `:process_timeout`. See https://github.com/rubycdp/ferrum#customization (Ferrum::ProcessTimeoutError)
I tried adding :process_timeout, but no luck; it just keeps waiting because some elements that depend on the WebSocket connection never appear.
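(For anyone else hitting this: a minimal sketch of how :process_timeout is typically passed when registering the Cuprite driver with Capybara. The values here are arbitrary examples, not recommendations.)

```ruby
require "capybara/cuprite"

Capybara.register_driver(:cuprite) do |app|
  Capybara::Cuprite::Driver.new(
    app,
    process_timeout: 30, # seconds to wait for Chrome to report its DevTools WebSocket URL
    timeout: 15          # seconds to wait for responses over that WebSocket
  )
end
Capybara.default_driver = :cuprite
```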
Do you use .reset
sometimes in your scripts? If not, it's a good idea to start.
In general, the ideal solution is not to start one Chrome and create hundreds of thousands of pages in it, but instead to use a short-lived session to Chrome and then kill it, all of course running in containers with limits on memory and CPU. If that is not an option, you should definitely call .reset
after visiting a page, or at least kill the whole context as described here: https://github.com/rubycdp/ferrum#thread-safety
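A minimal sketch of the per-page .reset pattern, assuming a Capybara session backed by a registered :cuprite driver; visit_each_with_reset is a hypothetical helper name, not part of any library:

```ruby
# Visit each URL, yield the session for scraping, then reset it so cookies,
# history, and open pages don't accumulate inside one long-lived Chrome.
def visit_each_with_reset(session, urls)
  urls.each do |url|
    session.visit(url)
    yield session
    session.reset! # Capybara's public reset API; cheaper than restarting Chrome
  end
end

# In a real script the session would be e.g. Capybara::Session.new(:cuprite).
```

Note that reset! also drops the login cookie, which is why the cookie round-trip discussed below matters for scripts that must stay logged in.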
Do you use
.reset
sometimes in your scripts? If not, it's a good idea to start.
No, I don't know what this .reset means. In fact, I run instance.visit(some_url)
several times when scraping, you know what I mean: opening another URL in the same session (in fact, just going back to the main page after logging in).
So, your advice is that, if I'm not using multi-threaded mode, I can run instance.reset
at any time without losing my session?
It should be fine to lose the session. The session is only a cookie stored somewhere: run a fresh Chrome instance, set the cookie, visit, and voilà, you're logged in.
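A hedged sketch of that cookie round-trip. It assumes Cuprite's driver exposes cookies (a name-to-cookie hash) and set_cookie, per the Cuprite README; the helper names and file path are made up for illustration:

```ruby
require "json"

COOKIE_FILE = "session_cookies.json" # assumed path, pick your own

# Persist the current cookies to disk so a fresh Chrome instance can
# resume the session without logging in again.
def save_cookies(session, path = COOKIE_FILE)
  cookies = session.driver.cookies.map do |_name, cookie|
    { name: cookie.name, value: cookie.value, domain: cookie.domain }
  end
  File.write(path, JSON.dump(cookies))
end

# Re-inject the saved cookies into a brand-new session before visiting.
def restore_cookies(session, path = COOKIE_FILE)
  JSON.parse(File.read(path)).each do |c|
    session.driver.set_cookie(c["name"], c["value"], domain: c["domain"])
  end
end
```

Call save_cookies once after a successful login, and restore_cookies right after creating each fresh session.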
Thank you.
Hi, the following is headless Chrome running on my 2 GB memory VPS. Basically, it runs for 2 minutes and then idles (just sleeps) for 4 minutes. But after running for several hours, a single headless Chrome process consumes 600 MB+ of memory, which almost breaks my VPS.
I use Capybara + Cuprite for this scraping script, and I'd like to hear some ideas for avoiding such high memory use. (BTW: this script needs to log in, so logging in too frequently is not a good solution in this case.)
Thank you.
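The short-lived-browser pattern suggested above can be sketched as a small wrapper that always kills Chrome after each batch of work, so memory cannot accumulate across cycles. The helper name is made up, and the commented usage assumes a registered :cuprite driver plus hypothetical restore_login_cookie/scrape helpers:

```ruby
# Create a session from the given factory, do one batch of work, and always
# quit so the Chrome process (and its memory) is reclaimed, even on errors.
def with_fresh_session(factory)
  session = factory.call
  yield session
ensure
  session.quit if session # Capybara::Session#quit shuts down the browser
end

# Usage in the 2-minutes-on / 4-minutes-idle cycle described above:
#
#   loop do
#     with_fresh_session(-> { Capybara::Session.new(:cuprite) }) do |s|
#       restore_login_cookie(s) # hypothetical: set the saved login cookie
#       scrape(s)               # hypothetical: the ~2 minutes of work
#     end
#     sleep 4 * 60
#   end
```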