rubycdp / ferrum

Headless Chrome Ruby API
https://ferrum.rubycdp.com
MIT License

browser.quit doesn't close session #418

Closed: fernandomm closed this issue 10 months ago

fernandomm commented 10 months ago

I'm using Ferrum 0.14 + Ruby 3.2.2 with Browserless. Browserless not only makes it simple to run headless Chrome, it also offers some extras, like checking which sessions are active and what they are doing.

So, when I start it there are no sessions:

[1] pry(main)> JSON.parse(RestClient.get('http://chrome:3333/sessions'))
=> []

I start Ferrum and can see that the session is now showing up:

[2] pry(main)> browser = Ferrum::Browser.new(url: 'http://chrome:3333')
=> #<Ferrum::Browser:0x0000ffff7f309e70

[3] pry(main)> JSON.parse(RestClient.get('http://chrome:3333/sessions'))
=> [{"description"=>"",
  "devtoolsFrontendUrl"=>"/devtools/inspector.html?ws=0.0.0.0:3333/devtools/page/A87609DAEBF35E0FF8AF4E0C4FD64D4D",
  "id"=>"A87609DAEBF35E0FF8AF4E0C4FD64D4D",
  "title"=>"about:blank",
  "type"=>"page",
  "url"=>"about:blank",
  "webSocketDebuggerUrl"=>"ws://0.0.0.0:3333/devtools/page/A87609DAEBF35E0FF8AF4E0C4FD64D4D",
  "port"=>"33131",
  "browserId"=>"e2bb3742-d16a-4c94-931a-29336a8153cf",
  "trackingId"=>nil,
  "browserWSEndpoint"=>"ws://0.0.0.0:3333/devtools/browser/e2bb3742-d16a-4c94-931a-29336a8153cf"}]

But when I run browser.quit, it doesn't close the session, leaving it open forever:

[4] pry(main)> browser.quit
=> nil

[5] pry(main)> JSON.parse(RestClient.get('http://chrome:3333/sessions'))
=> [{"description"=>"",
  "devtoolsFrontendUrl"=>"/devtools/inspector.html?ws=0.0.0.0:3333/devtools/page/A87609DAEBF35E0FF8AF4E0C4FD64D4D",
  "id"=>"A87609DAEBF35E0FF8AF4E0C4FD64D4D",
  "title"=>"about:blank",
  "type"=>"page",
  "url"=>"about:blank",
  "webSocketDebuggerUrl"=>"ws://0.0.0.0:3333/devtools/page/A87609DAEBF35E0FF8AF4E0C4FD64D4D",
  "port"=>"33131",
  "browserId"=>"e2bb3742-d16a-4c94-931a-29336a8153cf",
  "trackingId"=>nil,
  "browserWSEndpoint"=>"ws://0.0.0.0:3333/devtools/browser/e2bb3742-d16a-4c94-931a-29336a8153cf"}]

The session is only closed after I close the IRB console and terminate the process.

[6] pry(main)> exit 0
root@c31d1e2b8b58:/myapp# curl http://chrome:3333/sessions
[]

When using Ferrum in a long running process like Sidekiq/Puma, this results in thousands of sessions open since the process is never terminated.
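
In a long-running process it can help to check the session listing from Ruby instead of from the console. The following is a minimal sketch using only the standard library; the helper names `session_ids` and `fetch_session_ids` are hypothetical, and the base URL is whatever your Browserless instance listens on (`http://chrome:3333` in the transcript above).

```ruby
require "json"
require "net/http"
require "uri"

# Extracts the "id" field of every entry in the JSON array
# returned by the /sessions endpoint shown above.
def session_ids(sessions_json)
  JSON.parse(sessions_json).map { |session| session["id"] }
end

# Fetches the session listing over HTTP and returns the ids.
# base_url is an assumption, e.g. "http://chrome:3333".
def fetch_session_ids(base_url)
  body = Net::HTTP.get(URI.join(base_url, "/sessions"))
  session_ids(body)
end
```

Calling `fetch_session_ids("http://chrome:3333").size` before and after `browser.quit` gives the same comparison as the pry session above.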

I guess that it might also explain the issues with zombie processes previously reported ( #364 ).

Is browser.close the correct way of terminating the session or am I missing something?

route commented 10 months ago

First of all, I cannot reproduce it with docker run -p 3000:3000 browserless/chrome. Second, it cannot explain zombie processes, because there is simply no process created for Chrome, since it's running in Docker. What can happen when you call browser.quit is that we simply close the websocket connection to browserless. If you want to close the tab that was opened, just close it or dispose of the whole context.

fernandomm commented 10 months ago

Thanks for the reply. I was able to dedicate more time and check why it works for you and wasn't working for me.

The issue seems to happen when using Docker, more specifically the internal network. Here is a repo to quickly reproduce the issue https://github.com/fernandomm/ferrum418#readme

Basically when I connect using the internal service name ( browserless ), it fails:

$ docker-compose exec app bash -l -c 'bundle exec rails runner /rails/bug.rb http://browserless:3000'
Number of sessions (initial): 0
Number of sessions (before browser.quit): 1
Number of sessions (after browser.quit): 1

But if I connect to the port that Docker exposes on the host, it works. In this case I'm using Docker for Mac, but I was also able to reproduce it on Linux and Docker Swarm.

$ docker-compose exec app bash -l -c 'bundle exec rails runner /rails/bug.rb http://host.docker.internal:3000'
Number of sessions (initial): 0
Number of sessions (before browser.quit): 1
Number of sessions (after browser.quit): 0
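
The script in the linked repository is not reproduced here. As a rough, hypothetical sketch, a report like the one above could be produced by a helper along these lines. The name `report_quit_effect` and its keyword arguments are assumptions made for illustration; both callables are injected so the helper can be exercised without a running browser.

```ruby
# Prints the number of sessions before starting a browser, after
# starting it, and after calling #quit on it.
#
# count_sessions: callable returning the current number of sessions
# start_browser:  callable returning an object that responds to #quit
# out:            IO-like object that receives the report
def report_quit_effect(count_sessions:, start_browser:, out: $stdout)
  out.puts "Number of sessions (initial): #{count_sessions.call}"
  browser = start_browser.call
  out.puts "Number of sessions (before browser.quit): #{count_sessions.call}"
  browser.quit
  out.puts "Number of sessions (after browser.quit): #{count_sessions.call}"
end

# Possible wiring against a real setup (assumes the ferrum gem and a
# reachable Browserless instance at `url`):
#
#   require "ferrum"
#   require "json"
#   require "net/http"
#   report_quit_effect(
#     count_sessions: -> { JSON.parse(Net::HTTP.get(URI("#{url}/sessions"))).size },
#     start_browser:  -> { Ferrum::Browser.new(url: url) }
#   )
```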

I understand that this issue is related to Docker/network and may not be related to the gem. But I'm trying to investigate it further although I have no experience with the CDP protocol.

Do you have any suggestions or helpful tips on what I should look into?

Thanks again.

route commented 10 months ago

I'm not sure if there is any difference between docker run -p 3000:3000 ghcr.io/browserless/chrome and docker run -p 3000:3000 browserless/chrome, because I personally don't use browserless, but looking at the output, there is. I don't think it's a Docker issue at all; it comes down to the implementation of browserless: whatever sits in front of Chrome can choose to close the session or not. I don't think you should simply disconnect. Dispose of the whole context or close the page with page.close before moving on; that would be the correct behavior, instead of just disconnecting.

Don't expect browser.quit to do the job, because it only kills a Chrome process that it spawned itself. My suggestion: before disconnecting, close the page ;)
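
As an illustration of the "dispose the whole context" suggestion, here is a minimal sketch. The helper name `visit_and_clean_up` is hypothetical. It expects an object that behaves like a `Ferrum::Browser` and uses `contexts.create`, `create_page`, `go_to`, `title`, `dispose` and `quit` as described in the Ferrum README. Whether this also releases the session on the Browserless side is not confirmed in this thread.

```ruby
# Opens `url` in a fresh browser context, returns the page title,
# and always disposes of the context before disconnecting.
# `browser` is assumed to behave like a Ferrum::Browser.
def visit_and_clean_up(browser, url)
  context = browser.contexts.create # isolated context for this job
  page = context.create_page
  page.go_to(url)
  page.title
ensure
  context&.dispose # closes the pages that belong to the context
  browser.quit     # then drops the connection to the remote browser
end

# Possible usage (assumes the ferrum gem and a remote browser URL):
#
#   require "ferrum"
#   browser = Ferrum::Browser.new(url: "http://browserless:3000")
#   puts visit_and_clean_up(browser, "https://example.com")
```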

fernandomm commented 10 months ago

I tried to use page.close but it didn't make any difference. It still left "about:blank" sessions open, which were only terminated after Browserless's CONNECTION_TIMEOUT was reached.

In one of my tests, I added a simple nginx proxy in front of browserless/chrome container. After that the issue went away.

Now browser.quit works as expected inside docker and closes the session immediately.

I don't know what nginx does differently but, since I have dedicated more time than expected to this issue, I will just accept and use it :)

Thanks a lot for the help. I'm leaving the nginx conf below in case someone experiences a similar error.

upstream browserless {
  zone upstream_dynamic 64k;
  server browserless:3000;
}

server {
  proxy_next_upstream error timeout http_500 http_503 http_429 non_idempotent;
  listen 80;

  location / {
    proxy_pass http://browserless;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection 'upgrade';
    proxy_set_header Host $host;
    proxy_cache_bypass $http_upgrade;
    proxy_connect_timeout 10;
    proxy_send_timeout 900;
    proxy_read_timeout 900;
    send_timeout 900;
  }
}

sloanesturz commented 7 months ago

Hello! I have almost the exact same issue. I changed my code to call .command('Browser.close') at the end of its use. This seems to really shut down the connection on Browserless's side -- instead of waiting for the long timeout.

def with_browser(&block)
  browser = Ferrum::Browser.new(url: MY_BROWSERLESS_DOCKER_URL)

  results = block.call(browser)

  browser.command('Browser.close') # this really closes the browser, more than just .quit
  browser.quit

  results
end
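
If the block raises, the helper above never reaches the cleanup calls. A possible variant, sketched here under assumptions, moves them into an `ensure` clause. The name `with_remote_browser` and the injected `connect` callable are hypothetical; the `Browser.close` command and `quit` are the same calls used above.

```ruby
# Yields a browser obtained from `connect` and always sends
# Browser.close followed by #quit, even if the block raises.
# `connect` is a callable returning a Ferrum::Browser-like object.
def with_remote_browser(connect)
  browser = connect.call
  yield browser
ensure
  if browser # nil when connect itself raised
    browser.command("Browser.close")
    browser.quit
  end
end

# Possible usage (assumes the ferrum gem is installed):
#
#   require "ferrum"
#   connect = -> { Ferrum::Browser.new(url: MY_BROWSERLESS_DOCKER_URL) }
#   with_remote_browser(connect) { |browser| browser.go_to("https://example.com") }
```

The method still returns the value of the block, because an `ensure` clause does not change the return value.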

route commented 7 months ago

@sloanesturz mind opening a PR with Browser.close command added to browser?

Nakilon commented 3 months ago

> the issues with zombie processes previously reported ( #364 ).

The Docker Hub link there has rotted. Here is a new one: https://docs.docker.com/compose/compose-file/05-services/#init