rubycdp / ferrum

Headless Chrome Ruby API
https://ferrum.rubycdp.com
MIT License

Allow using default browser context #471

Open ibrahima opened 4 months ago

ibrahima commented 4 months ago

This PR adds an option for Ferrum to use the default browser context instead of a created BrowserContext that doesn't have access to the persisted browser state.

I know that, historically, the recommendation here has been to use clean browser contexts for reproducible tests, but I think this is a reasonable option for some use cases (which may not be related to testing). Other similar libraries provide such options. For example, Playwright provides launchPersistentContext to launch a browser with a persistent context. Puppeteer seems to use the default browser context by default and only creates new BrowserContexts if you create them manually. I don't have experience with these libraries, but I did a lot of digging through their source to see how they handle Target creation and BrowserContext creation.

Fixes https://github.com/rubycdp/ferrum/issues/47

Re: this comment from that thread:

> I think the default context that Chrome creates has a lot of limitations like inability to create pages inside it and so on.

I seem to be able to create pages/targets within the default context just fine. However, I haven't explored much, so there could very well be bugs associated with this change. All I know is that it works for my use case, and it would be very useful to have this in the upstream gem. I'm happy to discuss whether this change makes sense, and I don't mind if the answer is that it doesn't. But the save/load cookies feature from 99cfa84c56e55bb373da48935d08d5a13df1ad27 doesn't cover my use case: the persisted browser state that I'm trying to restore comes from a browser hosting service (https://www.browserbase.com/), so the only way I can access that state is through the browser's default context.

Thank you for your consideration!

route commented 4 months ago

@ibrahima I'm on vacation currently and will take a look on Monday or sooner if I have time

yann120 commented 4 months ago

Hey @ibrahima, thanks for this PR. This is exactly the feature I'm missing in order to move to Ferrum! I need to stay logged in on a website, and I don't want to log in again every 5 minutes.

@route have you found time to review this PR? 🙈 Thanks a lot!

matti commented 2 months ago

@route how about reviewing this?

sebyx07 commented 2 months ago

@yann120 also reusing the chrome data dir will help you
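
For reference, a minimal sketch of what reusing the Chrome data dir could look like. It assumes Ferrum's `browser_options` hash is passed through to Chrome as command-line switches, so that `"user-data-dir"` overrides the temporary profile; `persistent_profile_options` and the profile path are hypothetical names used only for illustration. Note that, as the PR description says, pages normally run in a newly created BrowserContext, so state saved in the profile may not be visible without the default-context option proposed here.

```ruby
# Hypothetical helper (not part of Ferrum): builds the options hash for a
# browser that keeps its profile in a fixed directory between runs.
# Assumption: Ferrum forwards browser_options to Chrome as CLI switches.
def persistent_profile_options(profile_dir, headless: true)
  {
    headless: headless,
    browser_options: { "user-data-dir" => File.expand_path(profile_dir) }
  }
end

# Usage with a real browser (requires the ferrum gem and Chrome):
#
#   require "ferrum"
#   browser = Ferrum::Browser.new(**persistent_profile_options("~/.cache/ferrum-profile"))
#   browser.go_to("https://example.com")
#   browser.quit
```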

ryanstout commented 2 months ago

Hey @ibrahima. Thanks for working on this! I was trying to add the same functionality.

When I run the following:

require "ferrum"

browser = Ferrum::Browser.new(
  headless: false,
  use_default_context: true,
)

browser.go_to("https://google.com")

I get:

/Users/ryanstout/Sites/arsenal/instagrab_2024/ferrum/lib/ferrum/context.rb:52:in `create_target': Ferrum::NoSuchTargetError (Ferrum::NoSuchTargetError)
        from /Users/ryanstout/Sites/arsenal/instagrab_2024/ferrum/lib/ferrum/context.rb:20:in `default_target'
        from /Users/ryanstout/Sites/arsenal/instagrab_2024/ferrum/lib/ferrum/context.rb:24:in `page'
        from /Users/ryanstout/.asdf/installs/ruby/3.2.2/lib/ruby/3.2.0/forwardable.rb:240:in `page'
        from /Users/ryanstout/.asdf/installs/ruby/3.2.2/lib/ruby/3.2.0/forwardable.rb:234:in `go_to'
        from test.rb:15:in `<main>'

I'm digging in on it, but I wasn't sure if you had seen this. Thanks!

yann120 commented 2 months ago

> @yann120 also reusing the chrome data dir will help you

Thanks, I ended up saving the cookies and reloading them, and it works. Maybe the easiest solution :)
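
In case it helps others, here is a rough sketch of the save-and-reload approach. It assumes the `browser.cookies.store(path)` and `browser.cookies.load(path)` methods from the save/load cookies feature mentioned earlier in the thread; `with_saved_cookies` is a hypothetical helper name, not something Ferrum provides.

```ruby
# Hypothetical helper (not part of Ferrum): restores cookies from `path`
# before the block runs, if the file exists, and writes them back afterwards,
# even when the block raises.
# Assumption: browser.cookies responds to load(path) and store(path).
def with_saved_cookies(browser, path)
  browser.cookies.load(path) if File.exist?(path)
  yield browser
ensure
  browser.cookies.store(path)
end

# Usage with a real browser (requires the ferrum gem and Chrome):
#
#   require "ferrum"
#   browser = Ferrum::Browser.new
#   with_saved_cookies(browser, "cookies.yml") do |b|
#     b.go_to("https://example.com")
#   end
#   browser.quit
```

This only carries cookies across runs; other persisted state such as local storage is not covered, which is the gap the default-context option is meant to address.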