stefanzweifel / screeenly

📸 Screenshot as a Service
https://secure.screeenly.com
MIT License
492 stars, 102 forks

how to eliminate cookie consent popups #371

Open mortee opened 1 year ago

mortee commented 1 year ago

Is your feature request related to a problem? Please describe.

Most sites nowadays open with a cookie consent popup, blocking the page content. This is what's captured on the screenshot.

Describe the solution you'd like

A way to have those eliminated before the shot is captured.

Describe alternatives you've considered

There are add-ons that do this, but AFAIK add-ons are not supported by Chrome in headless mode.

Here's some insight about how one might go about this issue with puppeteer.

At the very least, I should be able to open specific pages somehow through screeenly's browser, dismiss those popups manually, and have the cookies that record those choices preserved and reused for screenshots.

stefanzweifel commented 1 year ago

That's a good question @mortee; unfortunately I don't have the solution for it.

Injecting JavaScript code to find and click on buttons seems like the most straightforward solution. It is tricky, though: how would we make it work for every cookie banner out there?
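To make the "find and click" idea concrete, here is a minimal sketch in Python. It assumes a `page` object with an `evaluate(expression, arg)` method, such as a Playwright `Page`; screeenly itself goes through browsershot, so this is not how screeenly works today. The names `CLICK_CONSENT_JS`, `DEFAULT_LABELS`, and `dismiss_cookie_banner` are made up for illustration, and the label list is a guess that will not cover every banner.

```python
# JavaScript run inside the page: click the first button-like element whose
# visible text matches one of the given labels (case-insensitive).
CLICK_CONSENT_JS = """
(labels) => {
  const wanted = labels.map((label) => label.toLowerCase());
  const candidates = document.querySelectorAll('button, [role="button"], a');
  for (const el of candidates) {
    const text = (el.innerText || '').trim().toLowerCase();
    if (wanted.includes(text)) {
      el.click();
      return true;
    }
  }
  return false;
}
"""

# Hypothetical label list; real banners vary by site and language.
DEFAULT_LABELS = ["Accept all", "Accept", "I agree", "Reject all"]


def dismiss_cookie_banner(page, labels=DEFAULT_LABELS) -> bool:
    """Try to click a consent button; return True if one was clicked.

    `page` is anything with an `evaluate(expression, arg)` method,
    for example a Playwright `Page` (an assumption, not screeenly's stack).
    """
    return bool(page.evaluate(CLICK_CONSENT_JS, list(labels)))
```

A sketch like this only sees elements in the main document, so banners rendered inside iframes or a shadow DOM would likely be missed, which is part of why a universal solution is hard.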

screeenly uses spatie/browsershot under the hood to interact with Puppeteer. As far as I know, browsershot currently doesn't support executing arbitrary JavaScript code. So we first would have to contribute that feature.

I also currently don't see a neat way to integrate this into the web API. An API is usually "interaction less". I don't see how we could allow users to interact with the headless browser through a web API. 🤔

What I've seen users of 3.screeenly.com do (when they control the website they want to screenshot) is pass a query parameter to the website so that it doesn't show the cookie banner. Something like https://example.com/?cookie_banner=false.
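For the query-parameter approach, a small helper can add the flag to a URL before it is sent off for a screenshot. This is only a sketch: the function name `with_banner_disabled` is made up, and the parameter name `cookie_banner` is simply the example from above; the target site has to be built to honor it.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_banner_disabled(url: str, param: str = "cookie_banner", value: str = "false") -> str:
    """Return `url` with `param=value` set, replacing any existing value."""
    parts = urlsplit(url)
    # Keep every existing query pair except the one we are about to set.
    query = [
        (key, val)
        for key, val in parse_qsl(parts.query, keep_blank_values=True)
        if key != param
    ]
    query.append((param, value))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Existing query parameters and fragments are kept, so the helper can be applied to any URL the site already serves.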

mortee commented 1 year ago

Since the API is accessed using a key associated with an account, the user could actually interact with the browser that would later be used through the API, using the corresponding key.

stefanzweifel commented 1 year ago

I think this isn't as easy as you might think. How would you describe "interact with the browser"?

When you make an API request, should a unique URL be generated, which you as a user could click on, which would direct you to the browser with the website you want to screenshot opened? (One would for example see the rendered Chromium browser in a Firefox browser?)

This would mean that the execution of the Puppeteer command would have to be paused until the user "interacts" with the browser or otherwise resumes the Puppeteer command.

For how long would that "session" be open? For a minute? 5 minutes? Forever? (From experience I can tell you, that running Chromium processes on servers is expensive.)

In screeenly, Puppeteer is run in headless mode, meaning no Chromium GUI is rendered and interaction is not possible. I haven't the faintest idea whether this is even possible to accomplish with Puppeteer, or how to integrate it into an app like screeenly.

Besides, API requests are usually made from a server in the backend. How would a user then interact with the browser, if the API request is made in the backend?


Depending on your use case, you might want to use/build your own app with Puppeteer or check out a different tool that can interact with browsers. For example, playwright.dev is an excellent tool to manipulate browsers. It also supports generating screenshots: https://playwright.dev/docs/screenshots

mortee commented 1 year ago

Honestly, I just started using screeenly because it's supported by Nextcloud out of the box as a backend for generating previews for bookmarks, and it can be self-hosted. But unfortunately, almost all my bookmark thumbnails end up looking like this:

[Image: IMG_20230225_172813]

What I was speculating about was that users could maybe interact with the browser while using the site interactively through the UI, and then API calls would only reuse the resulting cookie store. But you might be right, that could prove to be nigh impossible.

I'm not sure about the add-on thing though. Is it correct that Chromium can't use them in headless mode? Because there are multiple add-ons that prevent cookie popups.

stefanzweifel commented 1 year ago

Thanks for the explanation on how you use screeenly. Almost forgot that Nextcloud once used it in a project.

Chrome extensions currently only work in non-headless mode and are hidden behind an experimental flag: https://pptr.dev/guides/chrome-extensions

So maybe in the future, we can use such Cookie-Banner-Blocker extensions in screeenly.

mortee commented 1 year ago

I guess this is beyond your interests, but I'll give it a shot. Might it be possible to gather the cookie store with the popups dismissed on a couple of sites in desktop chromium, and then carry that over to the one screeenly uses? (Which happens to be the apt-installed instance, 'cause puppeteer doesn't care for ARM systems)

stefanzweifel commented 1 year ago

@mortee This version of screeenly currently doesn't support manipulating cookies at all.

However, the underlying library I use allows passing cookies to Puppeteer. https://spatie.be/docs/browsershot/v2/miscellaneous-options/using-cookies
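As a rough illustration of carrying cookies over, the sketch below normalizes exported cookie records into name/value/domain/path dictionaries. Everything here is an assumption: the input is taken to be a list of records exported from a desktop browser (for example through a cookie-export extension), the output shape follows what Playwright's `context.add_cookies()` accepts rather than browsershot's API, and `to_context_cookies` is a made-up name.

```python
# Keys that Playwright's add_cookies() understands; anything else is dropped.
ALLOWED_KEYS = ("name", "value", "domain", "path", "expires", "httpOnly", "secure", "sameSite")


def to_context_cookies(exported: list) -> list:
    """Keep usable cookie records and strip fields the browser API won't accept."""
    cookies = []
    for record in exported:
        # A cookie needs at least a name, a value (may be empty), and a domain.
        if not record.get("name") or "value" not in record or not record.get("domain"):
            continue
        cookie = {key: record[key] for key in ALLOWED_KEYS if key in record}
        # Domain-scoped cookies also need a path; default to the site root.
        cookie.setdefault("path", "/")
        cookies.append(cookie)
    return cookies
```

With Playwright, the result could be passed as `context.add_cookies(to_context_cookies(exported))` before navigating. Whether a consent cookie copied this way is actually honored depends on the site, since some banners store their state elsewhere (for example in local storage).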

A couple of years ago I released a new paid version of screeenly that supports that feature. You could test your theory in the Playground. There you can add cookies through a UI.

Here's an example: Link.

This is obviously not a solution to your problem. v3 of screeenly is not compatible with Nextcloud (they would have to update their code) and there's currently no free plan. And I don't expect non-companies to ever pay for that service.