kelson42 opened 9 months ago
Would it be possible to remove cookie banners during crawling by using one of the browser extensions dedicated to that task?
@benoit74 A little feedback on the feasibility would be welcome :)
I'm not sure how this could work. AFAIK, extensions manipulate the DOM and/or add custom CSS, so this will not help, since the crawler records HTTP responses, not the DOM. What we need is more likely something that rewrites the HTML and/or JS and/or CSS to remove these banners. Or maybe just some additional JS that runs on every page and does the same as an extension. I need to spend time looking at how these extensions work in more detail, and at how we could integrate this.
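As a rough illustration of the "rewrite the HTML" idea, here is a minimal TypeScript sketch that injects a `<style>` block into a recorded HTML response before it is served at replay. The function name `injectHidingStyle` and the selector `#cookie-banner` are hypothetical, and the string search is naive (it is not an HTML parser, so a `</head>` inside an inline script would confuse it). The same insertion point could hold a `<script>` instead, for the "additional JS on every page" variant.

```typescript
// Hypothetical helper (not part of any crawler or replay API): rewrite a
// recorded HTML response so that the given CSS is applied on replay.
// Naive string search, not a real HTML parser.
function injectHidingStyle(html: string, css: string): string {
  const styleTag = `<style>${css}</style>`;

  // Preferred: insert just before the closing </head> tag (case-insensitive).
  const headClose = html.search(/<\/head\s*>/i);
  if (headClose !== -1) {
    return html.slice(0, headClose) + styleTag + html.slice(headClose);
  }

  // Otherwise: insert right after the opening <body ...> tag, if present.
  const bodyOpen = /<body\b[^>]*>/i.exec(html);
  if (bodyOpen) {
    const end = bodyOpen.index + bodyOpen[0].length;
    return html.slice(0, end) + styleTag + html.slice(end);
  }

  // Last resort: append. Prepending could land before <!DOCTYPE html> and
  // switch the page into quirks mode.
  return html + styleTag;
}

// Usage with an illustrative selector:
const recordedPage =
  '<!DOCTYPE html><html><head><title>t</title></head>' +
  '<body><div id="cookie-banner">Accept cookies?</div></body></html>';
console.log(injectHidingStyle(recordedPage, "#cookie-banner { display: none !important; }"));
```

Rewriting the stored response, rather than relying on an extension at crawl time, means the hiding survives into the offline copy regardless of which browser replays it.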
What you can do is use one of the lists from https://easylist.to/ (probably the Cookie Privacy List) to inject CSS or to exclude certain matching resources outright.
We've started doing that in wabac.js for ads: many ads are already removed at crawl time, since Brave uses these lists as well, but on replay the page still tries to load them and doesn't hide the space reserved for them. (Current implementation: https://github.com/webrecorder/wabac.js/blob/main/src/adblockcss.js#L30)
Haven't tested the cookie popups as much as ads so far, though.
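For the "exclude matching resources outright" option, each URL filter has to become something a crawler or replay layer can test against a request URL. Below is a minimal TypeScript sketch covering only a subset of the Adblock Plus network filter syntax (`||` host anchor, `^` separator, `*` wildcard, `|` start/end anchors, `@@` exceptions); it simply drops `$` options, so it may over-block. `filterToRegExp`, `isExcluded` and the sample rules are hypothetical and do not come from any real list or library.

```typescript
// Hypothetical helpers: a simplified Adblock Plus network filter matcher.
// Supported: "||" (host or any subdomain), "^" (separator or end of URL),
// "*" (any characters), leading/trailing "|" (anchor to URL start/end).
// Everything after the last "$" (filter options) is ignored.
function filterToRegExp(filter: string): RegExp {
  const dollar = filter.lastIndexOf("$");
  let pattern = dollar === -1 ? filter : filter.slice(0, dollar);
  let prefix = "";
  let suffix = "";

  if (pattern.startsWith("||")) {
    // Any scheme, optional subdomains, then the host from the filter.
    prefix = "^[a-z][a-z0-9+.-]*://([^/?#]*\\.)?";
    pattern = pattern.slice(2);
  } else if (pattern.startsWith("|")) {
    prefix = "^";
    pattern = pattern.slice(1);
  }
  if (pattern.endsWith("|")) {
    suffix = "$";
    pattern = pattern.slice(0, -1);
  }

  let body = "";
  for (const ch of pattern) {
    if (ch === "*") body += ".*";
    else if (ch === "^") body += "(?:[^\\w.%-]|$)"; // separator char or end
    else body += ch.replace(/[.+?${}()|[\]\\\/]/g, "\\$&"); // escape literal
  }
  return new RegExp(prefix + body + suffix, "i");
}

// A URL is excluded if a blocking rule matches and no "@@" exception does.
function isExcluded(url: string, networkFilters: string[]): boolean {
  const matches = (f: string) => filterToRegExp(f).test(url);
  const blocking = networkFilters.filter((f) => !f.startsWith("@@"));
  const exceptions = networkFilters
    .filter((f) => f.startsWith("@@"))
    .map((f) => f.slice(2));
  return blocking.some(matches) && !exceptions.some(matches);
}

// Illustrative rules, not copied from any real list:
const sampleRules = [
  "||consent.example^",
  "cookie-consent/*.js",
  "@@||consent.example/allowed/",
];
console.log(isExcluded("https://cdn.consent.example/banner.js", sampleRules)); // true
console.log(isExcluded("https://consent.example/allowed/ok.js", sampleRules)); // false
```

Compiling every filter on each call keeps the sketch short; a real integration would precompile the regular expressions once per list.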
The rules are explained here:
https://help.adblockplus.org/hc/en-us/articles/360062733293-How-to-write-filters
It's not as complicated as it seems, though: most of the rules are either CSS selectors (the ones that contain `##`) or URL patterns that should be excluded. Currently, we're only injecting the CSS selectors, with `display: none`.
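To make the split between CSS selectors and URL patterns concrete, here is a minimal TypeScript sketch that separates a filter list into element-hiding rules (`##`) and everything else, then builds hiding CSS for a given hostname. It handles only generic and domain-restricted `##` rules (including `~domain` exclusions) and skips exceptions (`#@#`) and extended syntaxes (`#?#`, `#$#`). The function names and the sample list are hypothetical; this is not wabac.js's actual implementation.

```typescript
// Hypothetical sketch: split an Adblock Plus-style filter list into
// element-hiding rules and network filters, then build CSS for one host.

interface HidingRule {
  include: string[]; // domains the rule is limited to (empty = every site)
  exclude: string[]; // "~domain" entries: sites where the rule must not apply
  selector: string;
}

interface ParsedList {
  hiding: HidingRule[];
  network: string[]; // URL filters (including "@@" exceptions), kept as text
}

function parseFilterList(text: string): ParsedList {
  const parsed: ParsedList = { hiding: [], network: [] };
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();
    // Skip blank lines, "!" comments and the "[Adblock Plus x.y]" header.
    if (line === "" || line.startsWith("!") || line.startsWith("[")) continue;
    // Skip hiding exceptions ("#@#") and extended syntaxes ("#?#", "#$#").
    if (/#[@?$]#/.test(line)) continue;

    const sep = line.indexOf("##");
    if (sep === -1) {
      parsed.network.push(line);
      continue;
    }
    const domains = line.slice(0, sep).split(",").filter((d) => d !== "");
    parsed.hiding.push({
      include: domains.filter((d) => !d.startsWith("~")),
      exclude: domains.filter((d) => d.startsWith("~")).map((d) => d.slice(1)),
      selector: line.slice(sep + 2),
    });
  }
  return parsed;
}

// True if hostname is the domain itself or one of its subdomains.
function onDomain(hostname: string, domain: string): boolean {
  return hostname === domain || hostname.endsWith("." + domain);
}

// One CSS rule per selector, so an unsupported selector only drops itself.
function buildHidingCss(list: ParsedList, hostname: string): string {
  return list.hiding
    .filter((r) =>
      (r.include.length === 0 || r.include.some((d) => onDomain(hostname, d))) &&
      !r.exclude.some((d) => onDomain(hostname, d)))
    .map((r) => `${r.selector} { display: none !important; }`)
    .join("\n");
}

// Illustrative list, not real EasyList content:
const sampleList = [
  "[Adblock Plus 2.0]",
  "! illustrative sample",
  "###cookie-banner",
  "news.example##.consent-overlay",
  "~shop.example##.cookie-notice",
  "||consent.example^",
].join("\n");
console.log(buildHidingCss(parseFilterList(sampleList), "www.news.example"));
```

Emitting one rule per selector, instead of one long comma-separated selector list, is deliberate: browsers drop a whole rule when any selector in its list is invalid, so a single unsupported selector would otherwise disable all the others. The `!important` is an addition on top of plain `display: none`, so that the page's own styles are less likely to override it.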
Would be curious to see how this works for you if you try this out!
Thanks a lot, Ilya! I will have a look.
It would be nice to have an option that automatically removes cookie banners. They are annoying and make no sense offline.