minbrowser / min

A fast, minimal browser that protects your privacy
https://minbrowser.org/
Apache License 2.0

Using AdGuard to block ads. #687

Closed LukaFX closed 4 years ago

LukaFX commented 5 years ago

I have an application called AdGuard. It is basically an ad blocker, but it is a desktop app, so it automatically blocks ads everywhere, not just in browsers. I decided to try it with Min, and it solved all my problems with the built-in ad blocker: there are no longer empty spaces where an ad would be, and there are no longer any popups or anything of the sort. I am surprised it works with Min, since Min does not support extensions, but AdGuard for desktop works!

PalmerAL commented 5 years ago

Interesting, they claim that they do cosmetic filtering (which would indeed solve the issues with blank spaces that you mentioned), but I'm guessing they're probably not injecting code into the browser directly. Maybe they're watching for network requests and modifying the HTML of the page before it even reaches the browser? I guess I'll have to investigate more at some point.

code-hunger commented 5 years ago

I don't really see any issue or questions here 😀 but if you're interested, take a look at the source of a page with an ad and see if turning the program off changes the page source.

remusao commented 5 years ago

Hi @PalmerAL,

I am maintaining a pure-JavaScript adblocking library, which has first-class Electron support and can be integrated with only a few lines of code. It supports both network filtering of requests and script/stylesheet injections (which allows blocking the same ads as a browser extension would do).

Regarding AdGuard, the desktop app is most likely a man-in-the-middle proxy that installs its own certificate on the machine. This would allow it to intercept all network traffic and even perform injections on the fly in any app (by modifying outgoing and incoming HTTP/HTTPS traffic).

Best, Rémi
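
To make the "injection on the fly" idea concrete, here is a minimal sketch of cosmetic filtering applied to an HTML response body before it reaches the browser. It is only an illustration of the theory discussed above: the helper name `injectCosmeticCss`, the selectors, and the sample page are all hypothetical, and nothing here is taken from AdGuard or from any blocking library.

```javascript
// Hypothetical helper, not AdGuard's or any library's real API.
// Adds a <style> element that hides the given CSS selectors to an HTML string.
function injectCosmeticCss(html, selectors) {
  if (selectors.length === 0) return html;
  const style = `<style>${selectors.join(',')}{display:none!important}</style>`;
  // Prefer placing the rule just before </head>; otherwise prepend it.
  const headClose = html.search(/<\/head>/i);
  if (headClose === -1) return style + html;
  return html.slice(0, headClose) + style + html.slice(headClose);
}

// Made-up page and selectors, for illustration only.
const page =
  '<html><head><title>t</title></head>' +
  '<body><div class="ad-banner">Ad</div><p>Text</p></body></html>';
const filtered = injectCosmeticCss(page, ['.ad-banner', '#sponsored']);
console.log(filtered);
```

Because the hidden elements are collapsed with `display:none`, no blank space is left where the ad would have been, which matches the behaviour described in the first comment.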

PalmerAL commented 5 years ago

@remusao Thanks for the suggestion! I heard about the Cliqz library a few months ago (after reading your performance comparison), it definitely is a really cool library! Supporting cosmetic filtering would probably be a good thing as well.

To be honest, the main reason that I haven't switched is that I've really enjoyed figuring out how to make our current blocking library faster; in the past year, it's improved by around 5x (and I've gotten it down to around 0.03ms per URL, compared to what I think was around 20ms for the library it's forked from). I think I may spend some more time trying to see if I can optimize it more, although your library is still around 3x faster, so I'm not sure I'll be able to match that.

Also, I've figured for a while that most of the blocking libraries are fast enough that blocking speed wouldn't have much of an effect on overall browser performance, but I don't have any data to support that either. Since you've been including this library as part of Ghostery, I'm guessing you probably have some more insight into this - does making blocking faster actually have a noticeable impact on page load times?

That does sound like a plausible theory regarding AdGuard; an in-browser solution is definitely preferable to that.

remusao commented 5 years ago

> @remusao Thanks for the suggestion! I heard about the Cliqz library a few months ago (after reading your performance comparison), it definitely is a really cool library! Supporting cosmetic filtering would probably be a good thing as well.

Thanks! Happy to learn that you found the study and library interesting.

> To be honest, the main reason that I haven't switched is that I've really enjoyed figuring out how to make our current blocking library faster; in the past year, it's improved by around 5x (and I've gotten it down to around 0.03ms per URL, compared to what I think was around 20ms for the library it's forked from). I think I may spend some more time trying to see if I can optimize it more, although your library is still around 3x faster, so I'm not sure I'll be able to match that.

Congratulations! This is no small feat! It's actually interesting because the library you used as a base is in the performance study: it is the one currently used by the DuckDuckGo extension, and it is indeed very slow. Out of curiosity (and because I already had the code to benchmark DDG's blocker at hand), I added your optimized version to the benchmarks and observed a solid 60x speed-up compared to the original library. On the other hand, it seems that the blocking is not on par with other blockers (Min blocked 73608 requests whereas Cliqz blocked 82408 with the same rules); maybe some options are not supported? Or maybe I did something wrong with the code; feel free to have a look and correct me if that's the case.

> Also, I've figured for a while that most of the blocking libraries are fast enough that blocking speed wouldn't have much of an effect on overall browser performance, but I don't have any data to support that either. Since you've been including this library as part of Ghostery, I'm guessing you probably have some more insight into this - does making blocking faster actually have a noticeable impact on page load times?

You are right about the fact that speed of most blockers is fast enough that it should not be perceived by users. On the other hand, I think this point needs to be nuanced for a few reasons:

  1. The benchmarks run in a controlled environment where all resources are dedicated to blocking. In a browser, content blocking is only one of the many things happening when pages are loading. So basically different tasks are competing for CPU resources and time spent blocking could be used for something else instead. In this regard it is good to minimize this as much as possible.
  2. To go a bit further, in an extension like Ghostery, content blocking is only one part of the work (maybe the easiest actually). We also bundle something we call anti-tracking which also needs to filter requests and decide for each data-point sent from the browser if it could potentially be used to uniquely identify a user; then remove unsafe ones on the fly. That's another example of multiple tasks competing for resources while users are browsing, pages are loading, etc.
  3. Content blocking can also run on devices with fewer resources than computers, such as phones. Ghostery runs on Android phones with the same code-base, which means that performance is paramount in this case (some users have very old Android phones with fairly slow CPUs).

So in the end, the faster you can make this process, the better. It means that more resources are available for browsing itself. It also means that you can potentially do more (or do as much on different platforms, like phones)! For example, if your content blocker is 10x faster, maybe you can enable more rules without impacting performance, or you can deliver the same features on devices with 10x fewer resources.
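
To put the per-URL figures quoted earlier in this thread in perspective, the following back-of-the-envelope sketch multiplies them by a request count. The two per-URL costs come from the discussion above; the figure of 100 requests per page is purely an assumption chosen for illustration, and the calculation ignores everything else competing for the CPU.

```javascript
// Per-URL matching costs quoted earlier in this thread (milliseconds).
const MS_PER_URL_ORIGINAL = 20; // "around 20ms" for the library Min's blocker is forked from
const MS_PER_URL_MIN = 0.03; // "around 0.03ms per URL" for Min's optimized blocker

// Assumption for illustration only: a page that triggers 100 network requests.
const REQUESTS_PER_PAGE = 100;

// Total time spent deciding whether to block, if every request is checked once.
function blockingTimeMs(msPerUrl, requests) {
  return msPerUrl * requests;
}

const original = blockingTimeMs(MS_PER_URL_ORIGINAL, REQUESTS_PER_PAGE);
const optimized = blockingTimeMs(MS_PER_URL_MIN, REQUESTS_PER_PAGE);

console.log(`original:  ${original.toFixed(0)} ms per page`);
console.log(`optimized: ${optimized.toFixed(0)} ms per page`);
```

Under that assumption the original library would spend about two seconds per page on matching, while the optimized one spends a few milliseconds, which is why further gains mostly matter on constrained devices or when other work shares the same CPU budget.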

PalmerAL commented 5 years ago

That's a good point regarding blocking time! It definitely would matter more when there are limited resources available.

Regarding the benchmark, I think the issue is that parsing the filter list is asynchronous by default, using a callback. Since the benchmark code calls the parse() function and immediately starts checking requests, it's going to check some of the requests before the filter list has fully loaded, so they're only going to be matched against a subset of the filters. I've added an option to make parsing synchronous (dc9080c8c51b09a2ab40f3ab5fa62e06bb1e7c59), so if you use parse(rawLists, parsed, null, {async: false}) it should hopefully work better (and also give a more realistic list parsing time).
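
The race described above can be reproduced with a small, self-contained sketch. Note that `parseList`, `isBlocked`, the filters, and the URLs are hypothetical stand-ins written for this illustration; they only borrow the argument order of the `parse(rawLists, parsed, null, {async: false})` call mentioned above and do not reflect how Min's parser is actually implemented.

```javascript
// Hypothetical stand-in for a callback-based list parser; NOT Min's actual code.
// In async mode it parses one filter per timer tick, so `parsed` fills up gradually.
function parseList(rawLines, parsed, callback, options = { async: true }) {
  if (!options.async) {
    for (const line of rawLines) parsed.push(line);
    if (callback) callback();
    return;
  }
  let i = 0;
  function step() {
    if (i < rawLines.length) {
      parsed.push(rawLines[i++]);
      setTimeout(step, 0); // yield before parsing the next filter
    } else if (callback) {
      callback();
    }
  }
  step(); // the first filter is parsed right away, the rest on later ticks
}

// Toy matcher: a URL is blocked if it contains any parsed filter as a substring.
function isBlocked(url, parsed) {
  return parsed.some((filter) => url.includes(filter));
}

const filters = ['ads.example', 'tracker.example', 'banner'];
const urls = [
  'https://ads.example/a.js',
  'https://cdn.example/banner.png',
  'https://site.example/app.js',
];
const countBlocked = (parsed) => urls.filter((u) => isBlocked(u, parsed)).length;

// Benchmark with the race: requests are checked before parsing has finished.
const asyncParsed = [];
parseList(filters, asyncParsed, null);
const tooEarly = countBlocked(asyncParsed);

// Synchronous parsing guarantees the full list is loaded before matching starts.
const syncParsed = [];
parseList(filters, syncParsed, null, { async: false });
const afterFullParse = countBlocked(syncParsed);

console.log({ tooEarly, afterFullParse });
```

In this toy setup the early check only sees the first filter and so undercounts blocked requests. Passing a callback and starting the checks from inside it would be another way to avoid the race while keeping parsing asynchronous.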

Using my benchmarking script (which I think is essentially the same as yours) on the Cliqz dataset, I get 57280 blocked with Min, and 57197 with Cliqz. There are a couple options missing that could explain any remaining differences:

PalmerAL commented 5 years ago

Actually, I broke something in the previous commit; it's fixed in 0b999e9d72752a9214c421b12d3784d3c939761c.

remusao commented 5 years ago

Very nice! I updated the benchmark and the figures look great. You did amazing optimization work, and I will definitely dig into it to see how you managed to speed up the original approach so much :D