Closed gorhill closed 5 years ago
Regarding issue https://issues.adblockplus.org/ticket/2278:
@kzar, @ameshkov
Being able to have a token for regex-based filters would definitely help performance. However trying to programmatically extract a token from a regex-based filter sounds scary to me, too much risk of extracting erroneous tokens.
Suggestion: create a new filter option, token=[...], which filter creators can use to assign a predefined token to the filter. The creator of a filter is best placed to figure out whether a usable token exists and which token will work for storing the filter internally.
For example, this filter in EasyList:
/\.filenuke\.com/.*[a-zA-Z0-9]{4}/$script
Could simply have been written by a filter creator:
/\.filenuke\.com/.*[a-zA-Z0-9]{4}/$script,token=filenuke
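To make the proposal concrete, here is a minimal sketch of how a blocker might pull the proposed token= option out of a regex-based filter. The function name parseRegexFilter and the returned object shape are hypothetical, and the option itself is only a suggestion in this thread, not an implemented syntax.

```javascript
// Hypothetical helper (not part of any existing blocker): splits a
// regex-based filter into its regex, its remaining options, and the
// author-supplied token, if any.
function parseRegexFilter(filterText) {
  let pattern = filterText;
  let options = [];
  if (!filterText.endsWith('/')) {
    // Options follow the last "/$"; everything before it is the regex.
    const cut = filterText.lastIndexOf('/$');
    if (cut === -1) return null; // not a regex-based filter
    pattern = filterText.slice(0, cut + 1);
    options = filterText.slice(cut + 2).split(',');
  }
  if (!pattern.startsWith('/') || pattern.length < 3) return null;

  let token = null;
  const remaining = [];
  for (const option of options) {
    if (option.startsWith('token=')) {
      token = option.slice('token='.length);
    } else {
      remaining.push(option);
    }
  }
  return {
    regex: new RegExp(pattern.slice(1, -1)),
    options: remaining,
    token: token,
  };
}
```

With the filenuke filter above, this would yield the token "filenuke" and the single remaining option "script"; a filter without token= simply comes back with token set to null.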
Hey guys! I was thinking about solving this issue a while ago. Even tried to implement a simple token-extracting algorithm. I will post my ideas a bit later though.
Meanwhile, here is a list of known regexp rules:
/^(?![a-z]+\:\/+([^\/\:]+\.(il|com|net)|[\.0-9]+|([^\/\:\.]+\.)*(spot\.im|vine\.co|periscope\.tv|vid\.me|mako\.tools|minidom\.org|jquerymin\.org|logidea\.info|zoomanalytics\.co|firstimpression\.io))\.?([\/\:]|$))^[^\/\:\.]+\:\/+[^\/\:\.]/$third-party,domain=mako.co.il | EasyList Hebrew | https://github.com/AdBlockPlusIsrael/EasyListHebrew |
/^(?![a-z]+\:\/+([^\/\:\.]+\.)*(google|icdn|auto|sport5|smartair|mysupermarket|blms|linicom)\.co\.il\.?([\/\:]|$))^[a-z]+\:\/+[^\/\:]+\.il\.?([\/\:]|$)/$third-party,domain=mako.co.il | EasyList Hebrew | https://github.com/AdBlockPlusIsrael/EasyListHebrew |
/^[a-z]+\:\/+[\.0-9]+([\/\:]|$)/$image,media,object,script,stylesheet,subdocument,third-party,domain=mako.co.il | EasyList Hebrew | https://github.com/AdBlockPlusIsrael/EasyListHebrew |
/^(?![a-z]+\:\/+([^\/\:\.]+\.)*(fbcdn|cloudfront|facebook|akamaihd|ctedgecdn|2mdn|uploaditnow|edgesuite|doubleclick|dmcdn|slideshare|advsnx)\.net\.?([\/\:]|$))^[a-z]+\:\/+[^\/\:]+\.net\.?([\/\:]|$)/$third-party,domain=mako.co.il | EasyList Hebrew | https://github.com/AdBlockPlusIsrael/EasyListHebrew |
/^(?![a-z]+\:\/+([^\/\:\.]+\.)*(google|facebook|twitter|instagram|youtube|jquery|googleapis|vicomi|twimg|cdninstagram|pinterest|pinimg|giphy|playbuzz|outbrain|ytimg|amazonaws|cloudflare|gstatic|sniperm|dinovich|shortaudition|linkedin|opinionstage|vimeo|vimeocdn|dailymotion|flickr|staticflickr|tumblr|soundcloud|scribd|syteapi|addthis|addthisedge|reddit|disqus|disquscdn|apester|qmerce|taboola|taboolasyndication|google-analytics|googletagservices|googletagmanager|googleadservices|googlesyndication|h-cdn|scorecardresearch|serving-sys|bootstrapcdn|tiviclick|ruchlis|hotjar|flx1|mxpnl|themarker|adnxs|conduit|fourtips|makojs)\.com\.?([\/\:]|$))^[a-z]+\:\/+[^\/\:]+\.com\.?([\/\:]|$)/$third-party,domain=mako.co.il | EasyList Hebrew | https://github.com/AdBlockPlusIsrael/EasyListHebrew |
/quang%20cao/ | ABPVN List | http://abpvn.com/ |
/YanAds/ | ABPVN List | http://abpvn.com/ |
/www/images/ | ABPVN List | http://abpvn.com/ |
/ads-pic/ | Adblock-Persian list | http://ideone.com/K452p |
/eshop-eca/ | Adblock-Persian list | http://ideone.com/K452p |
/eshop98/ | Adblock-Persian list | http://ideone.com/K452p |
/402x192/ | Adblock-Persian list | http://ideone.com/K452p |
/^http://m\.autohome\.com\.cn\/[a-z0-9]{32}\//$domain=m.autohome.com.cn | ChinaList+EasyList | http://www.adtchrome.com/extension/adt-chinalist-easylist.html |
/^http://www\.tt1069\.com\/(?!bbs)/$script,domain=tt1069.com | ChinaList+EasyList | http://www.adtchrome.com/extension/adt-chinalist-easylist.html |
/^http://www\.iqiyi\.com\/common\/flashplayer\/[0-9]{8}/[0-9a-z]{32}.swf/$domain=iqiyi.com | ChinaList+EasyList | http://www.adtchrome.com/extension/adt-chinalist-easylist.html |
/^http://www\.dnvod\.eu.*?\/[a-z0-9]{9,}\.swf/$domain=dnvod.eu | ChinaList+EasyList | http://www.adtchrome.com/extension/adt-chinalist-easylist.html |
/NetInsight/text/$domain=~ads.pandora.tv|~opt.mgoon.com | Korean Adblock List | https://github.com/gfmaster/adblock-korea-contrib |
/omniture/ | Korean Adblock List | https://github.com/gfmaster/adblock-korea-contrib |
/NetInsight/html/ | Korean Adblock List | https://github.com/gfmaster/adblock-korea-contrib |
/cgi-bin/conad.fcgi/ | Korean Adblock List | https://github.com/gfmaster/adblock-korea-contrib |
/acecounter/$domain=~acecounter.com | Korean Adblock List | https://github.com/gfmaster/adblock-korea-contrib |
/adNdsoft/ | Korean Adblock List | https://github.com/gfmaster/adblock-korea-contrib |
/wisenut/ | Korean Adblock List | https://github.com/gfmaster/adblock-korea-contrib |
/ad-pay/ | Korean Adblock List | https://github.com/gfmaster/adblock-korea-contrib |
/wp-content/plugins/google-analyticator/ | Korean Adblock List | https://github.com/gfmaster/adblock-korea-contrib |
/realclick/ | Korean Adblock List | https://github.com/gfmaster/adblock-korea-contrib |
/max-banner-ads-pro/ | Korean Adblock List | https://github.com/gfmaster/adblock-korea-contrib |
/RealMedia/ | Korean Adblock List | https://github.com/gfmaster/adblock-korea-contrib |
/bannerManager/ | Korean Adblock List | https://github.com/gfmaster/adblock-korea-contrib |
/autoPage/ | Korean Adblock List | https://github.com/gfmaster/adblock-korea-contrib |
/overture/ | Korean Adblock List | https://github.com/gfmaster/adblock-korea-contrib |
/wiseAd/euckr/inc/$subdocument | Korean Adblock List | https://github.com/gfmaster/adblock-korea-contrib |
/NetInsight/js/ | Korean Adblock List | https://github.com/gfmaster/adblock-korea-contrib |
/scrap_logs/ | Korean Adblock List | https://github.com/gfmaster/adblock-korea-contrib |
/banner_event/ | Korean Adblock List | https://github.com/gfmaster/adblock-korea-contrib |
/images/adpresso/ | Korean Adblock List | https://github.com/gfmaster/adblock-korea-contrib |
/AdBanner/ | Korean Adblock List | https://github.com/gfmaster/adblock-korea-contrib |
/cdsbData_gal/bannerFile/$image,domain=mybogo.net|zipbogo.net | List-KR | https://list-kr.github.io/ |
/nad/media/ | List-KR | https://list-kr.github.io/ |
/ajrotator/ | Filtros Nauscopicos | http://nauscopio.nireblog.com/cat/filtrado |
/:\/\/(?!biuropodrozy)(?!liveblog)(?!relacje)(?!opinie)(?!zalacznik)(?!magazyn)(?!newsletter)(?!rodzinnawycieczka)(?!doladowania)(?!fantasyliga)(?!funduszeue)(?!imperiumstylu)(?!kodyrabatowe)(?!ogloszenia)(?!orangekinoletnie)(?!rekrutacja)(?!rycerzeiksiezniczki)(?!speedwaymanager)(?!sportowefakty)(?!sportowybar)(?!talesofmagic)(?!ubezpieczenia)(?!warofdragons)(?!wiadomosci)[a-zA-Z0-9]{10,}\.wp.pl\// | Adblock polskie reguły | http://certyficate.it/polski-filtr-adblock/ |
/:\/\/(?!biuropodrozy)(?!liveblog)(?!relacje)(?!opinie)(?!zalacznik)(?!magazyn)(?!newsletter)(?!facet)(?!wyleczto)(?!kuchnia)(?!film)(?!moto)(?!gwiazdy)(?!teleshow)(?!finanse)(?!kobieta)(?!dom)(?!pogoda)(?!tech)(?!historia)(?!czat)(?!ksiazki)(?!gryonline)(?!hotele)(?!narty)(?!samoloty)(?!wycieczki)(?!hosting)(?!irlandia)(?!multikurs)(?!casino)(?!foto)(?!tech)(?!www)(?!stg)(?!doladowania)(?!fantasyliga)(?!funduszeue)(?!imperiumstylu)(?!kodyrabatowe)(?!alefolwark)(?!angielski)(?!arenamody)(?!beniamin)(?!bon)(?!bsg)(?!casino)(?!diety)(?!dlaprasy)(?!dlugi)(?!doladowania)(?!dom)(?!dysk)(?!ebiznes)(?!ebooki)(?!empire)(?!fantasyliga)(?!film)(?!fundusze)(?!ogloszenia)(?!orangekinoletnie)(?!rekrutacja)(?!rycerzeiksiezniczki)(?!speedwaymanager)(?!sportowefakty)(?!sportowybar)(?!talesofmagic)(?!ubezpieczenia)(?!warofdragons)(?!wiadomosci)(?!gazetki)(?!gry)(?!horoskop)(?!kalendarz)(?!katalog)(?!khanwars)(?!komiks)(?!konflikty)(?!kontakty)(?!korsarze)(?!kultura)(?!mini)(?!mmho)(?!mobilna)(?!morizon)(?!moto)(?!muzyka)(?!narty)(?!naryby)(?!onas)(?!orangekinoletnie)(?!piraci)(?!poczta)(?!pomoc)(?!praca)(?!profil)(?!programtv)(?!pytamy)(?!rekrutacja)(?!rss)(?!rtvagd)(?!rycerzeiksiezniczki)(?!smeet)(?!speedwaymanager)(?!szkola)(?!szukaj)(?!tech)(?!teleshow)(?!triviador)(?!turystyka)(?!twojeip)(?!ulubiency)(?!warodfragons)(?!wycieczki)(?!zdrowie)(?!zoomumba)(?!topnews)(?!erotyka)(?!dzieci)(?!fitness)(?!gielda)(?!finansomat)(?!biznes)(?!sport)[a-zA-Z0-9]{4,9}\.wp.pl\// | Adblock polskie reguły | http://certyficate.it/polski-filtr-adblock/ |
/commoncfm/images/microsoftxboxone/$domain=buffed.de|gamesaktuell.de|gamezone.de|pcgames.de|videogameszone.de | German filter | http://adguard.com/filters.html#german |
/[a-z0-9]{32,}/$third-party,domain=picshare.ru | Russian filter | http://adguard.com/filters.html#russian |
/[a-zA-Z0-9]{35,}/$script,third-party,domain=bigtorrent.org|bigtorrents.ru|cashtube.ru|cmexota.ru|dreamprogs.net|dsvload.net|ecsebo.ru|enotbox.com|faspiic.ru|imagefile.org|imgpay.ru|kordonivkakino.net|mcdownloads.ru|mega-pic.org|odnopolchane.net|payforpic.ru|pic4cash.ru|pic4you.ru|picclick.ru|picforall.ru|pics-money.ru|pirat-pic.ru|planeta51.com|pronpic.org|prons.org|q32.ru|rustorrents.net|santikov.net|sharezones.biz|torrent-pirat.com|unionpeer.org|uraltrack.net|viewy.ru|xhamster-pic.com | Russian filter | http://adguard.com/filters.html#russian |
/http:\/\/rustorka.com\/[a-z]+\.js/$domain=rustorka.com | Russian filter | http://adguard.com/filters.html#russian |
/http:\/\/rustorka.com\/[a-z0-9]+\.(jpg|gif)/$image,domain=rustorka.com | Russian filter | http://adguard.com/filters.html#russian |
/[a-zA-Z0-9]{35,}/$domain=anime-free.net|cyberpirate.me|imgbum.net|online-porno-hd.ru|tecnomectrani.com | Russian filter | http://adguard.com/filters.html#russian |
/[a-z0-9]{30,}/$script,third-party,domain=free-torrent.org|free-torrents.org | Russian filter | http://adguard.com/filters.html#russian |
/^http://[a-z0-9_]{15,}\.[a-z0-9-]+\.[a-z]{2,}\/.*[a-zA-Z0-9]{100,}/$object-subrequest,domain=wat.tv | Liste FR | http://adblock-listefr.com/ |
/^http://[a-z0-9_-]{10,}\.[a-z0-9-]+\.[a-z]{2,}\/.*?\w{30,}/$~xmlhttprequest,domain=gentside.com|maxisciences.com|ohmymag.com | Liste FR | http://adblock-listefr.com/ |
/content/stargate/$domain=hlamer.ru|kadu.ru|krasview.ru | RU AdList | https://code.google.com/p/ruadlist/ |
/output/index/$third-party,script | RU AdList | https://code.google.com/p/ruadlist/ |
/https?://(?!(mc\.yandex\.ru|www\.google-analytics\.com)/)/$third-party,script,subdocument,domain=massivmebel.by | RU AdList | https://code.google.com/p/ruadlist/ |
/^https?://goodgame\.ru/[a-z0-9]+$/$subdocument,domain=goodgame.ru | RU AdList | https://code.google.com/p/ruadlist/ |
/wp-content/plugins/popup-maker/$domain=info-life.in.ua|intermarium.com.ua|paragraf.net.ua|unn24.com.ua|varota.com.ua | RU AdList | https://code.google.com/p/ruadlist/ |
/^https?://(?!static\.)([^.]+\.)+?fastpic\.ru[:/]/$script,domain=fastpic.ru | RU AdList | https://code.google.com/p/ruadlist/ |
/images/brandings/$image,domain=sc2tv.ru | RU AdList | https://code.google.com/p/ruadlist/ |
/default/vbanners/$domain=noi.md | RU AdList | https://code.google.com/p/ruadlist/ |
/branding/$subdocument,domain=fanserials.tv|kino-filmi.net | RU AdList | https://code.google.com/p/ruadlist/ |
/serial_adv_files/$image,domain=xn--80aacbuczbw9a6a.xn--p1ai|куражбамбей.рф | RU AdList | https://code.google.com/p/ruadlist/ |
/^https?://(?!www\.)([^.]+\.)+?(kordonivkakino\.net|m(ac-torrent-download\.net|oviki\.ru))[:/]/$script | RU AdList | https://code.google.com/p/ruadlist/ |
/popupclick/$popup | RU AdList | https://code.google.com/p/ruadlist/ |
/http://[a-zA-Z0-9]+\.[a-z]+\/.*(?:[!"#$%&'()*+,:;<=>?@/\^_`{|}~-]).*[a-zA-Z0-9]+/$script,third-party,domain=keezmovies.com|redtube.com|tube8.com|tube8.es|tube8.fr|www.pornhub.com|youporn.com | EasyList | https://easylist.github.io/ |
/\/[0-9].*\-.*\-[a-z0-9]{4}/$script,xmlhttprequest,domain=gaytube.com|keezmovies.com|spankwire.com|tube8.com|tube8.es|tube8.fr | EasyList | https://easylist.github.io/ |
/\.sharesix\.com/.*[a-zA-Z0-9]{4}/$script | EasyList | https://easylist.github.io/ |
/\.filenuke\.com/.*[a-zA-Z0-9]{4}/$script | EasyList | https://easylist.github.io/ |
/^http://m\.autohome\.com\.cn\/[a-z0-9]{32}\//$domain=m.autohome.com.cn | EasyList China | http://abpchina.org/forum/ |
/^http://www\.iqiyi\.com\/common\/flashplayer\/[0-9]{8}/[0-9a-z]{32}.swf/$domain=iqiyi.com | EasyList China | http://abpchina.org/forum/ |
/^http://www\.dnvod\.eu.*?\/[a-z0-9]{9,}\.swf/$domain=dnvod.eu | EasyList China | http://abpchina.org/forum/ |
/^http://www\.tt1069\.com\/(?!bbs)/$script,domain=tt1069.com | EasyList China | http://abpchina.org/forum/ |
/ulightbox/$domain=hdkinomax.com|tvfru.net | RU AdList: BitBlock | https://code.google.com/p/ruadlist/ |
/http://cdn[0-9]\.spiegel\.de/images/image-([^-]+)-[^-]+-[^-]+-(?!\1)[^-]+\.jpg/$image,domain=spiegel.de | EasyList Germany | https://easylist.github.io/ |
Please note the number of rules which are mistakenly made regexp-type.
@gorhill I've not been involved in that issue so far, so just done a quick bit of reading. I might get some things wrong.
While I agree that grabbing a keyword from the regexp seems scary, I'm not sure how the suggested token option would help. Take your filenuke example, there the automatic keyword would have been "filenuke" anyway.
Now if you think of a more advanced example which matches one of two possible domains, what would you put for the token option? If you chose to use part of one of the domains as a keyword you'd end up not matching the other domain. Instead you'd have to omit the token option, which would end up with the same result as the automatic approach. (Since they mention that those kinds of strings should be ignored.)
(I wonder if we could copy the content blocking approach of compiling all these regular expressions into a finite state machine? That could be a way to make matching regular expression filters faster without worrying about keywords.)
(I wonder if we could copy the content blocking approach of compiling all these regular expressions into a finite state machine? That could be a way to make matching regular expression filters faster without worrying about keywords.)
- This would be an overkill
- In order to do it they have restricted regular expressions support to a very limited subset.
Take your filenuke example
Yes, bad example. Here is another one found in EasyList:
/\/[0-9].*\-.*\-[a-z0-9]{4}/$script,xmlhttprequest,domain=gaytube.com|keezmovies.com|spankwire.com|tube8.com|tube8.es|tube8.fr
Not sure if a token was available for this one -- whoever created the filter knows, but mainly my point is that a token= option would be an easy low-tech way, available immediately (easy implementation), to deal with this, with no need for a regex parser (which would fail anyway with the filter here). If no token is present for an untokenizable filter, then we just end up with the current behavior.
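A rough sketch of that fallback, assuming a blocker that buckets filters by token: filters with a token are only tested when the token appears in the URL, and filters without one land in a bucket that is tested for every request (the current behavior). The class name TokenIndex and the way the URL is split into tokens are assumptions made for illustration, not how any particular blocker does it.

```javascript
// Hypothetical index of regex filters keyed by token.
class TokenIndex {
  constructor() {
    this.byToken = new Map(); // token -> array of RegExp
    this.untokenized = [];    // tested against every URL
  }

  add(regex, token) {
    if (!token) {
      this.untokenized.push(regex);
      return;
    }
    const key = token.toLowerCase();
    if (!this.byToken.has(key)) this.byToken.set(key, []);
    this.byToken.get(key).push(regex);
  }

  // Returns the regexes which have to be tested against the URL.
  // Assumption: a token only hits when it equals a whole run of
  // alphanumeric characters in the URL.
  candidates(url) {
    const result = this.untokenized.slice();
    const urlTokens = new Set(url.toLowerCase().match(/[a-z0-9%]+/g) || []);
    for (const urlToken of urlTokens) {
      const bucket = this.byToken.get(urlToken);
      if (bucket) result.push(...bucket);
    }
    return result;
  }

  matches(url) {
    return this.candidates(url).some((regex) => regex.test(url));
  }
}
```

The more filters carry a token, the smaller the untokenized bucket, which is where the performance gain would come from.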
Let's first think about what issue we are trying to solve.
First of all, domain-restricted filters are not a problem as there is no influence on the overall performance.
I suppose that what we really need is to reduce the negative impact of the mistakes made by filter authors. For instance, filters like /ajrotator/ and such. There is no problem extracting a token from a rule like this.
Here is just a dirty example of a token extracting function:
var extractToken = function(ruleText) {
    // Get the regexp text (the part between the first and the last slash)
    var match = ruleText.match(/\/(.*)\/(\$.*)?/);
    if (!match) {
        // Not a regexp rule
        return null;
    }
    var reText = match[1];
    var specialCharacter = "...";
    if (reText.indexOf('(?') >= 0 || reText.indexOf('(!?') >= 0) {
        // Do not mess with complex expressions which use lookahead
        return null;
    }
    // (Dirty) prepend specialCharacter for the following replace calls to work properly
    reText = specialCharacter + reText;
    // Strip all types of brackets
    reText = reText.replace(/[^\\]\(.*[^\\]\)/, specialCharacter);
    reText = reText.replace(/[^\\]\[.*[^\\]\]/, specialCharacter);
    reText = reText.replace(/[^\\]\{.*[^\\]\}/, specialCharacter);
    // Strip some special characters
    reText = reText.replace(/[^\\]\\[a-zA-Z]/, specialCharacter);
    // Split by special characters
    var parts = reText.split(/[\\^$*+?.()|[\]{}]/);
    // Use the longest remaining part as the token
    var token = "";
    var iParts = parts.length;
    while (iParts--) {
        var part = parts[iParts];
        if (part.length > token.length) {
            token = part;
        }
    }
    return token;
};
I've tried this function with the rules above and here is the result: https://ameshkov.github.io/web/regex-tokens.html?1
As for the token proposition, here are the downsides I see:
getadblock guys aren't invited to our party
They are using ABP's filtering engine since AdBlock v3.0. See https://github.com/kzar/watchadblock/releases/tag/3.0.
The other points still stand though:)
I wasn't aware of the many erroneous regex filters, looks like this can be easily addressed with trivial code for these cases.
Mainly it was just to throw an idea out there, since these untokenizable filters have always bothered me[1], and I knew there was an issue like this opened on ABP issue tracker -- so I just threw the idea out there to have an easy fix, worth only if actually used by filter list maintainers.
Anyway, I will just use this issue here to throw ideas once in a while which I think might be good for all blockers[2], especially when it comes to make the life of filter list maintainers easier.
[1] I was looking to even skip testing for domain hit -- but this is an implementation-dependent detail I suppose
[2] I understand that when a filter syntax is not supported by ABP, EasyList et al. maintainers won't use it.
By the way, I'd like to raise a question about the non-standard syntax.
You have recently added a couple of pseudo-classes extending element hiding rules syntax. I am talking about :has(), :xpath(), :matches-css[1] and such.
The idea is really great and we will support some of these extended selectors as well (:has() and :contains() are currently in the beta testing stage, :matches-css() is coming).
However, there is one issue that bothers me. The syntax you use (pseudo-classes syntax) is not backward-compatible and it will break good old stylesheet-based ad blockers like Adguard and ABP.
/* browser will ignore the whole style due to the second selector */
#banner, #banner:has(.test) { display: none; }
I suggest introducing a backward-compatible syntax along with the modern pseudo-classes-based one.
Backward compatible synonym for :has(...) will be [-ext-has="..."]
Backward compatible synonym for :matches-css(...) will be [-ext-matches-css="..."]
Backward compatible synonym for :xpath(...) will be [-ext-xpath="..."]
[1] As I understand, there is a backward compatible :matches-css() option already: https://issues.adblockplus.org/ticket/2390
You have recently added a couple of pseudo-classes extending element hiding rules syntax. I am talking about :has() ...
FWIW We are working towards adding the :has selectors too https://issues.adblockplus.org/ticket/3143
Anyway, I will just use this issue here to throw ideas once in a while which I think might be good for all blockers[2], especially when it comes to make the life of filter list maintainers easier.
:+1: Please do, I think collaboration benefits us all.
@kzar so, what do you think about the backward compatible syntax proposition?
@kzar regarding Lain's comment:
I think it's worth mentioning that :has() selector must work in combination with -abp-properties. So, filter like site.name##.block:has([-abp-properties="background: yellow"])
Using proposed syntax it could look like this:
##.block[-ext-has="*:matches-css(background: yellow)"]
@ameshkov Well I think the idea is that when browsers eventually support :has selectors those filters will be again using standard CSS selectors anyway. We only need to implement special logic for those filters in the mean time as a stop-gap. I guess it's true (and unfortunate) that the syntax will break filters for ad blockers which haven't added support for now, but I guess that's not too bad since uBlock, AdGuard and Adblock Plus all plan to support them. (Also because they are only planned to be something used as a last resort.)
As for the general point of using backward compatible syntax like you've suggested, I think it's a good idea. (We already do something like that for CSS property filters using the -abp-properties attribute.)
Well I think the idea is that when browsers eventually support :has selectors those filters will be again using standard CSS selectors anyway.
True. However, here is one more argument for that type of syntax. We all support a lot of different browsers (including mobile and such), and using the pseudo-classes syntax requires us to roll it out simultaneously on all platforms, while a backward-compatible syntax allows us to roll this feature out gradually.
As for the general point of using backward compatible syntax like you've suggested, I think it's a good idea. (We already do something like that for CSS property filters using the -abp-properties attribute.)
Yeah, I know, that's why I was surprised by the implementation proposed in the issue 3143.
I suggest introducing a backward-compatible syntax along with the modern pseudo-classes-based one.
I will support the backward-compatible syntax where possible, but personally, internally I prefer using the :() syntax. I see these new operators as nodes in a processing graph, and being able to easily and freely combine them is something I see as a requirement for the future. Example[1]:
div.red:has(div.blue:matches-css(position: fixed;):contains(allo)):contains(publicité)
It does feel to me like a backward-compatible syntax would complicate writing such filters (especially the use of quotes):
div.red[-ext-has="div.blue[-ext-matches-css=\"position: fixed;\"][-ext-contains=\"allo\"]"][-ext-contains="publicité"]
Aren't you validating element hiding filters at load time (or else using an invalid CSS selector would break element hiding), so isn't it true that old versions will discard filters with this new syntax? (Element.matches('div:has(span)') would throw).
[1] Ok, the example is contrived, but it's just to illustrate easily combining such filters.
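The load-time validation being asked about could look roughly like the sketch below. The function name keepValidSelectors is hypothetical; the probe argument is assumed to be any object with a matches(selector) method that throws on a selector it cannot parse, which is how Element.matches() behaves in a browser.

```javascript
// Hypothetical load-time check: keep only the selectors the engine can parse.
// In a browser, probe could be document.createElement('div').
function keepValidSelectors(selectors, probe) {
  return selectors.filter((selector) => {
    try {
      probe.matches(selector); // throws on an unsupported or invalid selector
      return true;
    } catch (e) {
      return false;
    }
  });
}
```

A blocker doing this would silently drop a filter such as div:has(span) on an engine without :has() support, instead of letting it invalidate a whole stylesheet rule.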
It does feel to me like a backward-compatible syntax would complicate writing such filters (especially the use of quotes):
Yeah, frankly, when I check something, I prefer to use the newer syntax as well.
However, it's not that bad, there's no need to support it inside of a composite filter.
Here, look at this example:
div.red[-ext-has="div.blue:matches-css(position: fixed):contains(allo):contains(publicité)"]
Aren't you validating element hiding filters at load time (or else using an invalid CSS selector would break element hiding), so isn't it true that old versions will discard filters with this new syntax? (Element.matches('div:has(span)') would throw).
Nope, in fact it was all of a sudden for us:) Also there's no way we could do it in desktop and mobile versions.
@gorhill one more thing regarding :matches-css(). I propose using a slightly different syntax for it.
Could you please read this issue description and tell me what you think about it? https://github.com/AdguardTeam/ExtendedCss/issues/7
Q: Why additional pseudo-classes for matching before and after
I already support selector:after:style-properties(pattern), I just extract the :after before using the selector at setup time. But I would not mind selector:style-properties-before(pattern) -- it would just make the setup code a bit simpler.
Q: Why pattern-matching?
I agree with (optional) pattern matching. Pattern-matching is not something I implemented, but I don't see a problem supporting this. For the implementation side of such filter however, I would just want to be sure its semantic does not force a very specific implementation.[1]
I suppose that using this approach we could also cover existing abp-properties rules
Note that ABP's -abp-properties has been implemented with a very different semantic in mind than something like :matches-css: to reverse lookup CSS rules. Such filters shouldn't be used directly on a set of nodes for filtering purposes. The purpose of all the filters I have been adding lately is to reduce a set of nodes (starting with one as small as possible), so the suffix part is key: starting with the smallest set of nodes possible is key for performance.
For example, a filter such as wetter.com##[-abp-properties='margin-left: 24px'], given that it has no suffix selector, would have to be tested for all elements on a page, which would just kill performance.
[1] I see using cssText as a potentially high overhead approach, so I went with the dictionary approach, to test only for the enumerated properties. a) I suspect the cssText string is generated on the fly by the browser when "getted"; b) using cssText forces the use of a regex which will apply to a potentially large string.
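A minimal sketch of what such a dictionary approach might look like, as opposed to running a regex over cssText. The function names are hypothetical and this is not the actual uBlock Origin code; the style argument is assumed to expose getPropertyValue(name), as the object returned by getComputedStyle() does.

```javascript
// Parse an argument such as "position: fixed; top: 0px" into a Map of
// property name -> expected value.
function parseDeclarations(text) {
  const wanted = new Map();
  for (const declaration of text.split(';')) {
    const colon = declaration.indexOf(':');
    if (colon === -1) continue;
    const name = declaration.slice(0, colon).trim().toLowerCase();
    const value = declaration.slice(colon + 1).trim();
    if (name) wanted.set(name, value);
  }
  return wanted;
}

// Test only the enumerated properties, one lookup per property.
function styleMatches(style, wanted) {
  for (const [name, value] of wanted) {
    if (style.getPropertyValue(name) !== value) return false;
  }
  return true;
}
```

In a page this could be called as styleMatches(getComputedStyle(node), parseDeclarations('position: fixed')), so the cost grows with the number of properties in the filter rather than with the size of the serialized style.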
I already support selector:after:style-properties(pattern)
It may look pretty good, but it bothers me that :after in fact can't be part of a valid selector, as a pseudo-element cannot be selected. I suppose it could mislead a filter author.
[1] I see using cssText as a potentially high overhead approach, so I went with the dictionary approach, to test only for the enumerated properties. a) I suspect the cssText string is generated on the fly by the browser when "getted"; b) using cssText forces the use of a regex which will apply to a potentially large string.
Yep, I've run into a number of issues while implementing it. For now I've used a cross-browser function for extracting the cssText string: https://github.com/AdguardTeam/ExtendedCss/blob/feature/issues/7/lib/style-property-matcher.js#L96
Also I agree with you on the enumerated properties approach. There's no need to build the cssText field, I will change the current implementation.
For example, a filter such as wetter.com##[-abp-properties='margin-left: 24px'],
Yeah, you're right. Also now when I know how this type of rules work, I find it a bit misleading. At least I think Lain_13 does not understand how it works.
@kzar what do you think about implementing something more "straightforward"?
I guess if we use the properties approach and agree on the *-before/after postfix, there is no need for me to use another name for that pseudo class. matches-css, matches-css-before and matches-css-after sound good and describe the filter behaviour very well.
matches-css, matches-css-before and matches-css-after sound good and describe the filter behaviour very well.
I agree with this. This new selector, combinable with :has(), is going to make filter list maintainers' life easier.
I've updated the syntax description: https://github.com/AdguardTeam/ExtendedCss/issues/7
Looking into this specific case this morning: https://github.com/uBlockOrigin/uAssets/issues/110.
This would be solvable without exception filters if it was possible to outright remove the targeted nodes from the DOM:
finanzen.net###bodyCenter > div[id]:has(:scope > #Ads_BA_Sky):remove()
The current implicit action to take on targeted nodes is to hide them. However, being able to re-style them has made the job of working against anti-blocker mechanisms much easier (AdGuard supports this).
Additionally, being able to remove nodes from the DOM is something I have found would take care of many other cases as well (I do believe AdGuard supports this in some ways, not sure). From my point of view, being forced to whitelist network requests from 3rd-party advertisers/trackers is always the worst option, and we should extend the capabilities of cosmetic filtering (element hiding) to avoid such whitelisting.
Oh, you have finally faced these german wunderwaffe-anti-adblock-solutions:) I was impressed when I saw this particular script for the first time.
Currently the easiest way to circumvent it is to inject a script like this:
Object.defineProperty(window, `UABPtracked`, { get: function() { return true; }, set: function() {} })
Regarding the DOM nodes removal thing, I need some time to think about it.
Currently the easiest way to circumvent it is to inject a script like this
I didn't realize they were using the uabp thing, I already had a scriptlet to take care of these -- it was not injected on that site.
Though in the long term, scriptlets require more work and maintenance, and I would rather use generic cosmetic filter syntax where possible. In the current case, a node removal would work. It would also work for that case (edit: never mind, would not work for this case). Anyway, something to think about.
In the current case, a node removal would work
However, in this particular case node removal is not the best solution. This anti-adblock script is pretty ugly, it sets up a timer and redraws ads every 5 or so seconds. And with nodes removed it continues to do something with DOM.
Talking about anti-adblock scripts, I really do not see a good declarative solution which does not involve scripting.
Let's start with analysis. Most of the things we discuss are directly caused by the websites trying to circumvent ad blocking.
Basically, there are two approaches:
Point 1 can be solved by the new pseudo-elements (at least for now). Point 2 can be solved by scripts (like reek's AAK for instance).
Btw, reek is the best anti-adblock scripts expert I know, let's ask his opinion.
@gorhill @ameshkov We are discussing WebSocket circumvention on the Adblock Plus issue tracker, but unfortunately we've had to make the issue confidential. (Guess why...) Anyway I'd like to copy you both in on the issue, as mapx pointed out it would be good to get your feedback there too.
Are you guys signed up on our issue tracker? If so what are your usernames?
@gorhill Also a possibly dumb question, doesn't a Content Security Policy like connect-src http:; frame-src http: also prevent https connections?
doesn't a Content Security Policy like connect-src http:; frame-src http: also prevent https connections?
Not according to spec:
The URL matching algorithm now treats insecure schemes and ports as matching their secure variants. That is, the source expression http://example.com:80 will match both http://example.com:80 and https://example.com:443.
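The rule quoted above can be illustrated with a simplified sketch. This is not the full CSP matching algorithm (no wildcards, no paths), the function names are made up for the example, and only the scheme and port upgrades named in the quote are modelled, together with the analogous ws: to wss: case.

```javascript
// Insecure schemes and the secure variants they are treated as matching.
const SECURE_VARIANT = { 'http:': 'https:', 'ws:': 'wss:' };
const DEFAULT_PORT = { 'http:': '80', 'https:': '443', 'ws:': '80', 'wss:': '443' };

function schemeMatches(sourceScheme, urlScheme) {
  return urlScheme === sourceScheme || urlScheme === SECURE_VARIANT[sourceScheme];
}

// Simplified check of a source expression against a URL.
function sourceMatches(sourceExpression, url) {
  const target = new URL(url);
  // Scheme-only source expression such as "http:"
  if (/^[a-z][a-z0-9+.-]*:$/i.test(sourceExpression)) {
    return schemeMatches(sourceExpression.toLowerCase(), target.protocol);
  }
  const source = new URL(sourceExpression);
  if (!schemeMatches(source.protocol, target.protocol)) return false;
  if (source.hostname !== target.hostname) return false;
  // URL drops default ports, so restore them before comparing
  const sourcePort = source.port || DEFAULT_PORT[source.protocol];
  const targetPort = target.port || DEFAULT_PORT[target.protocol];
  if (sourcePort === targetPort) return true;
  // An insecure default port also matches its secure counterpart
  const upgraded = source.protocol !== target.protocol;
  return upgraded && sourcePort === '80' && targetPort === '443';
}
```

Under this reading, a policy of connect-src http: still lets https: connections through, while the reverse (a source of https: against an http: URL) does not match.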
Guess why...
They will see it anyway:)
Are you guys signed up on our issue tracker? If so what are your usernames?
Just signed up, username is ameshkov
@gorhill, @ameshkov Heads up, we're going to consider WebSocket requests as the type "websocket" instead of "other" in the future. More details in this blog post: https://adblockplus.org/development-builds/new-filter-type-option-for-websockets
@kzar hey Dave, thanks for the heads up.
@gorhill @kzar Btw, have you already seen the bleeding edge technology: loading ads code through RTCPeerConnection?
have you already seen the bleeding edge technology
Yes, first time I saw it on Merriam-Webster's site.
Any idea besides wrapping RTCPeerConnection?
So far, no -- aside giving users the option of disabling entirely WebRTC.
@ameshkov No, I did not realise people already started abusing WebRTC. Man. :-1:
Do you guys have an URL for an example of a website using WebRTC for circumvention that I can take a look at?
Actually would you mind removing that comment here?
Done;)
So far, no -- aside giving users the option of disabling entirely WebRTC.
Does it really work in Chrome? I thought it is a bit limited.
Do you guys have an URL for an example of a website using WebRTC for circumvention that I can take a look at?
Code example: https://forum.adguard.com/index.php?threads/block-rtcpeerconnection.13808/#post-102128
I'd rather discuss our WebSocket plans in the issue on our tracker, since it's marked confidential
I understand not discussing ideas of workarounds for our own blocking solutions, but here I don't see the point, the websocket issue came about because it's already used out there.
@ameshkov Thanks!
@gorhill There's a new issue I'd like to involve you with but can't unless you have a user on our issue tracker. Mind creating one?
[Intentionally empty]