uBlockOrigin / uBlock-issues

This is the community-maintained issue tracker for uBlock Origin
https://github.com/gorhill/uBlock

Add wildcard support for TLD in $domain filters #1008

Closed ediowar closed 3 years ago

ediowar commented 4 years ago

Description

AdGuard rule example: ||block.domain^$all,domain=google.*

gorhill commented 4 years ago

AdGuard rule example

Which AdGuard list contains this filter?

Alex-302 commented 4 years ago

That is just an example, we do not have this rule :)

gorhill commented 4 years ago

So I am being asked to support something for which there is no current use case?

I always need a use case, actually many use cases, when it involves adding complexity to the filtering engine.

gwarser commented 4 years ago

AdGuard has /high-speed-download.png$domain=extramovies.*.

AdGuard doc: https://kb.adguard.com/en/general/how-to-create-your-own-ad-filters#wildcard-for-tld

Also tons of examples in this guy's filters https://github.com/kano1/I but I'm not sure he knows what he is doing.

krystian3w commented 4 years ago

https://github.com/DandelionSprout/adfilt/issues/63#issuecomment-623098824

extramovies.*

[rdk@on filterlists.com_resources]$ grep -r '$.*domain=.*\*'
166_AdGuard Base Filter (AdGuard for Chromium).txt:/high-speed-download.png$domain=extramovies.*
166_AdGuard Base Filter.txt:/high-speed-download.png$domain=extramovies.*
1568_AdGuard Base Filter Optimized.txt:/high-speed-download.png$domain=extramovies.*
1528_AdGuard Base Filter without EasyList.txt:/high-speed-download.png$domain=extramovies.*
2210_AdGuard Base Filter (uBlock Origin).txt:/high-speed-download.png$domain=extramovies.*
1568_AdGuard Base Filter (Optimized).txt:/high-speed-download.png$domain=extramovies.*
2214_AdGuard Base Filter without EasyList (uBlock Origin).txt:/high-speed-download.png$domain=extramovies.*

mail.google.*,gmail.*

2061_Cybo's Simplified Domains.txt:@@||ssl.gstatic.com/ui/v1/icons/mail/images/cleardot.gif$domain=mail.google.*,gmail.*
1836_Cybo's Hosts.txt:@@||ssl.gstatic.com/ui/v1/icons/mail/images/cleardot.gif$domain=mail.google.*,gmail.*
2060_Cybo's Hosts - Extra Format.txt:@@||ssl.gstatic.com/ui/v1/icons/mail/images/cleardot.gif$domain=mail.google.*,gmail.*

kdw*.com (invalid in uBO / AG?)

2104_ADgk Mobile Advertising Rules - adgk.txt:||cdn-img.tadpoles.xyz/vipgg/pc^$domain=kdw*.com
2104_ADgk Mobile Advertising Rules - adgk.txt:||wx2.sinaimg.cn/mw1024^$domain=kdw*.com
2104_ADgk Mobile Advertising Rules - adgk.txt:||wx*.sinaimg.cn/large^$domain=kdw*.com
2104_ADgk Mobile Advertising Rules - adgk.txt:||wx*.sinaimg.cn/large^$domain=kdw*.com,important
[rdk@on filterlists.com_resources]$

DandelionSprout commented 4 years ago

https://gitlab.com/eyeo/adblockplus/adblockpluscore/-/issues/123 claims that ABP also supports wildcards in $domain as of 3 months ago, which took me so much by surprise that I didn't even think it'd be a possibility until 2 hours ago.

That being said, it doesn't seem to work in ABP 3.8.4 to the degree I've been able to test it in the span of 5 minutes.

DandelionSprout commented 4 years ago

As for an actual current use example, I'd have loved to be able to distill e.g. ||ssl.p.jwpcdn.com^*/sharing.js$important,script,domain=eurosport.no|eurosport.dk|gamereactor.no|gamereactor.dk into ||ssl.p.jwpcdn.com^*/sharing.js$important,script,domain=eurosport.*|gamereactor.* in the regular version of my Nordic list as well (and not just in the AdGuard version), alongside ~15 other similar entries.
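
To illustrate what the distilled filter above would be expected to match, here is a minimal Python sketch of entity-style matching for a domain= option. It is not uBO's or AdGuard's actual code: `PUBLIC_SUFFIXES` is a tiny hypothetical stand-in for the Public Suffix List, all function names are invented for illustration, and negated entries (`~`) are ignored.

```python
# Hypothetical stand-in for the Public Suffix List; a real matcher
# would need the full list to know where the "TLD part" begins.
PUBLIC_SUFFIXES = {"com", "no", "dk", "co.uk"}


def public_suffix(hostname):
    """Return the longest known public suffix of hostname, or '' if none."""
    labels = hostname.split(".")
    # Start at 1 so the whole hostname is never treated as a suffix.
    for i in range(1, len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in PUBLIC_SUFFIXES:
            return candidate
    return ""


def entity_of(hostname):
    """Replace the public suffix with '*', e.g. 'www.eurosport.no' -> 'www.eurosport.*'."""
    suffix = public_suffix(hostname)
    if not suffix:
        return None
    return hostname[: -len(suffix)] + "*"


def matches_entry(hostname, entry):
    """True if hostname, or one of its parent domains, matches one domain= entry."""
    target = entity_of(hostname) if entry.endswith(".*") else hostname
    if target is None:
        return False
    return target == entry or target.endswith("." + entry)


def matches_domain_option(hostname, option):
    """Check a hostname against a '|'-separated domain= value (no '~' support)."""
    return any(matches_entry(hostname, entry) for entry in option.split("|"))


option = "eurosport.*|gamereactor.*"
for host in ("www.eurosport.no", "gamereactor.dk", "eurosport.co.uk", "example.com"):
    print(host, matches_domain_option(host, option))
```

Under these assumptions the wildcard form covers the four domains of the original filter, but also any other TLD under which the same name is registered, which is the false-positive trade-off a list maintainer would have to weigh.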

uBlock-user commented 4 years ago

ABP also supports wildcards in $domain as of 3 months ago, which took me so much by surprise that I didn't even think it'd be a possibility until 2 hours ago.

Neither did I. I asked about this years ago, but gorhill said that ABP syntax doesn't support this, so it never went ahead after that.

gorhill commented 4 years ago

It's not supported by ABP; what they fixed is to reject those filters when encountered.

peace2000 commented 4 years ago

Many sites have subdomains. That's why a wildcard would help when making filters.

krystian3w commented 4 years ago

Many sites have subdomains.

Subdomains have their own issue: https://github.com/uBlockOrigin/uBlock-issues/issues/957 (*.abc.com when the homepage doesn't use www.)

peace2000 commented 4 years ago

Sorry, wrong term. :) Meant TLDs.

gorhill commented 4 years ago

Related: https://github.com/gorhill/uBlock/issues/2133.

gorhill commented 4 years ago

The only reason that I didn't decline this request outright is because of that one small part:

AdGuard rule example ||block.domain^$all,domain=google.*

Now it turns out the provided filter did not really exist; I couldn't find it. I invested my own time trying to figure out which AdGuard filter list contains the mentioned filter. Other people invested their own time to finally find what @ediowar, as the requester, should have taken the time to minimally detail: that there is a single instance of such a filter in the AdGuard lists, specifically /high-speed-download.png$domain=extramovies.*.

And furthermore, there is a need to detail what this filter solves. Is what it solves already solved in another way in uBO? If so, what is the real benefit of asking someone else to spend time and effort implementing this in uBO if there is no current benefit to a majority of end users?

So from now on, here is how this will work for requests to add a feature to static filtering:

If you are not a filter list maintainer with a reasonable enough track record of maintaining good quality filter list(s) in broad use, or if you are not making a convincing case that a specific static filtering feature is of benefit to a majority of end users, the issue will be declined without further comment -- i.e. issues equivalent to "do this kthxbye" are not accepted.

I call these issues drive-by feature requests, i.e. requests from people who are not long-time contributors and who invested little to no time making the case for why someone other than them should invest more time and effort in adding code and complexity, also taking into account the future maintenance work of that code and complexity.

The people who have my utmost attention when it comes to adding static filtering features are those who actually have a track record of maintaining filter lists used by uBO. They are the ones whose lives I want to make easier whenever I can do so and it is technically feasible. As an example, the sole reason I agreed to add the cname option recently is to make the work of filter list maintainers easier (denyallow= was not requested but I added it for the same exact reason, to avoid the tediousness of having to craft that sort of filter).

The other people who have my attention are those who care enough to spend their own time making a convincing case for why a specific static filtering feature request is of benefit to a majority of users. Here is an example of such an issue.

Now regarding the specific issue of supporting entity syntax in the domain= option: @mapx-, @okiehsch, @ryanbr, and all other usual contributors to filter lists, how useful would such a feature be?

Is this a must-have feature that you would start using regularly immediately or is it something that won't make a big difference to the workload in the big picture whether it's supported or not? (or any stance in between.)

mapx- commented 4 years ago

It would be useful in cases like this:

://192.168.*/images/$important,domain=pornhub.com|pornhub.net|pornhub.org|pornhubthbh7ap3u.onion|xtube.com
!pornhub.*,pornhubthbh7ap3u.onion,xtube.com##+js(aopw, AdDelivery)

so that the same syntax can be used in both places.

Moreover, there are a lot of sites that often change their TLD (while keeping the same JS code / tricks), so domain=example.com filters become obsolete and require our intervention to adjust the filter (adding / removing TLDs), e.g. ||googlesyndication.com/pagead/js/adsbygoogle.js$script,redirect=noopjs,domain=vev.io|vev.red

DandelionSprout commented 4 years ago

If it helps on the matter, I've now asked Andrey Meshkov and his pals on Slack about whether they plan to use $domain wildcards in e.g. AdGuard Base on a much larger scale anytime soon.

If they were to say yes to my inquiry, it could lead to e.g. @@http://adsense.google.$document,domain=google.ad|google.ae|google.al|google.am|google.as|google.at|google.az|google.ba|google.be|google.bf|google.bg|google.bi|google.bj|google.bs|google.bt|google.by|google.ca|google.cat|google.cd|google.cf|google.cg|google.ch|google.ci|google.cl|google.cm|google.cn|google.co.ao|google.co.bw|google.co.ck|google.co.cr|google.co.id|google.co.il|google.co.in|google.co.jp|google.co.ke|google.co.kr|google.co.ls|google.co.ma|google.co.mz|google.co.nz|google.co.th|google.co.tz|google.co.ug|google.co.uk|google.co.uz|google.co.ve|google.co.vi|google.co.za|google.co.zm|google.co.zw|google.com|google.com.af|google.com.ag|google.com.ai|google.com.ar|google.com.au|google.com.bd|google.com.bh|google.com.bn|google.com.bo|google.com.br|google.com.bz|google.com.co|google.com.cu|google.com.cy|google.com.do|google.com.ec|google.com.eg|google.com.et|google.com.fj|google.com.gh|google.com.gi|google.com.gt|google.com.hk|google.com.jm|google.com.kh|google.com.kw|google.com.lb|google.com.ly|google.com.mm|google.com.mt|google.com.mx|google.com.my|google.com.na|google.com.nf|google.com.ng|google.com.ni|google.com.np|google.com.om|google.com.pa|google.com.pe|google.com.pg|google.com.ph|google.com.pk|google.com.pr|google.com.py|google.com.qa|google.com.sa|google.com.sb|google.com.sg|google.com.sl|google.com.sv|google.com.tj|google.com.tr|google.com.tw|google.com.ua|google.com.uy|google.com.vc|google.com.vn|google.cv|google.cz|google.de|google.dj|google.dk|google.dm|google.dz|google.ee|google.es|google.fi|google.fm|google.fr|google.ga|google.ge|google.gg|google.gl|google.gm|google.gp|google.gr|google.gy|google.hn|google.hr|google.ht|google.hu|google.ie|google.im|google.iq|google.is|google.it|google.je|google.jo|google.kg|google.ki|google.kz|google.la|google.li|google.lk|google.lt|google.lu|google.lv|google.md|google.me|google.mg|google.mk|google.ml|google.mn|google.ms|google.mu|google.mv|google.mw|google.ne|google.nl|google.no|google.nr|google.nu|google.pl|google.pn|google.ps|google.pt|google.ro|google.rs|google.ru|google.rw|google.sc|google.se|google.sh|google.si|google.sk|google.sm|google.sn|google.so|google.sr|google.st|google.td|google.tg|google.tk|google.tl|google.tm|google.tn|google.to|google.tt|google.vg|google.vu|google.ws being changed into @@http://adsense.google.$document,domain=google.* on short notice, alongside ~40 similar entries for Google, Amazon, Eurogamer, and other sites, and maybe most of all for Yandex in AdGuard Russian.

gorhill commented 4 years ago

I've now asked Andrey Meshkov and his pals on Slack

We can cc him in case he wants to answer here directly: cc @ameshkov

ameshkov commented 4 years ago

Hey everyone, yeah, we're going to, but later, when it's properly supported by all AG versions.

edit: which will happen in a couple of months from now. I wish I could be more precise:(

okiehsch commented 4 years ago

Is this a must-have feature that you would start using regularly immediately or is it something that won't make a big difference to the workload in the big picture whether it's supported or not?

It is a feature that would be useful in the cases that mapx- described and I would use it if available, but I don't think it would make a big difference to the workload for uAssets. For EasyList the difference would be bigger if they start using it, which I doubt as long as AdblockPlus does not support it.

gorhill commented 4 years ago

as long as AdblockPlus does not support it.

From their discussion thread, it does not look like they want to support this. It seems their key argument is concern about false positives, but I consider this a secondary argument regarding whether to support the option or not -- the same could be said of many other currently existing filtering options. What matters in the end is that filter list maintainers should be trusted to make the right calls when it comes to using whatever filtering options are at their disposal.

For me the primary arguments are whether this will be used often enough and whether it makes the task of maintaining filter lists easier. So given the comments above, I decided I will support the syntax -- I don't see any issue implementing this code-wise.

kulfoon commented 4 years ago

DandelionSprout : https://github.com/uBlockOrigin/uBlock-issues/issues/1008#issuecomment-623185325: As for an actual current use example, I'd have loved to be able to distill e.g. (...) in the regular version of my Nordic list as well (and not just in the AdGuard version), alongside ~15 other similar entries.

If already talking about distilling:

The longest:

From uBlock Unbreak L3441-L3442 (44) x 2 = 88:

@@||static.ziffdavis.com/sitenotice/evidon-barrier.js$script,domain=allestoringen.be|allestoringen.nl|xn--allestrungen-9ib.at|xn--allestrungen-9ib.ch|xn--allestrungen-9ib.de|downdetector.ae|downdetector.ca|downdetector.c|downdetector.co.nz|downdetector.co.uk|downdetector.co.za|downdetector.com.ar|downdetector.com.au|downdetector.com.br|downdetector.com.co|downdetector.com|downdetector.cz|downdetector.dk|downdetector.ec|downdetector.es|downdetector.fi|downdetector.fr|downdetector.gr|downdetector.hk|downdetector.hr|downdetector.hu|downdetector.id|downdetector.ie|downdetector.in|downdetector.it|downdetector.jp|downdetector.mx|downdetector.my|downdetector.no|downdetector.pe|downdetector.pk|downdetector.pl|downdetector.pt|downdetector.ro|downdetector.ru|downdetector.se|downdetector.sg|downdetector.sk|downdetector.web.tr
@@||static.ziffdavis.com/sitenotice/*/translations/$script,domain=allestoringen.be|allestoringen.nl|xn--allestrungen-9ib.at|xn--allestrungen-9ib.ch|xn--allestrungen-9ib.de|downdetector.ae|downdetector.ca|downdetector.c|downdetector.co.nz|downdetector.co.uk|downdetector.co.za|downdetector.com.ar|downdetector.com.au|downdetector.com.br|downdetector.com.co|downdetector.com|downdetector.cz|downdetector.dk|downdetector.ec|downdetector.es|downdetector.fi|downdetector.fr|downdetector.gr|downdetector.hk|downdetector.hr|downdetector.hu|downdetector.id|downdetector.ie|downdetector.in|downdetector.it|downdetector.jp|downdetector.mx|downdetector.my|downdetector.no|downdetector.pe|downdetector.pk|downdetector.pl|downdetector.pt|downdetector.ro|downdetector.ru|downdetector.se|downdetector.sg|downdetector.sk|downdetector.web.tr

From uBlock filters L8679-L8680 (9) x 2 = 18:

@@||imasdk.googleapis.com/js/sdkloader/ima3.js$script,domain=esgentside.com|exclusivomen.com|gentside.com|gentside.co.uk|gentside.de|gentside.it|maxisciences.com|ohmirevista.com|ohmymag.co.uk|ohmymag.com|ohmymag.de|ohmymag.it
*$script,redirect-rule=noopjs,domain=esgentside.com|gentside.com|gentside.it|gentside.com|gentside.de|gentside.co.uk|gentside.com.br|maxisciences.com|ohmirevista.com|ohmymag.com|ohmymag.com.br|ohmymag.de|ohmymag.co.uk,3p

From uBlock filters L8684-L8685 (8) x 2 = 16:

@@||googletagservices.com/tag/js/gpt.js$script,domain=esgentside.com|gentside.com|gentside.it|gentside.com|gentside.de|gentside.co.uk|gentside.com.br|maxisciences.com|ohmirevista.com|ohmymag.com|ohmymag.de|ohmymag.co.uk
@@*/assets/prebid/$script,xhr,1p,domain=esgentside.com|gentside.com|gentside.it|gentside.com|gentside.de|gentside.co.uk|gentside.com.br|maxisciences.com|ohmirevista.com|ohmymag.com|ohmymag.com.br|ohmymag.de|ohmymag.co.uk

From uBlock Unbreak L2148-L2149 (6) x 2 = 12:

@@||adobedtm.com/*/satelliteLib$script,domain=fcbarcelona.cat|fcbarcelona.cn|fcbarcelona.com|fcbarcelona.es|fcbarcelona.fr|fcbarcelona.jp
@@||adobedtm.com/*/mbox-contents-$script,domain=fcbarcelona.cat|fcbarcelona.cn|fcbarcelona.com|fcbarcelona.es|fcbarcelona.fr|fcbarcelona.jp

From uBlock filters L8447 (12):

||booking.com^$popunder,domain=viamichelin.at|viamichelin.be|viamichelin.ch|viamichelin.co.uk|viamichelin.com|viamichelin.de|viamichelin.es|viamichelin.fr|viamichelin.it|viamichelin.nl|viamichelin.pl|viamichelin.pt

From uBlock Annoyance L788 (9):

@@||imasdk.googleapis.com/js/sdkloader/ima3.js$script,domain=gamereactor.asia|gamereactor.de|gamereactor.es|gamereactor.eu|gamereactor.fi|gamereactor.it|gamereactor.nl|gamereactor.no|gamereactor.pt

Apart from distilling, I also provide a full list of all uBlock filters containing at least 2 domains which differ only by TLD, so you can check whether or not they would benefit from the wildcard feature:

The Full List:

uBlock filters: L193-L196 (2) x 4 = 8; L355 (3) (mapx-'s); L1075 (2); L1080 (2); L1190 (2) (mapx-'s); L1911-L1912 (2) x 2 = 4; L1937 (2); L3140 (2); L3228 (3); L5906 (2); L6358 (3); L8447 (12); L8679-L8680 (8) x 2 = 16; L8684-L8685 (9) x 2 = 18; L11339 (2); L15780-L15781 (3) x 2 = 6; L17041 (3) (similar to mapx-'s L355); L19586 (2); L20458 (2)

uBlock Unbreak: L352 (2); L448 (3); L2111-L2112 (2) x 2 = 4; L2148-L2149 (6) x 2 = 12; L3176 (3); L3441-L3442 (44) x 2 = 88

uBlock Resource Abuse: L124 (4); L165 (4); L234 (4)

uBlock Annoyance: L788 (9); L2873 (2)

uBlock Privacy: L83 (2)

OK, I have already spent 5 hours collecting and formatting the data; enough for now.
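
A survey like the one above could probably be partly automated. The following Python sketch is only an illustration, not the method used to produce the numbers above: it groups the entries of a filter's domain= option by name and reports the names that appear under two or more TLDs. `KNOWN_SUFFIXES` is a small hypothetical subset of the Public Suffix List, and the helper names are invented.

```python
import re
from collections import defaultdict

# Hypothetical subset of the Public Suffix List, just enough for the example.
KNOWN_SUFFIXES = ("co.uk", "com", "cat", "cn", "de", "es", "fr", "it", "jp")


def base_name(domain):
    """Strip a known public suffix: 'gentside.co.uk' -> 'gentside'."""
    # Try longer suffixes first so the longest matching suffix is stripped.
    for suffix in sorted(KNOWN_SUFFIXES, key=len, reverse=True):
        if domain.endswith("." + suffix):
            return domain[: -len(suffix) - 1]
    return None  # unknown suffix: leave this entry out of the grouping


def tld_variants(filter_line):
    """Map each name to the sorted domain= entries sharing it (2 or more only)."""
    match = re.search(r"\$(?:.*,)?domain=([^,]+)", filter_line)
    if match is None:
        return {}
    groups = defaultdict(set)
    for domain in match.group(1).split("|"):
        base = base_name(domain.lstrip("~"))
        if base:
            groups[base].add(domain)
    return {base: sorted(found) for base, found in groups.items() if len(found) >= 2}


example = (
    "@@||adobedtm.com/*/satelliteLib$script,domain=fcbarcelona.cat|fcbarcelona.cn"
    "|fcbarcelona.com|fcbarcelona.es|fcbarcelona.fr|fcbarcelona.jp"
)
for name, domains in tld_variants(example).items():
    print(f"{name}.* would replace {len(domains)} entries: {domains}")
```

Running such a helper over every line of a list would give candidate filters only; whether a given entry should actually be collapsed into a wildcard remains a maintainer's judgment call.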

LennyFox commented 4 years ago

I would like to make a plea for entity-like wildcard support for TLDs in domain.

I use it in AdGuard user filters and it is really handy for covering all variants of country-specific websites.

peace2000 commented 4 years ago

Seems that ABP is going to add wildcards as well now: https://gitlab.com/eyeo/adblockplus/adblockpluscore/-/issues/123#note_339550064

DandelionSprout commented 4 years ago

https://gitlab.com/eyeo/adblockplus/adblockpluscore/-/merge_requests/334 seems to imply it is indeed underway, even if the technical details behind it elude me.

uBlock-user commented 4 years ago

Entity support wasn't added for redirect/redirect-rule directives, so re-opening.

asheroto commented 2 years ago

+1 vote