Closed aeris closed 4 years ago
Do not post any filter list issues or issues where website's functionality is broken. We have uAssets issue tracker for that, post there instead.
It's a technique used to bypass filters/rules, it's something which needs to be investigated.
Dupe/related discussion: https://github.com/uBlockOrigin/uAssets/issues/6538
Aren't they lying to PSL with these first-party domain entries ?
Edit: It's an inline-script, should be able to defuse via a scriptlet.
liberation.fr##+js(aopw, EA_data)
works.
Here's a crude dump of sites using Eulerian Analytics inline-script -- https://publicwww.com/websites/EA_data/
@uBlock-user that scriptlet will only work for sites inserting the script using that variable. For other sites like oui.sncf, use this: https://github.com/uBlockOrigin/uAssets/issues/6538#issuecomment-552202850
Websites I tested so far are using that variable, except for the one you mentioned. oui.sncf redirects me to https://en.oui.sncf/fr/?redirect=yes where parseInt.+?3600000 is not found in the inline-script.
As per view-source:https://en.oui.sncf/fr/?redirect=yes, this is the js --
<script>
(function(d, s, id) {
if (d.getElementById(id)) return;
var js = d.createElement(s),
fjs = d.getElementsByTagName(s)[0],
vscaUrl = "//wblt.oui.sncf";
js.id = id;
js.async = true;
js.src = vscaUrl + "/prod/" +
(vsca_pageTag.config.vsca_version ? vsca_pageTag.config.vsca_version + "/" : "") +
vsca_pageTag.config.siteId +
"/vsca.js?M2lU3mD1O47ZAzgnp0wX";
fjs.parentNode.insertBefore(js, fjs);
}(document, 'script', 'vscascript'));
</script>
Filter -- oui.sncf##+js(acis, document.getElementById, vscaUrl)
I'm in the US, and it did not redirect me. The inline script on oui.sncf is:
<!--begineulerian-->
<script type="text/javascript">
(function(){var d=document,l=d.location;if(!l.protocol.indexOf('http')){var o=d.createElement('script'),a=d.getElementsByTagName('script')[0],cn=parseInt((new Date()).getTime()/3600000);o.type='text/javascript';o.async='async';o.defer='defer';
o.src='//v.oui.sncf/content/vsc-fr/8lL.QlYVeQ7BL6AqQORYg_FeHeIQMaObMRxsXxGG0g--/'+cn+'.js';
a.parentNode.insertBefore(o,a);}})();
</script>
<!--endeulerian-->
And the inline script in https://github.com/uBlockOrigin/uBlock-issues/issues/780#issuecomment-552206887 is not Eulerian, it is another tracker, not the one @aeris is talking about. Another site: officedepot.fr. Add officedepot.fr##+js(acis, document.createElement, parseInt)
We're probably not being served the same script because we are in different geolocations. It may not be Eulerian, but it's in the same vein.
Another site: officedepot.fr
That one is definitely EA -- https://myip.ms/info/whois/109.232.195.156/k/3227454398/website/ea.officedepot.fr
New detections: keyade.com, on rueducommerce.fr; omtrdc.net, on sfr.fr
Offtopic:
Weird thing: there seems to be a pattern of script names ending with 7825. So here's a regex you can add to your filters ... (note: I'm obviously not a regex expert)
/(\.\w+)[.]?\/[A-Za-z]{7}(7825)\.js$/
Example scripts:
https://f7ds.liberation.fr/aaAAaaA7825.js
https://v.oui.sncf/SNCFVOU7825.js
https://ea.officedepot.fr/potfrWW7825.js
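To check the pattern against the example scripts, here is a minimal sketch in plain JavaScript. It uses [A-Za-z] rather than [A-z] for the character class, since the A-z range also covers a few punctuation characters. The function and variable names are made up for illustration, and this only exercises the regex, not how uBO would apply it as a filter.

```javascript
// Variant of the regex from this comment, with the character class
// written as [A-Za-z] (an assumption about the intent: letters only).
const eulerianScriptPattern = /(\.\w+)[.]?\/[A-Za-z]{7}(7825)\.js$/;

// Example script URLs quoted in this thread.
const sampleScriptUrls = [
  'https://f7ds.liberation.fr/aaAAaaA7825.js',
  'https://v.oui.sncf/SNCFVOU7825.js',
  'https://ea.officedepot.fr/potfrWW7825.js',
];

// Hypothetical helper: true when the URL ends with 7 letters + "7825.js".
function looksLikeEulerianScript(url) {
  return eulerianScriptPattern.test(url);
}

for (const url of sampleScriptUrls) {
  console.log(url, looksLikeEulerianScript(url));
}
```

Note that any unrelated script whose name happens to be seven letters followed by 7825.js would match too, so this is a heuristic at best.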
Test sites: https://www.maeva.com and https://www.brandalley.fr/
Also another PublicWWW search: https://publicwww.com/websites/%22parseInt%28%28new+Date%28%29%29.getTime%28%29%2F3600000%29%22/
Wondering if https://github.com/uBlockOrigin/uBlock-issues/issues/44 would apply here if implemented.
Can't apply, the case given as an example makes use of legitimate subdomains: statics.liberation.fr, medias.liberation.fr.
I am looking at https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/API/dns/resolve, it can be used to expose the CNAME:
browser.dns.resolve('f7ds.liberation.fr', [ "canonical_name" ]).then(r => { console.log(r); });
Promise { <state>: "pending" }
Object { addresses: (1) […], canonicalName: "atc.eulerian.net", isTRR: false }
I will prototype and evaluate how to optimally use this in uBO with the utmost care.
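As a rough illustration of what could be done with that result object, here is a sketch that flags a hostname whose canonical name falls under a known tracker domain. This is not uBO's implementation. The tracker list and the function names are hypothetical; only eulerian.net and the record shape come from the output above.

```javascript
// Hypothetical list of tracker domains; only eulerian.net comes from this thread.
const knownTrackerDomains = ['eulerian.net'];

// True when `hostname` equals `domain` or is a subdomain of it.
function isSameOrSubdomain(hostname, domain) {
  return hostname === domain || hostname.endsWith('.' + domain);
}

// `record` is assumed to have the shape returned by
// browser.dns.resolve(hostname, ['canonical_name']).
// Returns the canonical name if it belongs to a known tracker, else null.
function findCloakedTracker(hostname, record, trackerDomains = knownTrackerDomains) {
  const canonical = (record.canonicalName || '').replace(/\.$/, '').toLowerCase();
  if (canonical === '' || canonical === hostname.toLowerCase()) {
    return null; // no alias, nothing to uncloak
  }
  const match = trackerDomains.find(d => isSameOrSubdomain(canonical, d));
  return match === undefined ? null : canonical;
}

// Record modelled on the console output above (addresses left empty here).
const sampleRecord = { addresses: [], canonicalName: 'atc.eulerian.net', isTRR: false };
console.log(findCloakedTracker('f7ds.liberation.fr', sampleRecord)); // atc.eulerian.net
```

In an extension, the record would come from browser.dns.resolve(hostname, ['canonical_name']); here it is passed in directly so the sketch runs outside Firefox.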
Will this be applied in uMatrix too ?
Yes.
You will need to add a new permission named 'dns' to the manifest to use this API - https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/API/dns - and since this is a Firefox-only API, how will you address this in Chromium?
I am looking at https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/API/dns/resolve, it can be used to expose the CNAME:
Time to think about the future too. This detection can easily be bypassed with CNAME removal and a direct A/AAAA. Perhaps time to include IP range blacklist or AS number detection ? :thinking: For Eulerian, IP (109.232.197.0/24) and ASN (AS50234) are dedicated, so no false positive or negative, but may be more complicated in case of mutualised ones…
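For what the IP range check itself could look like, here is a small sketch in plain JavaScript (IPv4 only). The function names are hypothetical; the range is the one quoted above. Obtaining the resolved address and maintaining the list of ranges are the hard parts and are not covered here.

```javascript
// Convert dotted-quad IPv4 text to an unsigned 32-bit integer.
function ipv4ToInt(ip) {
  const parts = ip.split('.').map(Number);
  if (parts.length !== 4 || parts.some(p => !Number.isInteger(p) || p < 0 || p > 255)) {
    throw new Error('Invalid IPv4 address: ' + ip);
  }
  return ((parts[0] << 24) | (parts[1] << 16) | (parts[2] << 8) | parts[3]) >>> 0;
}

// True when `ip` falls inside the `cidr` range (for example "109.232.197.0/24").
function ipv4InCidr(ip, cidr) {
  const [base, bitsText] = cidr.split('/');
  const bits = Number(bitsText);
  // /0 is handled separately because a 32-bit shift by 32 is a no-op in JS.
  const mask = bits === 0 ? 0 : (0xffffffff << (32 - bits)) >>> 0;
  return ((ipv4ToInt(ip) & mask) >>> 0) === ((ipv4ToInt(base) & mask) >>> 0);
}

const eulerianRange = '109.232.197.0/24'; // range quoted in this comment
console.log(ipv4InCidr('109.232.197.42', eulerianRange)); // true
console.log(ipv4InCidr('109.232.198.1', eulerianRange));  // false
```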
how will you address this in Chromium ?
uBO already makes use of Firefox-specific APIs, for example filterResponseData().
I meant how will you fix this in Chromium..
Best to assume it can't be fixed on Chromium if it does not support the proper API.
the case given as example make use of legitimate subdomains
On a case-by-case basis, a regex with a whitelist-style assertion can be used:
/^https:\/\/(?!www|images|medias|statics)/$script,1p,domain=liberation.fr
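To see which URLs that negative lookahead lets through, here is a quick sketch using the same pattern as a JavaScript RegExp. It only models the regex part of the filter; the $script,1p,domain= options are not represented, and the helper name is made up.

```javascript
// Same pattern as the filter above, as a JavaScript RegExp.
const allowlistPattern = /^https:\/\/(?!www|images|medias|statics)/;

// Hypothetical helper: true when the regex part of the filter matches the URL.
function wouldBeBlocked(url) {
  return allowlistPattern.test(url);
}

console.log(wouldBeBlocked('https://f7ds.liberation.fr/aaAAaaA7825.js')); // true
console.log(wouldBeBlocked('https://statics.liberation.fr/app.js'));      // false
// The lookahead only checks a prefix, so a hostname merely starting
// with "www" is also let through:
console.log(wouldBeBlocked('https://wwwx.liberation.fr/t.js'));           // false
```

Anchoring each allowed label with a trailing dot, for example (?!www\.|images\.), would close that last gap, at the cost of a longer filter.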
Time to think about the future too. This detection can easily be bypassed with CNAME removal and a direct A/AAAA. Perhaps time to include IP range blacklist or AS number detection ?
@aeris I assume that would mean bundling a list of ranges to block, some of which would be generated from a list of known AS. There is no API in Firefox to resolve the IP ranges of an AS, is there?
I reckon we could generate a list using RIPE's API (based on this data), with for instance https://stat.ripe.net/data/routing-history/data.json?resource=AS50234 or a JS client for it (doc).
On a case-by-case basis, a regex with a whitelist-style assertion can be used:
A csp= directive is preferable to a regex:
||liberation.fr^$csp=script-src www.liberation.fr images.liberation.fr medias.liberation.fr statics.liberation.fr
(plus whatever else is needed of course).
@rigelk It will be difficult, yep :joy: Even obtaining the AS from an IP or domain is tricky, and is a full-time research topic for Tor (see this)
A csp= directive is preferable to a regex:
Yes, I thought about it, but a page may include an unlimited number of external resources, so it will be hard not to block them accidentally.
CSP will be the preferable solution for Chromium users.
New detection : Xiti now does 1st-party tracking lemonde.fr → buf.lemonde.fr → buf-lemonde-fr-cddc.at-o.net
echo | openssl s_client -connect buf-lemonde-fr-cddc.at-o.net:443 |& rg depth=0
depth=0 C = FR, L = MERIGNAC, O = AT Internet, OU = Service Technique, CN = *.ati-host.net
AT Internet = Xiti
Same on client.boursorama.com → c0011.boursorama.com → c0011-boursorama-com-cddc.at-o.net
Image injection this time, no javascript involved…
La FNAC, 3 1st-parties :
New and tricky case, more difficult to detect or block.
20minutes.fr includes content from 20mn.fr, which seems to be their CDN domain. Content (surely JS) from this CDN domain loads content back from the primary 20minutes.fr domain, via a.20minutes.fr, which is a-20minutes-fr-cddc.at-o.net, and so Xiti.
More interesting, they also have a.20min.fr, pointing to ads.20min.maxcdn-edge.com, which is not currently in production. This case is trickier to handle because the final domain is not a dedicated one: we need a regexp ^ads exclusion for this case.
FYI, Eulerian kindly provides your test suite on their Privacy page:
Appendix: list of sites on wich our clients use our software solutions
AT/Xiti also, but a bit of scripting is required.
Maybe these lists can be used to generate a blacklist of subdomains?
I'm trying to develop a tool that checks whether a domain has a Eulerian subdomain. You can't generate a 1st-level domain blacklist from just the top-level domain :sob: You really have to crawl the page, execute the JS, and listen on a dummy DNS resolver to catch the tracker. A POC is on the way.
You can't generate a 1st-level domain blacklist from just the top-level domain :sob: You really have to crawl the page, execute the JS, and listen on a dummy DNS resolver to catch the tracker. A POC is on the way.
With a headless browser like PhantomJS, it's possible to execute a website's JS from a script.
Also, Confess is a PhantomJS script that can be used to headlessly analyze web pages.
I would prefer to keep the issue here as focused as possible: to deal with CNAME-ed hostnames. For investigation work about lists of hostnames being CNAME'd or other "evasion" mechanisms, this is best done elsewhere -- though you can link to that elsewhere here if useful for the current issue. At this point whoever subscribed to this issue is being notified non-stop about every single new comment being made.
If you want to bring forth a new evasion mechanism, please open a new issue about it.
Maybe a reverse lookup could be done: once we have the final IP, check which DNS entry is linked to it. Or maybe add a community-based feature, where people can manually add an entry that is shared with other members. For each entry we could, like on Waze, add an "I validate it" button or something similar, to prevent false URLs or to clean up URLs that no longer exist. But for this last idea you need a server to broadcast all the info...
If using 1.24.1b0 or above, to "uncloak" the actual (canonical, CNAME) hostname, set the advanced setting cnameAliasList to *.
Network requests for which the actual hostname differs from the original hostname will be replayed through uBO's filtering engine using the actual hostname. When I started developing the feature I could spot eulerian.net in the logger when visiting https://www.liberation.fr/, but I can no longer reproduce this. Regardless, uBO is now equipped to deal with 3rd-party disguised as 1st-party as far as Firefox's browser.dns allows it.
The next step is for me to pick a cogent way for filter list maintainers to tell uBO to uncloak specific hostnames. Doing this by default for all hostnames is not a good idea: it could cause a huge number of network requests to be evaluated twice, a pointless overhead with no benefit for basic users (default settings/lists), for example in the case of CDNs, which are often aliased to the site using them.
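To make the "evaluated twice" point concrete, here is a toy model of replaying a request under its canonical name. It is not uBO's code: the blocked-domain list stands in for a real filtering engine, and all names are hypothetical.

```javascript
// Hypothetical stand-in for a filtering engine: blocks hostnames under these domains.
const blockedDomainsSketch = ['eulerian.net'];

function isBlockedHostname(hostname) {
  return blockedDomainsSketch.some(d => hostname === d || hostname.endsWith('.' + d));
}

// Evaluate the request as-is, then replay it under the canonical name
// only when that name differs from the original hostname.
function evaluateWithUncloaking(requestUrl, canonicalName) {
  const url = new URL(requestUrl);
  if (isBlockedHostname(url.hostname)) {
    return { blocked: true, matchedHostname: url.hostname };
  }
  if (canonicalName && canonicalName !== url.hostname) {
    // Second evaluation: this is the extra cost paid for every aliased request.
    if (isBlockedHostname(canonicalName)) {
      return { blocked: true, matchedHostname: canonicalName };
    }
  }
  return { blocked: false, matchedHostname: null };
}

console.log(evaluateWithUncloaking(
  'https://f7ds.liberation.fr/aaAAaaA7825.js',
  'liberation.eulerian.net'
));
```

A CDN-aliased request would take the same second pass and come back unblocked, which is the overhead that restricting uncloaking to specific hostnames would avoid.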
Access IP address and hostname information
For anyone wondering what this is: that's the title of the new permission requested when first updating to this build, or to any future stable build using the DNS WebExtensions API.
but I can no longer reproduce this.
Disabling liberation.fr##+js(acis, document.createElement, '.js') found in uBO-Privacy makes reproduction possible again.
Best to assume it can't be fixed on Chromium if it does not support the proper API.
Can't this be "emulated" in Chromium by resolving the hostnames using DNS over HTTPS in JSON format (https://developers.cloudflare.com/1.1.1.1/dns-over-https/json-format/)?
For example, I can use Cloudflare's DNS with curl -H 'accept: application/dns-json' 'https://cloudflare-dns.com/dns-query?name=f7ds.liberation.fr&type=CNAME'
and get
{"Status": 0,"TC": false,"RD": true, "RA": true, "AD": false,"CD": false,"Question":[{"name": "f7ds.liberation.fr.", "type": 5}],"Answer":[{"name": "f7ds.liberation.fr.", "type": 5, "TTL": 2633, "data": "liberation.eulerian.net."}]}
Which obviously contains the tracking hostname.
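Extracting the CNAME target from such a response is straightforward. A minimal sketch, assuming the JSON shape shown above (the function name is made up, and the sample object is a trimmed copy of the curl output):

```javascript
const DNS_TYPE_CNAME = 5; // numeric record type for CNAME, as in "type": 5 above

// Return the CNAME target for `hostname` from a dns-json response, or null.
function cnameFromDohResponse(response, hostname) {
  const wanted = hostname.toLowerCase().replace(/\.$/, '') + '.';
  const answers = Array.isArray(response.Answer) ? response.Answer : [];
  const record = answers.find(a =>
    a.type === DNS_TYPE_CNAME && String(a.name).toLowerCase() === wanted
  );
  // Strip the trailing dot of the fully qualified name.
  return record ? String(record.data).replace(/\.$/, '') : null;
}

// Trimmed copy of the response body shown above.
const sampleDohResponse = {
  Status: 0,
  Question: [{ name: 'f7ds.liberation.fr.', type: 5 }],
  Answer: [
    { name: 'f7ds.liberation.fr.', type: 5, TTL: 2633, data: 'liberation.eulerian.net.' },
  ],
};

console.log(cnameFromDohResponse(sampleDohResponse, 'f7ds.liberation.fr'));
// liberation.eulerian.net
```

In an extension, the response would presumably come from a fetch() to the same URL with the accept: application/dns-json header, mirroring the curl call.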
There's an obvious issue with using Cloudflare for this (although Firefox does by default after you enable DoH, so probably it's not such a privacy disaster). There's at least one DoH resolver that supports the same JSON API and claims to respect user privacy, https://blahdns.com (I am not in any way affiliated with them).
To speed things up, maybe it's possible for uBlock to maintain its own cache of hostnames and re-resolve only once in a while.
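Such a cache could be as simple as a map with an expiry time per entry. A rough sketch, with a hypothetical class name and an injectable clock so it can be tried without waiting; the TTL value is arbitrary and this says nothing about how uBO would actually do it:

```javascript
// Small TTL cache for hostname -> canonical name lookups.
// `now` is injectable so expiry can be exercised with a fake clock.
class CnameTtlCache {
  constructor(ttlMs, now = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
    this.entries = new Map();
  }

  // Returns the cached canonical name, or undefined if absent or expired.
  get(hostname) {
    const entry = this.entries.get(hostname);
    if (entry === undefined) { return undefined; }
    if (entry.expires <= this.now()) {
      this.entries.delete(hostname); // stale: caller should re-resolve
      return undefined;
    }
    return entry.cname;
  }

  set(hostname, cname) {
    this.entries.set(hostname, { cname, expires: this.now() + this.ttlMs });
  }
}

// Demo with a fake clock: the entry is served, then expires.
let demoClockMs = 0;
const demoCache = new CnameTtlCache(60000, () => demoClockMs);
demoCache.set('f7ds.liberation.fr', 'liberation.eulerian.net');
console.log(demoCache.get('f7ds.liberation.fr')); // liberation.eulerian.net
demoClockMs = 60001;
console.log(demoCache.get('f7ds.liberation.fr')); // undefined
```

On a miss, the caller would resolve the hostname (via DoH or another resolver) and store the result with set().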
Can't this be "emulated" in Chromium by resolving the hostnames using DNS over HTTPS in JSON format?
Interesting case of first-party NS alias scheme. I discovered and studied a similar approach by OpenX. Perhaps you'll find it of use?
Back then I suggested this rule:
The default filter list provide rules enabling the blocking of those requests. For example, the rule ox-d.*^auid= matches against requests to http://ox-d.example.com/auid=.... This would effectively block all requests to these domains.
But indeed if the domain name part is random this gets complicated. Good luck on solving it!
Isn't that the technique? https://lucb1e.com/rp/cookielesscookies/
Also, another avenue to check is not just a canonical name lookup but also the AS number. It won't catch cloud-hosted solutions, but for service providers that use their own networks to host tracking servers, this might add another data point. Any reasonable whois JSON service will return the owner / AS number.
Jumping in here to say that machine learning might do it. A closer look at in-depth data is required of course. I've personally built a character-based CNN to block URLs in the past; it's not difficult. Avoiding false positives is also possible, at the expense of letting more unwanted traffic through. Collecting feedback on false positives would allow improving the models. Anyone interested in this can ping me; I have GPUs available, and other resources.
The next step is for me to pick a cogent way for filter list maintainers to tell uBO to uncloak specific hostnames. Doing this by default for all hostnames is not a good idea: it could cause a huge number of network requests to be evaluated twice, a pointless overhead with no benefit for basic users (default settings/lists), for example in the case of CDNs, which are often aliased to the site using them.
FF's dns.resolve() at least seems to cache. It remains to be checked whether passing canonical_name will incur a second request, or whether the cached information from the first request is enough. And then there might be a difference between Mozilla's TRR and the system's resolver.
Does anyone know of a service that could be used to look up CNAMEs that point to specified hostnames?
At least for my list, I could create a script that does a reverse-CNAME-lookup for entries. I've applied to use Farsight's DNSDB, we'll see if they let me in.
No service could be 100% accurate, but it might help.
Does anyone know of a service that could be used to look up CNAMEs that point to specified hostnames?
DNS over HTTPS ? For example, cloudflare
Invoke-RestMethod -Headers @{"Accept" = "application/dns-json"} "https://cloudflare-dns.com/dns-query?name=f7ds.liberation.fr&type=CNAME" | ConvertTo-Json
{
"Status": 0,
"TC": false,
"RD": true,
"RA": true,
"AD": false,
"CD": false,
"Question": [
{
"name": "f7ds.liberation.fr.",
"type": 5
}
],
"Answer": [
{
"name": "f7ds.liberation.fr.",
"type": 5,
"TTL": 3538,
"data": "liberation.eulerian.net."
}
]
}
@janis-veinbergs I'm afraid that's looking up which hostname a CNAME record points to. You can do this with standard DNS lookups.
I'm looking for a service that, given a hostname, will show which CNAMEs point to it.
@pgl maybe this service? https://mxtoolbox.com/CNAMELookup.aspx
@cmoro-deusto This doesn't allow me to find which CNAMEs point to a particular hostname.
Hello here!
Since Friday, we have hit a case of 1st-party tracking that seems to be unblockable.
This occurs on https://www.liberation.fr/, which embeds a 1st-party tracker, f7ds.liberation.fr, that points to an ugly tracking provider, Eulerian, via the CNAME liberation.eulerian.net. This provider clearly states that it provides unblockable trackers.
It seems Criteo is starting to ask the same of its customers, with 1st-party tracking pointing to a *.dnsdelegation.io subdomain.
In this case, it seems really difficult to block such trackers with tools like uBlock:
… (f7ds.example.org), even if we found some ea.* pattern
… (*.eulerian.net or *.dnsdelegation.io), but this is difficult to integrate into the browser (those steps are internal to the DNS client resolver)
Do you have any way to detect and then block such content from the browser? The only (not so) efficient way I have at the moment is using DNS tools like PiHole to blacklist IP ranges and CNAME resolution patterns. Even this way, it doesn't cover all possible cases… Even tools like µMatrix seem totally inefficient against such trackers…