uBlockOrigin / uBlock-issues

This is the community-maintained issue tracker for uBlock Origin
https://github.com/gorhill/uBlock
926 stars 77 forks source link

Prevent uBO from hiding html or body when matched by a generic cosmetic filter #1692

Closed peace2000 closed 3 weeks ago

peace2000 commented 3 years ago

Prerequisites

I tried to reproduce the issue when...

Description

There have been numerous occasions where Annoyance lists have blanked out websites because either html or body have contained values that have been matched by a generic cosmetic filter. This is a pretty common issue for Annoyance lists, though there have been a few occasions in normal adblocking as well.

Usually these issues have been fixed by adding :not(html) or :not(body) exception to the problematic generic filter.

I was wondering if it would be reasonable to add a safeguard measure to uBO: to prevent it from applying cosmetic filters, that have a match in html or body in websites where an user is visiting. I think that neither html or body should ever be blocked as that will result in a blank website.

One recent issue: https://github.com/easylist/easylist/pull/8431 - https://webshop.elektroskandia.no/ was blanked out because body in that website had a value of .consent-summary-shown. It was matched by this generic GDPR banner filter: ##.consent-summary-shown. (It was later fixed by adding an exception: ##.consent-summary-shown:not(body)).

But that wasn't the only case. In Fanboy's Annoyance list, there are currently:

Adguard Annoyance:

Easylist:

I know these website blanking issues are mainly related to Annoyance lists that are not turned on by default in uBO, but they are still available and people use them. Not all issues get reported to filter list maintainers and there could be many unreported issues relating to these lists. Each :not(html) or :not(body) exception that currently exists, are related to fixing blank websites.

A specific URL where the issue occurs

https://webshop.elektroskandia.no/ (fixed now but this one is a recent case so I'll use it as a sample)

Steps to Reproduce

  1. Disable any possible Annoyance lists (to get rid of later added whitelistings)
  2. Add filter ##.consent-summary-shown to custom filters
  3. Go to https://webshop.elektroskandia.no/

Expected behavior

uBO would ignore this generic filter for this website, because it matches to the body element.

Actual behavior

Website is blanked, due to a match to the body element.

kuva

uBlock Origin version

1.37.3b13

Browser name and version

Firefox 91.0.1

Operating System and version

Windows 10

peace2000 commented 3 years ago

@felix-22 thanks for links!

Yeah adding :not(body):not(html) after each generic entry systemically, would pre-emptively help and would completely solve this issue, I even suggested that same thing recently: https://github.com/easylist/easylist/issues/8367 though it would bloat filterlist a bit. I don't know about performance impact. But encountering this same issue over and over again (hundreds of :not(html) and :not(body) exceptions) feels unnecessary as it would be possible to do pre-emptive measures.

kiboke commented 3 years ago

The best thing to do is to adjust blockers not to block body and html elements. The question is, could all major blockers work together to achieve that?

peace2000 commented 3 years ago

The best thing to do is to adjust blockers not to block body and html elements. The question is, could all major blockers work together to achieve that?

I hope so:

https://github.com/AdguardTeam/AdguardBrowserExtension/issues/1845 https://gitlab.com/eyeo/adblockplus/abc/adblockpluscore/-/issues/361

gorhill commented 3 years ago

The best thing to do is to adjust blockers not to block body and html elements.

Majority of CSS selector-based cosmetic filters are enforced through user styles, this means internally each of those filters would need to be suffixed with :not(html):not(body), while ensuring they are still properly reported in the logger. It's not trivial.

mjethani commented 3 years ago

What if this is taken care of at the DOM surveyor level?

-            pendingNodes.add(document.querySelectorAll('[id],[class]'));
+            pendingNodes.add(document.querySelectorAll(':not(html):not(body)[id],:not(html):not(body)[class]'));

I'm guessing that this would cause issues for selectors containing descendant and child combinators.

peace2000 commented 3 years ago

If this were to be implemented, it still should be possible to be able e.g. to target html or body via different styling rules or via scriplets, the only thing that should be prevented is using display: none !important to them.

gorhill commented 3 years ago

it still should be possible to be able e.g. to target html or body via different styling rules or via scriplets

The way I see it the automatic exclusion of html/body should apply only to generic cosmetic filters which are made of a single class or id identifier, so this means you would still be able to target html or body by using these explicitly.

gorhill commented 3 years ago

I'm guessing that this would cause issues for selectors containing descendant and child combinators.

This is an interesting idea. The issue of combinators maybe could be solved by having a separate reporting for ids/classes taken directly from html/body elements.

mjethani commented 3 years ago

I'm guessing that this would cause issues for selectors containing descendant and child combinators.

To clarify, let's say there's a document like this:

<body>
  <article></article>
</body>

And a filter like ##.page-body > article.

The document uses JS to add the page-body class to the body element.

If the DOM surveyor ignores the body element, the filter will never work?

gorhill commented 3 years ago

If the DOM surveyor ignores the body element, the filter will never work?

When I say "separate reporting", I mean the ids/classes of html/body elements would be reported in special properties (not through these), and treated differently in the background process -- for these uBO would only look-up the complex set, and thus would be able to find and apply something like .page-body > article.

peace2000 commented 3 years ago

which are made of a single class or id identifier

There are some generics that have multiple values (samples from Fanboy's Annoyances):

##.js-stickyFooter.u-bottom0.u-fixed ##.u-zIndexMetabar.u-fixed ##.article-section > .ui-button-close ###coiOverlay[role="banner"][style*="flex"]

Though I agree that they are not very common.

peace2000 commented 3 years ago

Installed 1.37.3b15 and tested with the site (webshop.elektroskandia.no) and with the sample rule I gave (##.consent-summary-shown), works fine, thanks!

gorhill commented 3 years ago

works fine, thanks

Thanks to @mjethani, this was a good idea to look at the content script angle -- until then I was looking at the background process angle which required a whole lot of changes -- it turns out the changes ended up being relatively trivial.

uBlock-user commented 3 years ago

Fanboy's Annoyances

If I enable that list with the generic filter you posted, the issue can be repro'd again..

peace2000 commented 3 years ago

Yep, strange... I can reproduce too.

peace2000 commented 3 years ago

If both of these:

##.consent-summary-shown ##.consent-summary-shown:not(body)

Are present, the site goes blank.

gorhill commented 3 years ago

Investigating.


Ok, I figured why this happens, investigating best fix.

peace2000 commented 3 years ago

Tested v.16, works without an issue!

peace2000 commented 3 years ago

Wait...

This form still blanks the site:

##[class*="consent-summary-shown"]

Imo it would be safer to ignore also rules that match to the html or body and have either of these selectors: *, ^, $, ~, |.

Sample rules from Fanboy's Annoyances:

##div[class*="NewsletterSubscribe_"] ##div[class*="newsletter-signup_"] ##div[class^="BackToTop_"] ##div[class^="backToTop_"]

gwarser commented 3 years ago

Only simple #id and .class are ignored.

gorhill commented 3 years ago

It would make sense to also prevent highly generic simple cosmetic filters from affecting body/html. These filters are not a result of surveying though, so another approach to fix this is required.

ryanbr commented 3 years ago

Would there be any perf penalty for not applying to body/html?

gorhill commented 3 years ago

Would there be any perf penalty for not applying to body/html?

Only benchmarking/measuring can tell, and various blockers may be affected differently, i.e. uBO unconditionally injects only a small subset of all generic filters (the "highly" generic ones), while ABP unconditionally injects all of them.

mjethani commented 3 years ago

Would there be any perf penalty for not applying to body/html?

Regarding the current solution, there are at least three ways to do it:

  1. Use :not(html):not(body) suffix (current)
  2. Use body prefix
  3. Filter out html and body nodes in JS
kiboke commented 3 years ago

Use body prefix

@mjethani Some websites inject stuff between head and body, so #2 would not be a perfect solution.

mjethani commented 3 years ago

… ABP unconditionally injects all of them.

Aside from higher CPU usage, this also causes a very high amount of memory usage. The last time I checked, using ABP causes your system to use more memory than if you didn't use an ad blocker. :not(html):not(body) might not be practical over there.

gorhill commented 3 years ago

Use body prefix

Filter list author would disagree with this, there are cases of sites injecting elements as child of html.

Filter out html and body nodes in JS

Not sure what you mean, the rules are injected as user styles, JS is of no use in such case. I did consider having user styles to unhide body/html, but this does not really work as the blocker can't know which display style to assign the body element, a site could well use flex as value.

As far as uBO is concerned, :not(html):not(body) is the only valid solution for highly generic cosmetic filters, and would be quite trivial if there was no logger or DOM inspector.

I know roughly the path to implement this, but for the second part of dealing with highly generic cosmetic filters, this will have to wait for the next dev cycle. At least for now the most likely to cause false positives are being mitigated.

gorhill commented 3 years ago

The last time I checked, using ABP causes your system to use more memory than if you didn't use an ad blocker.

Right, I tweeted about this a few months ago: https://twitter.com/gorhill/status/1345042527171325953.

I do not worry using :not(html):not(body) in uBO since it does not unconditionally inject all generic cosmetic filters. In any case, only benchmarking/measuring once the changes are made will tell whether this is an issue.

mjethani commented 3 years ago

Filter out html and body nodes in JS

Not sure what you mean, the rules are injected as user styles, JS is of no use in such case.

I was referring to the current solution with querySelectorAll(). It's possible to get all the nodes and then filter the result in JS. But it's unlikely that this would perform better than :not(html):not(body).

In any case, only benchmarking/measuring once the changes are made will tell whether this is an issue.

+1

gorhill commented 3 years ago

Using querySelectorAll() with highly generic cosmetic filters has caused performance issue in the past, see https://github.com/uBlockOrigin/uBlock-issues/issues/756, this is why the hidden element badge count in the popup panel executes only on-demand.

mjethani commented 3 years ago

If the DOM surveyor ignores the body element, the filter will never work?

This is what I meant:

<html>
  <body>
    <div class="bar">
      Hello.
    </div>
    <script>
      setTimeout(() => {
        document.body.className = 'foo';
      },
      3000);
    </script>
  </body>
</html>

Now I've added the filter ##.foo .bar and it never gets applied. This is because body is being ignored entirely, even though the target of the filter is the div element.

The filter ##.foo .bar should work, right?

mjethani commented 3 years ago

I did consider having user styles to unhide body/html, but this does not really work as the blocker can't know which display style to assign the body element, a site could well use flex as value.

I thought about this too. Aside from not knowing what value to use for the display property, it's also that a site might want to hide the body element for whatever reason, and this would force it to be shown at all times. Because a user rule with !important would always override any author rule.

mjethani commented 3 years ago

@mjethani Some websites inject stuff between head and body, so #2 would not be a perfect solution.

@kiboke if it's really outside body it wouldn't be visible, as far as I can tell.

<html>
  <body>
    Body.
  </body>
  <script>
    setTimeout(() => {
      let e = document.createElement('p');
      e.innerText = 'Hi!';
      document.documentElement.appendChild(e);
    },
    1000);
  </script>
</html>
kiboke commented 3 years ago

@kiboke if it's really outside body it wouldn't be visible, as far as I can tell.

@mjethani I just tried the code in Firefox and I do see Hi!

gorhill commented 3 years ago

The filter ##.foo .bar should work, right?

Yes, but this is an edge case for uBO because its mutation observer does not listen to attribute changes. This is why :watch-attr() exists, to solve such case:

...##body.foo:watch-attr(class) .bar
mjethani commented 3 years ago

I looked at revert too.

Suppose this is the user style sheet:

.consent-summary-shown {
  display: none !important;
}

body.consent-summary-shown {
  display: revert !important;
}

I suspect that revert !important still overrides any value in the author style sheet.

It's also not clear how this would work with multiple user style sheets.

krystian3w commented 3 years ago

would always override any author rule.

Then webmasters maybe start promote legacy userContent.css - win with addon API.


still overrides any value in the author style sheet.

IMO this is better implementation that oldie auto / initial. If I try reset cursor: revert I see useage CSS embeded into "browser" after hover on many elements instead photo or webaster idea for cursor.

So can break flexbox/grid pages too.

Default CSS Settings

Most browsers will display the <body> element with the following default values:

body {
  display: block;
  margin: 8px;
}

body:focus {
  outline: none;
}

This is why :watch-attr() exists, to solve such case

this explains part of the situations when on my PC the filter didn't hide the <body> and on gwaser PC it always did and then when on his it never hid and on mine it almost always in 2018-2020.

krystian3w commented 3 years ago

Maybe * aka all URL-s also need similar disable breakage.

Now possible use * to avoid create specific filter with :watch-attr() if someone must check ID/Classes/attribs injected very late into dom tree/main nodes.


E.g.:

https://czyodebrac.pl/co-to-za-numer-dzwonil/2147483647/

on page no longer works these generic filter:

##.cli-modal-open #cookie-law-info-bar

#cookie-law-info-bar is not body/html tag but "<div>".

or have very Race Conditon in uBO 1.37.3rc0, but 1.37.2 injected filter almost immediately.

gorhill commented 2 years ago

High profile case involving highly generic cosmetic filter: https://github.com/uBlockOrigin/uAssets/issues/11244.

gorhill commented 2 years ago

Highly generic cosmetic filters should not longer affect html/body elements. To confirm, I used the original steps-to-reproduce, except with the following filter:

##[class*="consent-summary-shown"]
mtxadmin commented 2 years ago

I would propose to add article tag to html and body. Should I open a new issue?

gorhill commented 2 years ago

Should I open a new issue?

No, doing what you suggest would just allow advertisers to use article to bypass cosmetic filters -- there is no restriction on the number of article tags in a page.