leeoniya / dropcss

An exceptionally fast, thorough and tiny unused-CSS cleaner
MIT License

CSS selector parser cannot advance past unsupported pseudo-classes with parentheses #57

Closed DareFail closed 2 years ago

DareFail commented 2 years ago

Was just testing this with the CSS on apple.com for fun. The library will freeze and crash if given this css:

.anything-dash-body{
  font-weight:400;
}
.anything-dash-body:lang(ar){
  line-height: 1;
}

The library mistakenly thinks .anything-dash-body is a body tag and does not remove it, which is "ok", but I don't think it should freeze and crash when it fails. All other selectors, etc. seem to be ok.

DareFail commented 2 years ago

I am still going through the library to see how you make it this fast, so maybe the "-body" names just can't be handled. But I can match HTML tag names pretty reliably by adding a space in front of every new line and checking for one of the selector delimiters both before and after the tag name: https://www.w3schools.com/cssref/css_selectors.asp

e.g.

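// escapeRegExp was not defined in the original snippet; this is the usual helper.
function escapeRegExp(s) { return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"); }
var lineText = ".anything-dash-body:lang(ar){"
lineText = " " + lineText;
// "^" and "$" are escaped so that "^=" and "$=" match literally instead of acting as anchors
var regex = new RegExp("(?<= |{|\\[|:|,|\\.|>|\\+|~|=|~=|\\||=|\\^=|\\$=|\\*=|\\*)" + escapeRegExp("body") + "(?= |{|\\[|:|,|\\.|>|\\+|~|=|~=|\\||=|\\^=|\\$=|\\*=|\\*)", "g");
var matches = lineText.match(regex); // null here: "body" follows "-", which is not a delimiter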

leeoniya commented 2 years ago

will take a look tomorrow

leeoniya commented 2 years ago

i pushed an infinite loop guard in the selector parser so we can actually debug this :rofl:
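For readers unfamiliar with the idea, an infinite-loop guard in a tokenizer can be sketched roughly as follows. This is a minimal illustration, not dropcss's actual code: `tokenize`, `rules` and the three token patterns are hypothetical names made up for this example. The guard simply checks that each pass through the loop consumed at least one character.

```javascript
// Hypothetical tokenizer; none of these names come from dropcss.
function tokenize(input, rules) {
  const tokens = [];
  let pos = 0;
  while (pos < input.length) {
    const start = pos;
    for (const rule of rules) {
      rule.re.lastIndex = pos; // sticky regexes match exactly at pos
      const m = rule.re.exec(input);
      if (m !== null && m[0].length > 0) {
        tokens.push({ type: rule.type, text: m[0] });
        pos += m[0].length;
        break;
      }
    }
    // Guard: if no rule consumed input, fail loudly instead of spinning forever.
    if (pos === start) {
      throw new Error(`selector parser stuck at index ${pos}: "${input.slice(pos)}"`);
    }
  }
  return tokens;
}

// Illustrative rules only; note that nothing here handles "(".
const rules = [
  { type: "tag", re: /[a-zA-Z][\w-]*/y },
  { type: "class", re: /\.[\w-]+/y },
  { type: "pseudo", re: /:[\w-]+/y },
];
```

With these rules, `tokenize("a:lang(ar)", rules)` throws once it reaches the `(` instead of hanging, which is what makes the underlying bug debuggable.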

the problem is not with body, but with the :lang(ar) pseudo-selector; dropcss should strip or ignore these since there's no way to assert them from the markup.

<!doctype html>
<html>
  <head>
    <title>loopy</title>
    <script src="./dist/dropcss.iife.js"></script>
  </head>
  <body>
    <script>
      dropcss({
        html: '<a></a>',
        css: `
          a:lang(ar) {}
        `,
      });
    </script>
  </body>
</html>
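One way to "strip or ignore" such pseudo-classes is to drop them before the selector is parsed. The sketch below is an assumption about how that could look, not the fix that landed in dropcss: `stripFunctionalPseudos` and its `keep` parameter are hypothetical. It walks to the matching `)` so that the scan always advances, even on unbalanced input, and it deliberately ignores edge cases such as parentheses inside quoted attribute values.

```javascript
// Hypothetical helper, not part of dropcss's API.
// Removes functional pseudo-classes such as :lang(ar) whose names are not in `keep`.
function stripFunctionalPseudos(selector, keep = new Set()) {
  const opener = /:{1,2}([\w-]+)\(/y; // sticky: only matches at lastIndex
  let out = "";
  let i = 0;
  while (i < selector.length) {
    opener.lastIndex = i;
    const m = opener.exec(selector);
    if (m === null || keep.has(m[1])) {
      out += selector[i]; // ordinary character: copy it through
      i += 1;
      continue;
    }
    // Walk to the matching ")" so nested parentheses are skipped as a unit.
    let depth = 1;
    let j = i + m[0].length;
    while (j < selector.length && depth > 0) {
      if (selector[j] === "(") depth += 1;
      else if (selector[j] === ")") depth -= 1;
      j += 1;
    }
    i = j; // always moves forward, even if the closing ")" is missing
  }
  return out;
}
```

For example, `stripFunctionalPseudos("a:lang(ar)")` yields `"a"`, while passing `new Set(["not"])` as `keep` leaves `a:not(.x)` untouched.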

DareFail commented 2 years ago

I didn’t explain well. It seems to only hang on the combination of thinking it’s an html tag and the lang selector. It seems to work fine if either are present alone.

DareFail commented 2 years ago

But that works awesome, I’ll test out a few more pages tomorrow. I suppose the class name part is a very minor separate issue

leeoniya commented 2 years ago

fixed by attached commit.

leeoniya commented 2 years ago

> It seems to only hang on the combination of thinking it’s an html tag and the lang selector. It seems to work fine if either are present alone.

that's because the secondary, more-expensive selector parser only runs after the pre-parsing pass, which pulls out only ids, classes and tag names, finds matches for them in the html. so it will first try to find .anything-dash-body, and only if that is found will it try parsing the full selector with the :lang(ar) pseudo, which is where it crashed.
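The two-pass behaviour described above can be illustrated with a small sketch. It is only a model of the idea, not dropcss's internals: `extractNames`, `keepSelector`, the `seen` set and the `fullParse` callback are all hypothetical names, and the regexes are simplified.

```javascript
// Hypothetical two-pass check; names and structure are illustrative only.
function extractNames(selector) {
  const bare = selector
    .replace(/\[[^\]]*\]/g, "") // drop attribute selectors
    .replace(/::?[\w-]+(\([^)]*\))?/g, ""); // drop pseudos and their arguments
  return bare.match(/[#.]?[a-zA-Z_][\w-]*/g) || [];
}

function keepSelector(selector, seen, fullParse) {
  // Pass 1 (cheap): every id, class and tag name must occur in the markup.
  const names = extractNames(selector);
  if (!names.every((n) => seen.has(n))) return false; // fullParse never runs
  // Pass 2 (expensive): only now parse and match the full selector.
  return fullParse(selector);
}

// Names found in the markup '<a></a>' from the repro: just the tag "a".
const seen = new Set(["a"]);
let fullParseCalls = 0;
const fullParse = () => { fullParseCalls += 1; return true; };

keepSelector(".anything-dash-body:lang(ar)", seen, fullParse); // rejected in pass 1
keepSelector("a:lang(ar)", seen, fullParse); // reaches the full parser
```

In this model only `a:lang(ar)` ever reaches the expensive parser, which matches the observation that the hang needed both a name that matched the html and the `:lang(...)` pseudo.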