w3c / css-validator

W3C CSS Validation Service
https://jigsaw.w3.org/css-validator/

Parse error with Unicode supplementary characters in “style” element in HTML document #383

Closed: sideshowbarker closed this issue 1 year ago

sideshowbarker commented 1 year ago

See https://jigsaw.w3.org/css-validator/validator?uri=https://sideshowbarker.net/tests/css-supplementary-code-point.html

The source of https://sideshowbarker.net/tests/css-supplementary-code-point.html has this:

<!doctype html><html lang="en"><title>s</title><meta charset="UTF-8">
<style>
h1::before { content: '🚧' }
</style>
<style>@charset "UTF-8";
h1::after { content: '🚧' }
</style>

In both cases (that is, in the style element without @charset "UTF-8" and in the one with it), the 🚧 (U+1F6A7 CONSTRUCTION SIGN) character causes the CSS validator to report a parse error.
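For context on why this character is a useful test case: U+1F6A7 lies outside the Basic Multilingual Plane, so Java represents it as two UTF-16 code units (a surrogate pair), and any code that processes a Reader one char at a time sees two values rather than one. The following is a minimal sketch of that representation; the class and method names are illustrative only and do not come from the validator sources.

```java
public class SupplementaryDemo {
    /** Number of UTF-16 code units (Java chars) in s. */
    public static int codeUnits(String s) {
        return s.length();
    }

    /** Number of Unicode code points in s. */
    public static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // U+1F6A7 CONSTRUCTION SIGN, built from its code point to avoid
        // depending on the source file encoding.
        String sign = new String(Character.toChars(0x1F6A7));

        System.out.println(codeUnits(sign));   // 2
        System.out.println(codePoints(sign));  // 1

        // The two code units are a high surrogate followed by a low surrogate.
        System.out.printf("%04X %04X%n",
                (int) sign.charAt(0), (int) sign.charAt(1)); // D83D DEA7
        System.out.println(Character.isHighSurrogate(sign.charAt(0))); // true
        System.out.println(Character.isLowSurrogate(sign.charAt(1)));  // true
    }
}
```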

The parse error does not occur if the following patch is applied (patch --ignore-whitespace -p1 < patch) to the sources:

diff --git a/org/w3c/css/css/StyleSheetParser.java b/org/w3c/css/css/StyleSheetParser.java
index c6c3a97c..2b128dee 100644
--- a/org/w3c/css/css/StyleSheetParser.java
+++ b/org/w3c/css/css/StyleSheetParser.java
@@ -293,12 +293,7 @@ public final class StyleSheetParser

 //     if (cssFouffa == null) {
             String charset = ac.getCharsetForURL(url);
-            if (ac.getCssVersion().compareTo(CssVersion.CSS2) >=0 ) {
-                cssFouffa = new CssFouffa(ac, new UnescapeFilterReader(new BufferedReader(reader)), url, lineno);
-            } else {
             cssFouffa = new CssFouffa(ac, reader, url, lineno);
-
-            }
             cssFouffa.addListener(this);
 //     } else {
 //     cssFouffa.ReInit(ac, input, url, lineno);

So the cause appears to be in https://github.com/w3c/css-validator/blob/main/org/w3c/css/util/UnescapeFilterReader.java, which seems to be applied only to content that comes from a style element in an HTML document (as opposed to a separate standalone stylesheet resource, or input entered in the validator’s direct-input textarea).
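One way to test that hypothesis in isolation would be a round-trip check: push a string containing U+1F6A7 through a Reader wrapper and compare what comes out with what went in. The sketch below is an assumption about how such a check could look, not code from the validator. `ReaderRoundTrip`, `drain`, `preserves`, and `DropLowSurrogates` are hypothetical names, and `DropLowSurrogates` is a deliberately broken filter that exists only to show what a failure looks like; it makes no claim about what UnescapeFilterReader actually does.

```java
import java.io.BufferedReader;
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.function.Function;

public class ReaderRoundTrip {
    /** Reads everything from r, one char at a time, into a String. */
    public static String drain(Reader r) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = r.read()) != -1) {
            sb.append((char) c);
        }
        return sb.toString();
    }

    /** True if the wrapper returns exactly the chars that were fed in. */
    public static boolean preserves(String css, Function<Reader, Reader> wrap)
            throws IOException {
        try (Reader r = wrap.apply(new StringReader(css))) {
            return css.equals(drain(r));
        }
    }

    /** Hypothetical broken filter (NOT the validator's code): drops low surrogates. */
    public static final class DropLowSurrogates extends FilterReader {
        public DropLowSurrogates(Reader in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            int c;
            do {
                c = super.read();
            } while (c != -1 && Character.isLowSurrogate((char) c));
            return c;
        }

        @Override
        public int read(char[] buf, int off, int len) throws IOException {
            // Route bulk reads through the filtering single-char read().
            int n = 0;
            while (n < len) {
                int c = read();
                if (c == -1) {
                    break;
                }
                buf[off + n++] = (char) c;
            }
            return (n == 0 && len > 0) ? -1 : n;
        }
    }

    public static void main(String[] args) throws IOException {
        String css = "h1::before { content: '"
                + new String(Character.toChars(0x1F6A7)) + "' }";
        System.out.println(preserves(css, BufferedReader::new));    // true
        System.out.println(preserves(css, DropLowSurrogates::new)); // false
    }
}
```

Inside the validator source tree, the same check could presumably be run with `r -> new UnescapeFilterReader(new BufferedReader(r))` as the wrapper, which mirrors the construction removed by the patch above.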

Related issue: https://github.com/validator/validator/issues/1344

sideshowbarker commented 1 year ago

I haven’t tested this yet, but I suspect that the following patch would cause the same parse error when checking by URL.

diff --git a/org/w3c/css/parser/CssFouffa.java b/org/w3c/css/parser/CssFouffa.java
index ef3580bf..a195bb7b 100644
--- a/org/w3c/css/parser/CssFouffa.java
+++ b/org/w3c/css/parser/CssFouffa.java
@@ -26,12 +26,14 @@ import org.w3c.css.util.ApplContext;
 import org.w3c.css.util.CssVersion;
 import org.w3c.css.util.HTTPURL;
 import org.w3c.css.util.InvalidParamException;
+import org.w3c.css.util.UnescapeFilterReader;
 import org.w3c.css.util.Util;
 import org.w3c.css.util.WarningParamException;
 import org.w3c.css.util.Warnings;
 import org.w3c.css.values.CssExpression;
 import org.w3c.css.values.CssValue;

+import java.io.BufferedReader;
 import java.io.FileNotFoundException;
 import java.io.IOException;
 import java.io.InputStream;
@@ -88,7 +90,7 @@ public final class CssFouffa extends CssParser {
      */
     public CssFouffa(ApplContext ac, Reader reader, URL file, int beginLine)
             throws IOException {
-        super(reader);
+        super(new UnescapeFilterReader(new BufferedReader(reader)));
         if (ac.getOrigin() == -1) {
             setOrigin(StyleSheetOrigin.AUTHOR); // default is user
         } else {