sass / dart-sass

The reference implementation of Sass, written in Dart.
https://sass-lang.com/dart-sass
MIT License

Add ASCII output support #568

Open rzhw opened 5 years ago

rzhw commented 5 years ago

I've been working on a product using libsass/sassc and have migrated it to Dart Sass. Presently, this product doesn't support UTF-8 characters in stylesheets.

It looks like Dart Sass only supports UTF-8 output, with dart-lang/sdk#11744 given in the README as the blocker for supporting more encodings (UTF-16, etc.). Dart does, however, appear to have an AsciiEncoder.

For the time being, we've added an extra step to CSS-escape non-ASCII characters in generated stylesheets. (On a related note, we also remove the @charset 'UTF-8'; at-rule, both because it would be technically incorrect for an ASCII-encoded stylesheet, and because of #567.)

This isn't trivial because of sourcemaps, so we're doing this step with a PostCSS plugin.
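
For illustration, the escaping step rewrites the output along these lines (a hypothetical before/after; the actual transform is the PostCSS plugin mentioned above, which also keeps the sourcemaps correct):

/* Before: raw UTF-8 output from Dart Sass */
.icon::before {
  content: "★";
}

/* After: the same rule, CSS-escaped so the file is pure ASCII */
.icon::before {
  content: "\2605";
}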

Would adding ASCII-encoded output support be in scope of Dart Sass? I'd imagine when Dart adds other encoders, having this ready would let other encodings be sibling output options alongside UTF-8 and ASCII.

nex3 commented 5 years ago

This is something I could see adding as a command-line flag (--ascii-only or something like that) to serialize Unicode characters as ASCII escapes.

bit-wise commented 5 years ago

@nex3 will this command-line flag be ready when Ruby Sass is deprecated?

nex3 commented 5 years ago

Ruby Sass has been deprecated for almost a year now. And no, there's no plan to tie this issue to its release cycle. It's marked as "help wanted", which means it's not a priority for the Sass team, but if an external user wanted to contribute a fix we'd help them land it.

jpuncle commented 4 years ago

@Awjin When will this problem be solved?

BuptStEve commented 4 years ago

// @source - [@Stephn-R](https://github.com/sass/sass/issues/1395#issuecomment-57483844)
// @description converts 1 or more characters into a unicode escape
// @markup {scss}
// unicode("e655"); // "\e655"
@function unicode($str) {
  @return unquote("\"") + unquote(str-insert($str, "\\", 1)) + unquote("\"");
}
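
A usage sketch: because the result is an unquoted string whose characters include the literal quotes and backslash, Sass emits it verbatim, so the escape survives into the compiled CSS.

.icon::before {
  content: unicode("e655"); // compiles to: content: "\e655";
}
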
clshortfuse commented 3 years ago

So, I'm trying to get a zero-width Unicode character to work with Sass. It won't appear without a hex editor, because Sass reinterprets it. In normal CSS, it's:

div::before {
  content: "\200B";
}

But Sass will rewrite it as:

div::before {
  content: "​";
}

It's a little frustrating trying to debug with an invisible character when Sass keeps rewriting it. A flag would unfortunately be global to everything, when it's somewhat of an edge case where you want raw/literal characters to be output as a string. I just have these few instances where it's better not to convert to Unicode. I can imagine there are a lot of other characters, both printable and non-printable, that would greatly benefit from not being rewritten as Unicode.
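
For what it's worth, the unicode() helper posted above sidesteps this (a sketch, assuming that function is in scope; it keeps the escape literal in the output):

div::before {
  content: unicode("200B"); // emits content: "\200B" instead of the invisible character
}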

watershed commented 3 years ago

Finding this thread today because of an issue similar to @clshortfuse's above.

I've learned that Dart Sass is converting my authored:

content: '\00A0/\00A0';

to:

content: ' / ';

…where the spaces written do seem to be non-breaking spaces, but they are prone to rendering in the browser as:

Â /Â

No such problem with Libsass, but then Libsass fails at some other stuff.

watershed commented 3 years ago

Here’s another example that erratically fails, due to the following authored Sass:

content: '\0231F';
transform: rotate(45deg);

…ending up as:

content:"⌟";transform:rotate(45deg);

[screenshot: what-you-can-do generated-content failure]

See also this Twitter thread.

nex3 commented 3 years ago

@watershed As mentioned earlier in this thread, Sass emits a @charset declaration or a BOM whenever it emits non-ASCII output, which will force browsers to interpret the stylesheet as UTF-8 even if it's served with non-UTF-8 headers. If that's not working, chances are you're doing some sort of post-processing that's incorrectly stripping that extra information.
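
For reference, a sketch of what that looks like in expanded mode (in compressed mode, Sass emits a UTF-8 byte-order mark instead of the at-rule):

@charset "UTF-8";

.icon::before {
  content: "★";
}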

cbush06 commented 3 years ago

@nex3 -- I have not touched any settings of Angular CLI that (to my knowledge) would affect the inclusion or omission of @charset. In fact, I haven't configured any of the compilation process (it's using defaults). However, I'm encountering this issue.

nex3 commented 3 years ago

@cbush06 Are you seeing a case where your CSS is being served with @charset (or a UTF-8 BOM) and is still being interpreted as the wrong character set? Otherwise, I'm not sure how Sass can address your issue.

cbush06 commented 3 years ago

@nex3 -- After reading your posts earlier, I went and checked. What I discovered is that the CSS generated for each Angular component does have the @charset. The CSS compiled from other SCSS files (e.g. from my assets folder) does not include it.

IlBaffo commented 3 years ago

Disclaimer: I would have posted this in https://github.com/sass/dart-sass/issues/1219, because I don't feel these two issues are the same even though they're marked as duplicates, but that issue was closed. While I agree that converting from Unicode to escapes is necessary for ASCII output, and that an explicit flag is needed in that scenario, I also think the opposite is not always true.

Hi, I'm currently investigating moving a project from node-sass to dart-sass, and the automatic conversion of Unicode escapes leaves me a little confused: I don't see the reason to track and convert them rather than treating them as raw strings, if they're semantically the same to the interpreter. Is there some automatic conversion that is hard to explicitly disable, or that would require a lot of effort? This troubles me with Font Awesome and other "content" properties, which are more easily identified as icons when shown as escapes instead of as Japanese or non-printable characters.

To clarify: I would expect this kind of behaviour as a means of minification, applied not while producing "expanded" output but only when "compressed" mode is enabled, without needing an extra flag.

In any case this is only my opinion. Thanks.

nex3 commented 3 years ago

Sass's internal representation of a string, like just about every programming language's, is just a sequence of characters (in Sass's case, a "character" means a "Unicode code point"). Whether those characters were written as escapes or not, they're all converted into the character in question internally: if you write "\24", that's exactly the same as writing "$". Both of these return a string whose content is a single character, U+0024 DOLLAR SIGN. This is the same process that happens in JavaScript when you write "\x24" or "$".

This means that when we go to serialize a string to CSS, all we know are the Unicode code points that are the contents of the string. We need to determine how to serialize those without any information about where they came from, and so we serialize them as Unicode rather than escapes so that people writing non-English languages have legible CSS files.
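
A minimal illustration of that point, using the legacy global function names that appear elsewhere in this thread:

@debug "\24" == "$";      // true: both are the single character U+0024
@debug str-length("\24"); // 1: the escape was resolved at parse time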

IlBaffo commented 3 years ago

I'm just curious why that conversion happens at all during the parsing phase and not during the serialization phase, keeping the string as-is; could the unescaped string be a keyword or another expression, like \x24myvariable or \x33 * \x33? Wouldn't source maps have a different column offset from the physical file?

Again, I'm asking just out of curiosity, because I feel this is a deliberate choice (keeping complexity down?) rather than a language limitation; I don't mean to criticize.

Again, thanks for the work.

nex3 commented 3 years ago

I'm just curious why that conversion happens at all during the parsing phase and not during the serialization phase, keeping the string as-is; could the unescaped string be a keyword or another expression, like \x24myvariable or \x33 * \x33?

All the string functions in Sass need to operate on the strings' actual text; if we lazily parsed escape codes, it would make all the functions much less efficient and much more complex. Imagine trying to implement str.slice() when you have to adjust all the indexes to account for escape sequences that might exist. It gets even worse when you start thinking about how strings interact with custom functions; we'd basically have to eagerly resolve escapes as soon as a host language is dealing with a string, which means that no custom functions would ever preserve escapes.
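
For instance, escapes are already resolved to single characters by the time string functions run, so indexes count characters rather than source text (a small illustration; the space after \24 just terminates the escape):

@debug str-slice("\24 abc", 1, 2); // "$a": the escape counts as one character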

Wouldn't source maps have a different column offset from the physical file?

No, source maps are totally orthogonal. They're tracked on a statement-by-statement basis, not value-by-value.

IlBaffo commented 3 years ago

OK, I understand that the problem arises when the string gets manipulated, and there's no telling whether or when that will happen during the parsing phase. Anyway, how did LibSass achieve this? I don't know enough C to navigate their codebase, but there must be some specific pattern implemented there; even Chrome shows both representations in the CSS inspector.

As a wild speculation (and just for fun): could CPU usage in this case be traded for memory by storing both representations in the StringExpression as different fields (only when there is an escaped character)? That would allow an eventual "length" or "indexOf" to read from the unescaped string with zero performance loss, while proxying the common string methods to reflect changes in the (not always present) raw string.

nex3 commented 3 years ago

OK, I understand that the problem arises when the string gets manipulated, and there's no telling whether or when that will happen during the parsing phase. Anyway, how did LibSass achieve this? I don't know enough C to navigate their codebase, but there must be some specific pattern implemented there; even Chrome shows both representations in the CSS inspector.

They didn't. In older versions of LibSass, string functions were simply broken—they returned the wrong results for strings with escape sequences. Newer versions work the same as Dart Sass (with the exception that they'll avoid parsing certain property values entirely, causing some escape sequences to be retained if they're written directly in a property value even though they'd be resolved if they were stored in a variable).

As a wild speculation (and just for fun): could CPU usage in this case be traded for memory by storing both representations in the StringExpression as different fields (only when there is an escaped character)? That would allow an eventual "length" or "indexOf" to read from the unescaped string with zero performance loss, while proxying the common string methods to reflect changes in the (not always present) raw string.

This would only improve the performance of a few functions—functions like str.slice() and operations like string concatenation would still be very expensive and complicated and also more memory intensive.

Novynn commented 3 years ago

As this is a breaking change from node-sass, shouldn't this difference at least be represented in the documentation?

nex3 commented 3 years ago

Node Sass's current behavior is the same as Dart Sass's (except again in edge cases involving unparsed properties). Even in those cases, it's not a breaking change: a literal non-ASCII character has exactly the same behavior as that character's escape code.

beard7 commented 2 years ago

@nex3 Apologies for bringing this up again. I've read many threads regarding this issue, but I still have a problem, and I don't know if it's due to Sass or something else.

Despite Sass inserting a @charset declaration at the top (or almost the top -- see below), I still have issues with icon fonts, i.e. Font Awesome, intermittently being rendered with the wrong font, usually Times New Roman.

Like I say, this happens intermittently, but it never happened when the generated CSS retained non-ASCII character escape codes.

As for the @charset declaration being almost at the top of the output file: this happens when I @import Google fonts. The @charset is placed after the @import, which is hoisted to the top of the CSS. I have no idea if this has anything to do with the issue.

Any advice or suggestions you can provide would be greatly appreciated.

nex3 commented 2 years ago

@beard7 it sounds like the root issue there is whatever software is hoisting your @imports above your @charsets. That's not a safe transformation to make, and can be expected to break the browser's ability to correctly determine the encoding of your document.

It's worth noting that in Dart Sass 1.38.0 we released a change: characters from Unicode Private Use Areas are now emitted as escapes even in expanded mode. That should also mitigate the pain here without breaking legible non-English text.
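
For example, a sketch of the 1.38.0 behavior (\f425 is a Private Use Area code point commonly used by icon fonts):

.icon::before {
  content: "\f425"; // PUA characters now stay escaped, even in expanded mode
}

.star::before {
  content: "★"; // ordinary non-ASCII characters are still emitted as-is
}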

jpcamara commented 2 years ago

FWIW - there is a large issue thread from Font Awesome which indicates that while having @charset "utf-8"; and <meta charset="UTF-8"> improves the situation, they do not correct it 100% of the time. It still happens intermittently.

The only solution found has been something very similar to what @BuptStEve mentioned earlier in this issue (I think simply because it causes dart-sass to ignore the strings).

https://github.com/FortAwesome/Font-Awesome/issues/18775#issuecomment-1073217558

@function fa-content($fa-var) {
  @return unquote("\"") + unquote(str-insert($fa-var, "\\", 1)) + unquote("\"");
}
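
A usage sketch, assuming the icon variables are redefined as quoted code points without the backslash (e.g. $fa-var-font-awesome-flag: "f425", matching the output quoted later in this thread):

.fa-font-awesome-flag::before {
  content: fa-content($fa-var-font-awesome-flag); // emits content: "\f425" literally
}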

We just recently switched our scss code over to dart-sass and started experiencing the intermittent issues with font display right after deploying. Switching to this approach "fixes" it for us as well.

nex3 commented 2 years ago

@jpcamara Can you provide a reproduction case where a browser doesn't respect @charset "utf-8"? I'll need the specific browser version and a stylesheet with a @charset declaration or BOM that includes two non-ASCII characters, one written in raw UTF-8 and one written in an escape sequence, so I can verify that they render differently.
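
Something along these lines would do (a sketch of the requested test case; if both rules render identically, the browser is honoring the @charset):

@charset "UTF-8";

.raw::before { content: "★"; }
.escaped::before { content: "\2605"; }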

jerryephicacy commented 1 year ago

I am still facing this issue even after modifying the fa-content.

[screenshot of the bundled CSS]

When I inspected the source of the bundled CSS file, this is how it appears ^^.

This happens only in webpack's production-mode bundle.

Requesting your help please, @jpcamara?

Versions: "sass": "1.54.3", "sass-loader": "13.0.2",

Font Awesome 5.x

cc @logeshpaul

nex3 commented 1 year ago

@jerryephicacy Is that rendering incorrectly in a browser, or is it just in your text editor? Because it's entirely possible that your text editor is just not loading the file as UTF-8.

jerryephicacy commented 1 year ago

@nex3 ... randomly it fails in the browser.

But I have made it work now.

I just have to modify the fa-content function and remove the backslash ( \ ) from the icon variables.

nex3 commented 1 year ago

To reiterate the above, if you can provide a reproduction where this fails in the browser even with a @charset or UTF-8 BOM, we will reconsider our default output.

nmoresco commented 1 year ago

I don't think it's a given that all CSS output by dart-sass will be loaded directly by a browser. For example, I was running into this problem because my compiled CSS files are loaded by GWT, which doesn't know about the @charset annotation. Thus, you get the following warning while compiling, and it swallows the CSS block:

[WARN] Line 13 column 12: encountered """. Was expecting one of: "}" "+" "-" "," ";" "/" <STRING> <IDENT> <NUMBER> <URL> <PERCENTAGE> <PT> <MM> <CM> <PC> <IN> <PX> <EMS> <EXS> <DEG> <RAD> <GRAD> <MS> <SECOND> <HZ> <KHZ> <DIMEN> <HASH> <IMPORTANT_SYM> <UNICODERANGE> <FUNCTION>

I think the newer versions of GWT that use GSS might not have this problem, but I can't move to that easily. Regardless of my specific situation, my point is that Sass output is used in many kinds of toolchains that aren't the browser.

nex3 commented 1 year ago

Sass targets the CSS specification. We'll make exceptions for browser behavior that's contrary to the spec only because browsers are the overwhelming majority of CSS consumers. Any other tool should follow the specification when consuming CSS, and if it doesn't it's pretty clearly a bug in that tool and not in Sass.

jerryephicacy commented 1 year ago

@nex3 ,

  1. When running the application in dev mode using webpack, we have @charset "utf-8" present at the top of the compiled application.css file. But it is removed in prod mode, and the @charset is not there.
  2. But in both dev mode and prod mode, <meta charset="UTF-8"> is present in the head tag.
  3. I tried bumping up css-loader, sass-loader, postcss-loader, etc., and still was not successful.

Hence, I followed what @jpcamara mentioned in the comment above: I modified fa-content and removed the backslash ( \ ) from the variables. The resulting output in the CSS is fa-font-awesome-flag:before{content:"\f425"}, which renders correctly every time in the browser.

kdagnan commented 1 year ago

I'd like to add to this discussion: we are currently moving to dart-sass from node-sass. Our build step results in the correct Unicode character, i.e.:

icon-flag:before {
  content: "\E95E";
}

result of compilation: icon-flag:before{content:""} (This character: https://utf8-icons.com/utf-8-character-59742)

However, the browser will randomly not accept the encoding and will display the strange characters. I've added @charset "utf-8" to my SCSS and to my index.html. It seems to happen maybe 1 in 50 reloads. The escaping function mentioned above seems to fix it, but it feels hacky.

nex3 commented 1 year ago

@jerryephicacy

  1. When running the application in dev mode using webpack, we have @charset "utf-8" present at the top of the compiled application.css file. But *it is removed in prod mode, and the @charset is not there*.

Emphasis mine. Something in your stack is removing the @charset declaration, which seems like the actual bug here. You can't just delete parts of a file and expect it to work the same way.

@kdagnan I'll ask you the same thing I've asked everyone else in this thread: provide a working reproduction of this bug, including the specific browser version in which you're seeing the error.

Yegorich555 commented 1 year ago

For webpack, this can be fixed with css-unicode-loader.

jerryephicacy commented 1 year ago

@Yegorich555, thanks for the suggestion. Are you sure we can go with a two-year-old package? Do you have any information on its maintenance, and are any top projects using it?

If not, any alternative?

Yegorich555 commented 1 year ago

@jerryephicacy yes, I am. It works fine with webpack 4 and webpack 5; I've been using it in both of my production projects. Although the loader doesn't look like it's been touched in two years, it works without bugs in my case ;)

jerryephicacy commented 1 year ago

@Yegorich555 , thanks so much for the idea. It works well!

@nex3 , I have solved this with the css-unicode-loader package using webpack.

robinp commented 1 year ago

Just to increase crosslink density... Based on searching, I suspect this has to do with Chrome (Chromium) caching combined with not having an explicit charset or BOM. Does anyone see this issue in other browsers?

References:

In all cases, the suggestion seems to be to add the explicit encoding back, either with a @charset directive, a response header, or a BOM.

nsunga commented 1 year ago

This is something I could see adding as a command-line flag (--ascii-only or something like that) to serialize Unicode characters as ASCII escapes.

Hello @nex3 !

Sorry for bringing up such an old thread.

I just wanted to confirm: the --ascii-only flag isn't supported yet, right?

And if not, does dart-sass have any intention of implementing it?

nex3 commented 1 year ago

This issue is open, which indicates that it is not supported but we would like to do it.

nlozovan commented 10 months ago

I can confirm that this is still happening, and icons will intermittently render as "" instead of the normal format.

nex3 commented 10 months ago

@nlozovan Does the CSS file you're serving to Chrome retain the @charset rule and/or the UTF-8 byte-order-mark, or have those been stripped out by some other processing?

nlozovan commented 10 months ago

@nlozovan Does the CSS file you're serving to Chrome retain the @charset rule and/or the UTF-8 byte-order-mark, or have those been stripped out by some other processing?

Yes, it's stripped out for some reason. I ended up forcibly adding a charset declaration after the build. For now, I can't reproduce the error anymore; fonts are rendered correctly. I have a theory that while Chrome is loading the CSS file from the cache, it will (sometimes) apply the wrong encoding. Yeah, it seems the planets aligned here in a wrong way somehow ;)

nex3 commented 10 months ago

If the @charset/BOM was stripped out, that's almost certainly the culprit. CSS requires those to correctly identify the encoding under all circumstances.

foolip commented 10 months ago

I have landed a change in Chrome that makes URL parsing in CSS spec-compliant by ignoring the encoding/charset of the stylesheet. This change is available in Chrome Canary 119.0.6025.0 and later, and can be enabled by passing --enable-features=CSSParserIgnoreCharsetForURLs as a command-line argument to Chrome.

@nlozovan would you be able to test with Chrome Canary with this command line argument to see if it has an effect on the problem you're experiencing?

nlozovan commented 10 months ago

@foolip here are some tests I've made. Scenario: a CSS file has @font-face declarations with some custom fonts, and the icon content is in Unicode.

I used Canary Version 119.0.6030.0 (Official Build) canary (arm64). Also, I cannot explain why Chrome Canary opened as an app works differently from the terminal-launched Chromium. Hope this is helpful; and yes, it seems Canary has an update on this.

foolip commented 10 months ago

@nlozovan thank you for testing! It sounds like you get the error when opening Chrome from the command line both with and without --enable-features=CSSParserIgnoreCharsetForURLs, right? That would suggest it has no bearing on the problem you're seeing, but then I don't understand why the problem doesn't reproduce when opening Chrome Canary as an app. Are you sure that Chrome Canary was fully closed between each test, so that it didn't open a new tab in an already open browser? I ask only because that's the only thing that comes to mind as an explanation for what you're seeing.

nlozovan commented 10 months ago

@foolip Yes, I was making sure the session was over; the cache was enabled in both cases. I tested one more time just now and got the same results. The terminal isn't throwing any errors; it opens a brand-new session. That is really interesting. The way I reproduce the error quite quickly is by opening and closing the Inspector Tools via the shortcut, multiple times, on page load. I do this 2-3 times and I can see the font icon error. Not in the Canary app, though.

foolip commented 9 months ago

@nlozovan thanks for double-checking that. I also don't understand why you'd see a difference between starting Chrome Canary by clicking an icon and starting it from the command line.

To help me understand whether the change I made behind a flag affects your case, can you share the relevant part of the stylesheet? I'm looking for non-ASCII characters in URLs, which is what my change should affect.