Closed client9 closed 4 years ago
Thanks for the bug report, I'm surprised it hadn't been noticed before! I figured out what is wrong, but I want to double check which characters need to be percent-encoded. Looks like the strict spec says /
needs to be escaped, but in your test case you have not.
hi! here's how I found it, which may or may not be useful for you.
I wrote a tool that looks at HTML files and then cuts out unused selectors in a CSS file. I used your excellent parser to read the CSS file. I didn't intend to write a CSS minifier, but if you use your parser just to echo out the CSS tokens, you get a damn good minifier. Thats the short version, my tool also removes empty @media rules, a few other things, but it does't touch the property values at all (e.g. 1000 -> 1e3 etc)
Anyways its over at https://github.com/client9/csstool
I tested it on bootstrap.min.css (version 4.0)
$ css minify < bootstrap.min.css > bootstrap.min.css-1
$ minify --mime text/css < bootstrap.min.css > bootstrap.min.css-2
$ ls -l bootstrap.min.css*
-rw-r--r--@ 1 nickg staff 144877 Feb 19 21:57 bootstrap.min.css
-rw-r--r-- 1 nickg staff 144826 Mar 6 21:34 bootstrap.min.css-1
-rw-r--r-- 1 nickg staff 145034 Mar 6 21:35 bootstrap.min.css-2
weird! So my dumb minifier (based on your parser) was actually better than the nodejs minifier that bootstrap uses (assuming I don't have errors -- very possible).
But surprising that your minifier in spite of all the extra optimizations it has, is larger... and so I started to investigate why...
again using your parser I whipped up a css indenter (which works but isn't quite done):
$ css format < bootstrap.min.css > bootstrap.min.css-indent
$ css format < bootstrap.min.css-1 > bootstrap.min.css-1-indent
$ css format < bootstrap.min.css-2 > bootstrap.min.css-2-indent
$ ls -l bootstrap.min.css*indent
-rw-r--r-- 1 nickg staff 174784 Mar 6 21:43 bootstrap.min.css-indent
-rw-r--r-- 1 nickg staff 174784 Mar 6 21:43 bootstrap.min.css-1-indent
-rw-r--r-- 1 nickg staff 174992 Mar 6 21:43 bootstrap.min.css-2-indent
$ diff bootstrap.min.css-indent bootstrap.min.css-1-indent
# nothing
the orig bootstrap and my dumb one are the same. ok.
here's the full diff the original (indented) and the one from minify
$ diff bootstrap.min.css-1-indent bootstrap.min.css-2-indent
1,6c1,4
< /*!
< * Bootstrap v4.0.0 (https://getbootstrap.com)
< * Copyright 2011-2018 The Bootstrap Authors
< * Copyright 2011-2018 Twitter, Inc.
< * Licensed under MIT (https://github.com/twbs/bootstrap/blob/master/LICENSE)
< */
---
> /*!* Bootstrap v4.0.0 (https://getbootstrap.com)
> * Copyright 2011-2018 The Bootstrap Authors
> * Copyright 2011-2018 Twitter, Inc.
> * Licensed under MIT (https://github.com/twbs/bootstrap/blob/master/LICENSE)*/
57c55
< font-family: -apple-system,BlinkMacSystemFont,"Segoe UI",Roboto,"Helvetica Neue",Arial,sans-serif,"Apple Color Emoji","Segoe UI Emoji","Segoe UI Symbol";
---
> font-family: -apple-system,BlinkMacSystemFont,segoe ui,Roboto,helvetica neue,Arial,sans-serif,apple color emoji,segoe ui emoji,segoe ui symbol;
390c388
< font-family: SFMono-Regular,Menlo,Monaco,Consolas,"Liberation Mono","Courier New",monospace;
---
> font-family: SFMono-Regular,Menlo,Monaco,Consolas,liberation mono,courier new,monospace;
3008c3006
< background-image: url("data:image/svg+xml;charset=utf8,%3Csvg xmlns='http://www.w3.org/2000/svg' viewBox='0 0 8 8'%3E%3Cpath fill='%23fff' d='M6.564.75l-3.59 3.612-1.538-1.55L0 4.26 2.974 7.25 8 2.193z'/%3E%3C/svg%3E");
---
> background-image: url(data:image/svg+xml;charset=utf8;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHZpZXdCb3g9IjAgMCA4IDgiPjxwYXRoIGZpbGw9IiNmZmYiIGQ9Ik02LjU2NC43NWwtMy41OSAzLjYxMi0xLjUzOC0xLjU1TDAgNC4yNiAyLjk3NCA3LjI1IDggMi4xOTN6Ii8+PC9zdmc+);
3014c3012
< background-image: url("data:image/svg+xml;charset=utf8,%3Csvg xmlns='http://www.w3.org/2000/svg' viewBox='0 0 4 4'%3E%3Cpath stroke='%23fff' d='M0 2h4'/%3E%3C/svg%3E");
---
> background-image: url(data:image/svg+xml;charset=utf8;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHZpZXdCb3g9IjAgMCA0IDQiPjxwYXRoIHN0cm9rZT0iI2ZmZiIgZD0iTTAgMmg0Ii8+PC9zdmc+);
3029c3027
< background-image: url("data:image/svg+xml;charset=utf8,%3Csvg xmlns='http://www.w3.org/2000/svg' viewBox='-4 -4 8 8'%3E%3Ccircle r='3' fill='%23fff'/%3E%3C/svg%3E");
---
> background-image: url(data:image/svg+xml;charset=utf8;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHZpZXdCb3g9Ii00IC00IDggOCI+PGNpcmNsZSByPSIzIiBmaWxsPSIjZmZmIi8+PC9zdmc+);
3042c3040
< background: #fff url("data:image/svg+xml;charset=utf8,%3Csvg xmlns='http://www.w3.org/2000/svg' viewBox='0 0 4 5'%3E%3Cpath fill='%23343a40' d='M2 0L0 2h4zm0 5L0 3h4z'/%3E%3C/svg%3E") no-repeat right .75rem center;
---
> background: #fff url(data:image/svg+xml;charset=utf8;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHZpZXdCb3g9IjAgMCA0IDUiPjxwYXRoIGZpbGw9IiMzNDNhNDAiIGQ9Ik0yIDAgMCAyaDR6bTAgNUwwIDNoNHoiLz48L3N2Zz4=) no-repeat right .75rem center;
3599c3597
< background-image: url("data:image/svg+xml;charset=utf8,%3Csvg viewBox='0 0 30 30' xmlns='http://www.w3.org/2000/svg'%3E%3Cpath stroke='rgba(0, 0, 0, 0.5)' stroke-width='2' stroke-linecap='round' stroke-miterlimit='10' d='M4 7h22M4 15h22M4 23h22'/%3E%3C/svg%3E");
---
> background-image: url(data:image/svg+xml;charset=utf8;base64,PHN2ZyB2aWV3Qm94PSIwIDAgMzAgMzAiIHhtbG5zPSJodHRwOi8vd3d3LnczLm9yZy8yMDAwL3N2ZyI+PHBhdGggc3Ryb2tlPSJyZ2JhKDAsIDAsIDAsIDAuNSkiIHN0cm9rZS13aWR0aD0iMiIgc3Ryb2tlLWxpbmVjYXA9InJvdW5kIiBzdHJva2UtbWl0ZXJsaW1pdD0iMTAiIGQ9Ik00IDdoMjJNNCAxNWgyMk00IDIzaDIyIi8+PC9zdmc+);
3633c3631
< background-image: url("data:image/svg+xml;charset=utf8,%3Csvg viewBox='0 0 30 30' xmlns='http://www.w3.org/2000/svg'%3E%3Cpath stroke='rgba(255, 255, 255, 0.5)' stroke-width='2' stroke-linecap='round' stroke-miterlimit='10' d='M4 7h22M4 15h22M4 23h22'/%3E%3C/svg%3E");
---
> background-image: url(data:image/svg+xml;charset=utf8;base64,PHN2ZyB2aWV3Qm94PSIwIDAgMzAgMzAiIHhtbG5zPSJodHRwOi8vd3d3LnczLm9yZy8yMDAwL3N2ZyI+PHBhdGggc3Ryb2tlPSJyZ2JhKDI1NSwgMjU1LCAyNTUsIDAuNSkiIHN0cm9rZS13aWR0aD0iMiIgc3Ryb2tlLWxpbmVjYXA9InJvdW5kIiBzdHJva2UtbWl0ZXJsaW1pdD0iMTAiIGQ9Ik00IDdoMjJNNCAxNWgyMk00IDIzaDIyIi8+PC9zdmc+);
4616c4614
< font-family: -apple-system,BlinkMacSystemFont,"Segoe UI",Roboto,"Helvetica Neue",Arial,sans-serif,"Apple Color Emoji","Segoe UI Emoji","Segoe UI Symbol";
---
> font-family: -apple-system,BlinkMacSystemFont,segoe ui,Roboto,helvetica neue,Arial,sans-serif,apple color emoji,segoe ui emoji,segoe ui symbol;
4712c4710
< font-family: -apple-system,BlinkMacSystemFont,"Segoe UI",Roboto,"Helvetica Neue",Arial,sans-serif,"Apple Color Emoji","Segoe UI Emoji","Segoe UI Symbol";
---
> font-family: -apple-system,BlinkMacSystemFont,segoe ui,Roboto,helvetica neue,Arial,sans-serif,apple color emoji,segoe ui emoji,segoe ui symbol;
4791c4789
< border-width: 0 .5rem .5rem .5rem;
---
> border-width: 0 .5rem .5rem;
4868,4869c4866,4867
< -webkit-perspective: 1000px;
< perspective: 1000px;
---
> -webkit-perspective: 1e3px;
> perspective: 1e3px;
4949c4947
< background-image: url("data:image/svg+xml;charset=utf8,%3Csvg xmlns='http://www.w3.org/2000/svg' fill='%23fff' viewBox='0 0 8 8'%3E%3Cpath d='M5.25 0l-4 4 4 4 1.5-1.5-2.5-2.5 2.5-2.5-1.5-1.5z'/%3E%3C/svg%3E");
---
> background-image: url(data:image/svg+xml;charset=utf8;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIGZpbGw9IiNmZmYiIHZpZXdCb3g9IjAgMCA4IDgiPjxwYXRoIGQ9Ik01LjI1LjBsLTQgNCA0IDQgMS41LTEuNUw0LjI1IDRsMi41LTIuNUw1LjI1LjB6Ii8+PC9zdmc+);
4952c4950
< background-image: url("data:image/svg+xml;charset=utf8,%3Csvg xmlns='http://www.w3.org/2000/svg' fill='%23fff' viewBox='0 0 8 8'%3E%3Cpath d='M2.75 0l-1.5 1.5 2.5 2.5-2.5 2.5 1.5 1.5 4-4-4-4z'/%3E%3C/svg%3E");
---
> background-image: url(data:image/svg+xml;charset=utf8;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIGZpbGw9IiNmZmYiIHZpZXdCb3g9IjAgMCA4IDgiPjxwYXRoIGQ9Ik0yLjc1LjBsLTEuNSAxLjVMMy43NSA0bC0yLjUgMi41TDIuNzUgOGw0LTQtNC00eiIvPjwvc3ZnPg==);
7432d7429
< /*# sourceMappingURL=bootstrap.min.css.map */
thanks again for a great parser and minifier let me know if you need help with any of this, especially for verification
n
Looking at http://www.ietf.org/rfc/rfc3986.txt section 2.2
I'm not fully convinced what is right or wrong. Given If data for a URI component would conflict with a reserved character's purpose as a delimiter, then the conflicting data must be percent-encoded before the URI is formed.
I would assume characters such as /
would not need to be percent-encoded...but I can't find conclusive information about this other than it seems to work in most browsers
.
hi @tdewolff - yeah I think the spec is complete especially in cases of composition.
FWIW, i found this example in bootstrap 4.0.0 which is apparently is used by 3.6% of the internet, so I'd guess it would be noticed quickly if it caused problems.
Bootstrap appears to use clean-css with level 1 optimizations for it's minification.
hope this helps.
regards,
n
@lexborisov , do you know RFC for this case?
See https://tools.ietf.org/html/rfc3986#page-12 which says : / ? # [ ] @ ! $ & ' ( ) * + , ; =
all need to be percent-encoded. I haven't found many specific mentionings of data URIs, except for https://tools.ietf.org/html/rfc2397, which refers to https://tools.ietf.org/html/rfc2396 and says ; / ? : @ & = + $ ,
need to be percent-encoded, which is already considerably less. It also says it needs those characters percent-encoded when they don't conflict with their reserved meaning. The latter is what is used by url.QueryEscape
(the function used by the data URI minifier).
Looking at https://en.wikipedia.org/wiki/Data_URI_scheme though, it seems : ; ,
don't required percent-encoding.
See https://codepen.io/tigt/post/optimizing-svgs-in-data-uris for a better explaination, however https://perishablepress.com/stop-using-unsafe-characters-in-urls/ suggests that we can keep ; / ? : @ = &
as regular characters: those characters are not used for theire "defined" purpose.
In the end, I'm really not sure what to do. Data URIs seem to refer to an older RFC that is less strict, and given some definitions it seems even those reserved characters mostly don't have a specific meaning for data URIs (so we don't have to percent-encode?). On the otherhand, many people refer to the newer RFC which is far stricter, but also says represent a data octet in a component when that octet's corresponding character is outside the allowed set or is being used as a delimiter of, or within, the component
, and it makes a distinction between gen-delims
and sub-delims
. I just have NO idea what this means in the end. Most people say "stay save" and encode them all, but as I read the RFC I feel like there is some leeway.
@tdewolff
Looks like the strict spec says / needs to be escaped, but in your test case you have not.
Actually, your minifying rules are wrong here. The original CSS string (url("data:image/svg+xml;ch
[...etc.]) has the URI surrounded by double quotes. So it was valid. No additional escaping was necessary.
But simply dropping the surrounding quotes makes the CSS invalid.
I suggest that you add an option to the CSS minifier to not drop quotes around the url()
construct. I would rather not have any quotes removed from the URIs there.
@dchenk Quotes can be removed within the url(
function as per the spec. As to escaping /
, this is debatable as it is ill-defined in the spec. However, you may prove me wrong by quoting the specifications for CSS3, see https://developer.mozilla.org/en-US/docs/Web/CSS/CSS3
See for the url definition: https://drafts.csswg.org/css-values-3/#urls it specifically allows removing quotes. Important note to self: it appears we always need to escape (
, )
and whitespace, as per https://drafts.csswg.org/css-syntax-3/#consume-url-token. Also, what are <url-modifiers>
?
@tdewolff you should be reading the actual spec, not MDN. (I mean you probably are, but that link you gave isn't sufficient.)
The spec says (https://www.w3.org/TR/CSS2/syndata.html#uri):
Some characters appearing in an unquoted URI, such as parentheses, white space characters, single quotes (') and double quotes ("), must be escaped with a backslash so that the resulting URI value is a URI token: '(', ')'.
In other words, the code url("data:image/svg+xml;ch...and-so-on")
is valid. And simply dropping the double quotes without escaping the string is invalid.
@tdewolff a simple solution to consider would be first doing strconv.Quote on strings within url()
and then simply slicing out the string within the double quotes. This way all the usual suspects would be escaped.
When dropping the quotes, the string should already be escaped properly. If not, you can submit a bug report. The issue was that some believe that we escape too many characters, but this is ill-specified.
I've fixed two bugs in the parser and data URI minifier:
The problem still remains that for data URIs it is ill defined which characters should be escaped and which not. Right now I still follow the same rules as decribed in https://tools.ietf.org/html/rfc2397, which referenced uric
in https://tools.ietf.org/html/rfc2396:
uric = reserved | unreserved | escaped
reserved = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" |
"$" | ","
unreserved = alphanum | mark
mark = "-" | "_" | "." | "!" | "~" | "*" | "'" |
"(" | ")"
Other characters than unreserved
are percent-encoded, but one could argue that some of the other characters don't need to be encoded as they have no special meaning. There is however no specification or RFC about this.
Hello, thanks for much for working minify and the matching parsers!
In CSS, the URI minifier seems to picking the wrong format to get smallest output
Here's a reduced test case:
In addition, a flag to prevent base64 encoding might be wise. I've found that base64 encoding might make an absolute file smaller, but it also wrecks compression. In other words, the larger original format sometimes compresses better than the smaller base64 version.
It is also highly possible there is user error in my analysis. Thoughts welcome and thanks again.
n