tigt / mini-svg-data-uri

Small, efficient encoding of SVG data URIs for CSS, HTML, etc.
https://npm.runkit.com/mini-svg-data-uri
MIT License

feat support specify charset encoding #22

Closed Matrixbirds closed 1 year ago

Matrixbirds commented 2 years ago

The default charset for data URIs is US-ASCII, which covers many cases. But since this is a utility library, it should let callers specify a charset.

tigt commented 2 years ago

Hm. I did some more testing, and it looks like adding ;charset=utf-8 can only be an improvement if the SVG contains characters outside of the ASCII range, like so:

ASCII, URL-escaped

data:image/svg+xml,%3csvg xmlns='http://www.w3.org/2000/svg' viewBox='0 0 100 100'%3e%3cpath d='M10,10 H90 L50,70'/%3e%3ctext y='90'%3e' %26apos; %23 %25 %26amp; %c2%bf %f0%9f%94%a3%3c/text%3e%3c/svg%3e

UTF-8, unescaped

data:image/svg+xml;charset=utf-8,%3csvg xmlns='http://www.w3.org/2000/svg' viewBox='0 0 100 100'%3e%3cpath d='M10,10 H90 L50,70'/%3e%3ctext y='90'%3e' %26apos; %23 %25 %26amp; ¿ 🔣%3c/text%3e%3c/svg%3e

Both of these are 202 characters long, so there may be a sweet spot to target. The unescaped UTF-8 version should be tested across supported browsers, and compression may make the question even more interesting.
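One way to look for that sweet spot is to measure both forms directly. The following is a minimal sketch, not code from mini-svg-data-uri: `escapeNonAscii` and `utf8Bytes` are hypothetical helpers introduced only for illustration, and the sample text is the non-ASCII portion of the `<text>` element above.

```javascript
// Hypothetical helper (not part of mini-svg-data-uri): percent-escape every
// non-ASCII character as its UTF-8 bytes and leave ASCII untouched.
function escapeNonAscii(str) {
  // The `u` flag makes the regex match whole code points, so astral
  // characters such as emoji reach encodeURIComponent as intact pairs.
  return str.replace(/[^\x00-\x7F]/gu, (ch) =>
    encodeURIComponent(ch).toLowerCase()
  );
}

// Hypothetical helper: size of a string once serialized as UTF-8.
const utf8Bytes = (s) => new TextEncoder().encode(s).length;

const text = '¿ 🔣';
const charsetParam = ';charset=utf-8';

const escaped = escapeNonAscii(text); // '%c2%bf %f0%9f%94%a3'

// Cost of the escaped form vs. the raw form plus the charset parameter.
const escapedCost = utf8Bytes(escaped);
const unescapedCost = utf8Bytes(text) + utf8Bytes(charsetParam);

console.log({ escaped, escapedCost, unescapedCost });
```

Note that the comparison depends on what is counted: `String.prototype.length` counts UTF-16 code units, while the bytes actually written to a stylesheet or HTML file are usually UTF-8, so the two measures can disagree for the same URI.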

tigt commented 2 years ago

I’m not against adding charset functionality for SVGs that need it, but this PR doesn’t yet handle the character set conversions, which would be necessary to ship this feature. By default, Node handles the UTF-16 JS internals and serializes them to UTF-8 for us, which is why I’ve been able to avoid the issue until now.

Are you still in favor of the feature with that in mind?

tigt commented 1 year ago

Closing because the proposed code doesn’t seem to handle any character encoding conversion that mini-svg-data-uri doesn’t accidentally handle already.

At least, my current understanding is that if the code isn’t using TextEncoder somehow, it probably won’t make valid data: URIs with anything but the default ASCII.
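For reference, here is a rough sketch of what "using TextEncoder" could look like. It is an assumption about one possible approach, not the PR's code or the library's implementation; `percentEncodeUtf8` is a hypothetical name, and the set of characters it leaves unescaped is deliberately more conservative than what mini-svg-data-uri actually keeps literal.

```javascript
// Hypothetical sketch: percent-encode a string through explicit UTF-8 bytes.
function percentEncodeUtf8(str) {
  // TextEncoder always produces UTF-8, regardless of the UTF-16
  // representation JavaScript uses for strings internally.
  const bytes = new TextEncoder().encode(str);
  let out = '';
  for (const byte of bytes) {
    const ch = String.fromCharCode(byte);
    // Keep only RFC 3986 "unreserved" ASCII literal; escape every other byte.
    // Bytes >= 0x80 never match this class, so they are always escaped.
    out += /[A-Za-z0-9\-._~]/.test(ch)
      ? ch
      : '%' + byte.toString(16).padStart(2, '0');
  }
  return out;
}

const encoded = percentEncodeUtf8('¿ 🔣'); // '%c2%bf%20%f0%9f%94%a3'
console.log('data:image/svg+xml,' + encoded);
```

Supporting any charset other than UTF-8 would need a different encoder entirely, since `TextEncoder` only emits UTF-8.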

If I’m wrong, please feel free to tell me and I’ll open this back up. I could definitely see at least UTF-8 support being important for SVGs beyond American English, but I don’t have faith in my ability to properly support that myself. (As evidenced by the wrong sentence about charset at the bottom of the README.)