sindresorhus / filenamify

Convert a string to a valid safe filename
MIT License
486 stars, 26 forks

Don’t `slice` a surrogate pair in half when you truncate. #34

Open: issuefiler opened 2 years ago

issuefiler commented 2 years ago

Handle a possible surrogate pair (two UTF-16 code units that together encode a single code point) at the end of the string properly when truncating the filename.
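
A minimal sketch of what this could look like, assuming the limit is still counted in UTF-16 code units as with `slice`. The name `truncateCodeUnits` is hypothetical and not part of filenamify; `maxLength` is assumed to be a non-negative integer.

```javascript
// Hypothetical helper, not filenamify's actual implementation.
// Truncates to at most `maxLength` UTF-16 code units without leaving
// a lone high surrogate at the end of the result.
function truncateCodeUnits(string, maxLength) {
	if (string.length <= maxLength) {
		return string;
	}

	let end = maxLength;
	const last = string.charCodeAt(end - 1); // Last code unit we would keep
	const next = string.charCodeAt(end); // First code unit we would drop

	const isHighSurrogate = last >= 0xD800 && last <= 0xDBFF;
	const isLowSurrogate = next >= 0xDC00 && next <= 0xDFFF;

	// The cut falls in the middle of a pair: drop the high surrogate too.
	if (isHighSurrogate && isLowSurrogate) {
		end--;
	}

	return string.slice(0, end);
}

truncateCodeUnits('He slices the 🦄 in half', 15);
// → 'He slices the '
```

The result can be one code unit shorter than requested, which seems preferable to emitting half of a character.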

The “truncate-utf8-bytes” package already avoids splitting a character when it truncates. Note, however, that it truncates to a given number of UTF-8 bytes, whereas String.prototype.slice counts UTF-16 code units.

By the way, truncating to a specific number of bytes instead of UTF-16 code units might be more suitable for filenames, since many filesystems limit filename length in bytes.
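
For illustration, byte-based truncation could be sketched as below. This is not how “truncate-utf8-bytes” is implemented; it is a simple version that assumes the global `TextEncoder` is available, and the name `truncateUtf8Bytes` is hypothetical.

```javascript
// Hypothetical helper: keeps whole code points until adding the next
// one would exceed `maxBytes` bytes of UTF-8.
const encoder = new TextEncoder();

function truncateUtf8Bytes(string, maxBytes) {
	let byteCount = 0;
	let result = '';

	// `for...of` iterates a string by code point, so a surrogate pair
	// always arrives as one two-unit `character`.
	for (const character of string) {
		const size = encoder.encode(character).length;

		if (byteCount + size > maxBytes) {
			break;
		}

		byteCount += size;
		result += character;
	}

	return result;
}

truncateUtf8Bytes('He slices the 🦄 in half', 17);
// → 'He slices the ' (the 🦄 needs 4 bytes, but only 3 remain)
```

Encoding one code point at a time is slow for long inputs, but filenames are short enough that clarity seems worth more here. A lone surrogate already present in the input is counted as 3 bytes, because `TextEncoder` encodes it as U+FFFD.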

issuefiler commented 2 years ago

Example

"He slices the 🦄 in half".slice(0, 15)
// "🦄" === "\uD83E\uDD84"
// → "He slices the \uD83E"

issuefiler commented 2 years ago

Note that slice can also split a Unicode grapheme cluster (e.g. a combined family emoji). A split grapheme cluster still leaves a valid string, but splitting a surrogate pair, which represents a single code point, leaves a lone surrogate: the string is then ill-formed UTF-16 and cannot be encoded as valid UTF-8.

sindresorhus commented 1 year ago

We can use Intl.Segmenter to solve this now that this package targets Node.js 16.
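
A rough sketch of the `Intl.Segmenter` approach, assuming the limit stays in UTF-16 code units and that whole grapheme clusters are kept or dropped. The name `truncateGraphemes` is hypothetical, and this is not necessarily how the package would implement it.

```javascript
// Hypothetical helper: truncates to at most `maxLength` UTF-16 code
// units, never cutting inside a grapheme cluster.
function truncateGraphemes(string, maxLength) {
	const segmenter = new Intl.Segmenter(undefined, {granularity: 'grapheme'});
	let result = '';

	for (const {segment} of segmenter.segment(string)) {
		// Stop before the first cluster that would not fit entirely.
		if (result.length + segment.length > maxLength) {
			break;
		}

		result += segment;
	}

	return result;
}

truncateGraphemes('He slices the 🦄 in half', 15);
// → 'He slices the '
```

Because a grapheme cluster never ends in the middle of a surrogate pair, this also covers the original problem, at the cost of sometimes returning a noticeably shorter string when a long cluster such as a family emoji sits at the boundary.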