kpdecker / jsdiff

A javascript text differencing implementation.
BSD 3-Clause "New" or "Revised" License

[fix] str.split cannot handle surrogate pairs; replaced with Array.from #395

Closed luoxzhg closed 6 months ago

luoxzhg commented 1 year ago

example

> Array.from("\n𝐀")
[ '\n', '𝐀' ]
> "\n𝐀".split('')
[ '\n', '\ud835', '\udc00' ]

ExplodingCabbage commented 8 months ago

Hmm, interesting. I fundamentally agree with the proposed change; operating on Unicode code points instead of UTF-16 code units is the correct default behaviour (and frankly no code should operate on UTF-16 code units unless it's a library explicitly for dealing with UTF-16). However, I don't want to merge any change that modifies the results that jsdiff emits without adding unit tests, adding release notes, and doing a major version number bump. I also want to carefully audit the code line by line to make sure there's nowhere else where we're similarly treating strings as sequences of UTF-16 code units instead of Unicode code points.
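
To make the code-unit versus code-point distinction concrete, here is a minimal sketch in plain JavaScript. It assumes the character in the example above is U+1D400 (the code point that the surrogate pair `\ud835 \udc00` encodes), and `toCodePoints` is a hypothetical helper used for illustration only, not a function in jsdiff.

```javascript
// U+1D400 (MATHEMATICAL BOLD CAPITAL A) lies outside the Basic Multilingual
// Plane, so UTF-16 stores it as the surrogate pair \ud835 \udc00.
const text = "\n\u{1D400}";

// split('') cuts between UTF-16 code units, tearing the surrogate pair apart.
const codeUnits = text.split("");

// Hypothetical helper (not part of jsdiff): Array.from uses the string
// iterator, which yields whole code points rather than code units.
function toCodePoints(str) {
  return Array.from(str);
}

const codePoints = toCodePoints(text);

console.log(codeUnits.length);  // 3
console.log(codePoints.length); // 2
console.log(codePoints[1].codePointAt(0).toString(16)); // 1d400
```

With code-unit tokens, a character-level diff could report a change to only one half of a surrogate pair, which is not a valid character on its own; code-point tokens avoid that.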

I want to churn through some of the more straightforward-to-handle issues and PRs before doing the above - but I do intend to return to this PR in due course!

ExplodingCabbage commented 6 months ago

Adding docs and stuff over at https://github.com/kpdecker/jsdiff/pull/500