ohmjs / ohm

A library and language for building parsers, interpreters, compilers, etc.
MIT License

feat!: Make `any` consume a full code point, not a single code unit #424

Status: Closed (pdubroy closed this 1 year ago)

pdubroy commented 1 year ago

A JavaScript string is a sequence of 16-bit code units. Some Unicode characters, such as many emoji, are encoded as a pair of 16-bit code units (a surrogate pair). For example, the string '😆' has length 2 but contains a single Unicode code point. Previously, `any` always consumed a single 16-bit code unit. Now it consumes the next code point, i.e., a full Unicode character.
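The difference between the old and new behavior can be illustrated with a small sketch. The helpers `consumeCodeUnit` and `consumeCodePoint` below are hypothetical and are not part of Ohm's API or its internals. Each one returns the input position just past what a single `any` would match, or -1 at end of input. The sketch assumes only standard ES2015 string methods (`codePointAt`, `Array.from`).

```typescript
// Hypothetical helpers for illustration only; NOT Ohm's implementation.
// Each returns the index just past one `any` match starting at `pos`,
// or -1 if `pos` is at (or past) the end of the input.

// Old behavior: `any` consumes exactly one 16-bit code unit.
function consumeCodeUnit(input: string, pos: number): number {
  return pos < input.length ? pos + 1 : -1;
}

// New behavior: `any` consumes one full code point.
function consumeCodePoint(input: string, pos: number): number {
  if (pos >= input.length) return -1;
  // codePointAt combines a valid surrogate pair into a single code point
  // (a value above 0xFFFF); a lone surrogate comes back as-is (<= 0xFFFF).
  const cp = input.codePointAt(pos)!;
  return pos + (cp > 0xffff ? 2 : 1);
}

const s = "😆";
console.log(s.length);                // 2: two UTF-16 code units
console.log(Array.from(s).length);    // 1: Array.from iterates by code point
console.log(consumeCodeUnit(s, 0));   // 1: stops in the middle of the pair
console.log(consumeCodePoint(s, 0));  // 2: consumes the whole emoji
```

Under the old behavior, a rule that should match exactly one character (say, a rule whose body is just `any`) would stop halfway through '😆', leaving an unpaired low surrogate for the rest of the grammar to deal with. Under the new behavior, the same rule consumes the whole emoji.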

BREAKING CHANGE: this changes the meaning of `any` in user grammars.