ohmjs / ohm

A library and language for building parsers, interpreters, compilers, etc.
MIT License

Support use of surrogate pairs in ranges #355

Closed pdubroy closed 2 years ago

pdubroy commented 2 years ago

The following does not work in Ohm, but probably should:

G {
  start = "\uD83D\uDE00".."\uD83D\uDE00"
}

This is equivalent, and doesn't work either:

G {
  start = "😀".."😀"
}

This code point is represented as a surrogate pair in JS, and should be treated like a single character.
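
For illustration, here is a small sketch in plain JavaScript (no Ohm involved) showing why the two grammars above are equivalent and why the string looks like two characters to code that counts UTF-16 code units. The variable name `s` is only for this example.

```javascript
// The escaped form and the literal emoji are the same JS string.
const s = "\uD83D\uDE00";
console.log(s === "😀"); // true

// .length counts UTF-16 code units, so the surrogate pair counts as 2.
console.log(s.length); // 2

// codePointAt combines the pair into the single code point U+1F600.
console.log(s.codePointAt(0).toString(16)); // 1f600

// Iterating a string (here via spread) walks code points, not code units.
console.log([...s].length); // 1
```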

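We can use str.match(/./ug).length to check the number of code points (this assumes a non-empty string with no line terminators, since `.` does not match those and match returns null when nothing matches):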

> '😀'.match(/./ug).length
1
> 'ab'.match(/./ug).length
2
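
Building on that check, a range match could compare code points rather than code units. The following is only a sketch of the idea, not Ohm's actual implementation: `codePointCount` and `matchRange` are hypothetical helpers invented for this example, and the return value (number of code units consumed) is an assumption about what a matcher would need.

```javascript
// Hypothetical helpers, not part of Ohm's API.

// Number of code points in a string (string iteration is code-point based).
function codePointCount(str) {
  return Array.from(str).length;
}

// Try to match one code point of `input` at code-unit offset `pos` against
// the inclusive range from..to. Returns the number of UTF-16 code units
// consumed (1 or 2), or 0 if there is no match.
function matchRange(input, pos, from, to) {
  if (codePointCount(from) !== 1 || codePointCount(to) !== 1) {
    throw new Error("range endpoints must each be a single code point");
  }
  const cp = input.codePointAt(pos);
  if (cp === undefined) return 0; // pos is past the end of the input
  if (cp < from.codePointAt(0) || cp > to.codePointAt(0)) return 0;
  // Code points above U+FFFF occupy two code units (a surrogate pair).
  return cp > 0xffff ? 2 : 1;
}

console.log(matchRange("😀", 0, "😀", "😀")); // 2
console.log(matchRange("a", 0, "a", "z")); // 1
console.log(matchRange("A", 0, "a", "z")); // 0
```

Validating the endpoints by code point count, rather than by .length, is what lets "😀".."😀" be accepted as a range of single characters.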