kach / nearley

Unicode tokens in a character class #543

Open NDietrich opened 3 years ago

NDietrich commented 3 years ago

I'm trying to match a string of printable Unicode characters, similar to what the [\w] character set does for ASCII. However, it seems that Unicode property escapes (such as \p{L}, the category for all Unicode letters) are not supported inside a character class (this isn't specific to Nearley).

Can anyone suggest a way to match all printable Unicode characters, or specific Unicode property categories?

Conceptually, if I wanted a string of Unicode letters, I would write: str -> [\p{L}]:+ (I know this doesn't work, but it shows the kind of string I want to match).
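To make the intended matching behaviour concrete, here is a minimal sketch in plain JavaScript, outside of any grammar. It assumes an engine that supports Unicode property escapes, which require the `u` flag on the regular expression (ES2018). The names `isLetter`, `isLetterString`, and `unicodeLetter` are made up for illustration; whether an object with a `test` function can be plugged into a Nearley grammar as a token matcher is an assumption to check against the Nearley documentation, not something shown here.

```javascript
// Sketch only: shows what "a string of Unicode letters" means in plain JS.
// Assumes ES2018 Unicode property escapes; the `u` flag is required,
// otherwise \p{L} is not interpreted as a property escape.

// True when `ch` is exactly one code point in the Unicode "Letter" category.
const isLetter = (ch) => /^\p{L}$/u.test(ch);

// True when `s` is a non-empty run of Unicode letters,
// i.e. the behaviour hoped for from [\p{L}]:+ in the grammar.
const isLetterString = (s) => /^[\p{L}]+$/u.test(s);

// Hypothetical matcher object (illustrative name and shape).
const unicodeLetter = { test: isLetter };

console.log(isLetterString("h\u00e9llo")); // accented Latin letter
console.log(isLetterString("\u65e5\u672c\u8a9e")); // CJK letters
console.log(isLetterString("abc1")); // contains a digit
```

Other categories follow the same pattern, for example \p{N} for numbers or \p{P} for punctuation, again only with the `u` flag set.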

Thanks