microsoft / tsdoc

A doc comment standard for TypeScript
https://tsdoc.org/
MIT License

Unicode support in identifiers #299

Open phenomnomnominal opened 2 years ago

phenomnomnominal commented 2 years ago

Just debugged my way to this comment:

// Note: In addition to letters, numbers, underscores, and dollar signs, modern ECMAScript
// also allows Unicode categories such as letters, combining marks, digits, and connector punctuation.
// These are mostly supported in all environments except IE11, so if someone wants it, we would accept
// a PR to allow them (although the test surface might be somewhat large).
StringChecks._identifierBadCharRegExp = /[^a-z0-9_$]/i;

I mark @internal identifiers with a trailing Δ, so I'm hitting a few issues with this! I'm happy to attempt a PR, but I'm curious what prior work has been done towards this, and what would be expected in terms of test coverage. I believe my use case may also need changes within the Tokenizer ({@link} parsing is a bit broken too), so it seems like there's plenty of room here for stepping on toes 😅
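For the sake of discussion, here is a rough sketch of what a Unicode-aware check could look like. It assumes a runtime with ES2018 Unicode property escapes (so not IE11), and it follows the ECMAScript identifier grammar: ID_Start, `$`, or `_` for the first character, then ID_Continue, `$`, ZWNJ, or ZWJ. The names `unicodeIdentifierBadCharRegExp`, `unicodeIdentifierBadStartRegExp`, and `explainIfInvalidIdentifier` are hypothetical and not the actual TSDoc API, and the Tokenizer side is not covered at all.

```typescript
// Hypothetical sketch only; none of these names are part of the real TSDoc API.
// The patterns are built with the RegExp constructor so the sketch does not
// depend on the compiler's `target` setting.

// Matches any character that ECMAScript does not allow inside an identifier,
// i.e. anything other than ID_Continue, "$", ZWNJ (U+200C), or ZWJ (U+200D).
// "_" needs no special case here because it is already part of ID_Continue.
const unicodeIdentifierBadCharRegExp: RegExp = new RegExp(
  '[^\\p{ID_Continue}$\\u200C\\u200D]',
  'u'
);

// Matches a first character that cannot begin an identifier,
// i.e. anything other than ID_Start, "$", or "_".
const unicodeIdentifierBadStartRegExp: RegExp = new RegExp('^[^\\p{ID_Start}$_]', 'u');

// Returns an explanation if the string is not a valid identifier,
// or undefined if it is acceptable.
function explainIfInvalidIdentifier(identifier: string): string | undefined {
  if (identifier.length === 0) {
    return 'The identifier cannot be an empty string';
  }
  if (unicodeIdentifierBadCharRegExp.test(identifier)) {
    return 'The identifier contains a character that is not allowed in an ECMAScript identifier';
  }
  if (unicodeIdentifierBadStartRegExp.test(identifier)) {
    return 'The identifier starts with a character that cannot begin an ECMAScript identifier';
  }
  return undefined;
}
```

With this sketch, `explainIfInvalidIdentifier('internalThingΔ')` returns `undefined`, whereas the ASCII-only regex quoted above flags the Δ as a bad character.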

Let me know what you reckon!