micromark / micromark-extension-directive

micromark extension to support generic directives (`:cite[smith04]`)
https://unifiedjs.com
MIT License

Enable wide Unicode support for names #24

Open viktor-yakubiv opened 10 months ago

viktor-yakubiv commented 10 months ago

Description of changes

Enables almost full Unicode support for directive names. This is tricky to test, so I've added only Latin, Greek, and Cyrillic characters. I have also tested combining accent modifiers in the middle and at the end of a directive name.
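To make the cases above concrete, here is a minimal sketch of a name check built on Unicode property escapes. It is not the code from this PR: `isDirectiveName` and the exact rule (a letter first, then letters, numbers, combining marks, `-`, and `_`) are assumptions chosen for illustration.

```javascript
// Hypothetical helper, NOT the implementation in this PR.
// Assumed rule: first character is a Unicode letter (\p{L}); later characters
// may be letters, numbers (\p{N}), combining marks (\p{M}), `-`, or `_`.
const namePattern = /^\p{L}[\p{L}\p{N}\p{M}_-]*$/u

function isDirectiveName(value) {
  return namePattern.test(value)
}

// Latin, Greek, and Cyrillic names all pass.
console.log(isDirectiveName('cite')) // true
console.log(isDirectiveName('παράδειγμα')) // true
console.log(isDirectiveName('цитата')) // true

// Combining acute accent (U+0301) in the middle and at the end of a name.
console.log(isDirectiveName('e\u0301cole')) // true
console.log(isDirectiveName('cafe\u0301')) // true

// A combining mark or a digit cannot start a name under the assumed rule.
console.log(isDirectiveName('\u0301cole')) // false
console.log(isDirectiveName('1abc')) // false
```

Allowing `\p{M}` only after the first character is what lets decomposed accents pass while still requiring a real letter at the start.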

Attribute names are worth looking at too, but maybe in a separate PR.

The PR should be ready to review. Thank you!

Closes #23

codecov-commenter commented 10 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Comparison is base (7f23ba8) 100.00% compared to head (73eb92a) 100.00%.

Additional details and impacted files

```diff
@@            Coverage Diff            @@
##              main       #24   +/-   ##
=========================================
  Coverage   100.00%   100.00%
=========================================
  Files            9         9
  Lines         1416      1439   +23
=========================================
+ Hits          1416      1439   +23
```

wooorm commented 10 months ago

You could maybe test the math symbol for pi π, and an emoji, such as 🌍?
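A rough sketch of why these two characters make useful test cases: π is a letter, while 🌍 is a symbol that lies outside the Basic Multilingual Plane, so it takes two UTF-16 code units. The helper name `describeCodePoints` is hypothetical and only serves this illustration.

```javascript
// Hypothetical helper for illustration only.
// Array.from iterates a string by code point, so a surrogate pair such as 🌍
// stays together as one entry instead of being split into two code units.
function describeCodePoints(value) {
  return Array.from(value, (char) => ({
    char,
    codePoint: char.codePointAt(0),
    letter: /^\p{L}$/u.test(char),
    symbol: /^\p{S}$/u.test(char)
  }))
}

const [pi, globe] = describeCodePoints('π🌍')

console.log(pi.letter, pi.symbol) // true false
console.log(globe.letter, globe.symbol) // false true
console.log('🌍'.length) // 2 (UTF-16 code units)
console.log(globe.codePoint.toString(16)) // 1f30d
```

A tokenizer that inspects single UTF-16 code units would see the emoji as two separate surrogates, which is one reason an emoji test is worth having.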

wooorm commented 9 months ago

Thanks for your continued work. I’ve been wondering over the last week what to do about punctuation, and about the implications for the semver version of this package.

In the ASCII range we allow ASCII alphanumerics and `-`, `.`, `_`. I think in the rest of the Unicode range we should also only allow alphanumerics. We should be able to do that by checking `classifyCharacter(x) === undefined`. I don’t see Unicode “punctuation” tested much; could you see if that changes things?

Some examples of symbols/punctuation outside of the ASCII range are and £!
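The rule proposed above could be sketched roughly as follows. This is an approximation under stated assumptions: `classify` is a hypothetical stand-in built on Unicode property escapes, not the implementation of `classifyCharacter`, and `isAllowedInName` is likewise an invented name. Treating symbols (`\p{S}`) like punctuation is also an assumption of this sketch.

```javascript
// Hypothetical stand-in for a character classifier. This is NOT the code in
// `micromark-util-classify-character`; it only mimics the idea of returning
// a group for whitespace and punctuation, and `undefined` for everything else.
function classify(char) {
  if (/^\s$/u.test(char)) return 'whitespace'
  // Assumption: symbols (\p{S}) are grouped with punctuation (\p{P}).
  if (/^[\p{P}\p{S}]$/u.test(char)) return 'punctuation'
  return undefined
}

// Hypothetical check for a single name character (one code point).
function isAllowedInName(char) {
  const code = char.codePointAt(0)
  // ASCII range: alphanumerics plus `-`, `.`, and `_`, as described above.
  if (code < 128) return /^[A-Za-z0-9._-]$/.test(char)
  // Rest of Unicode: allowed only when the classifier has nothing to say.
  return classify(char) === undefined
}

console.log(isAllowedInName('a')) // true
console.log(isAllowedInName('-')) // true
console.log(isAllowedInName('!')) // false
console.log(isAllowedInName('π')) // true
console.log(isAllowedInName('£')) // false (currency symbol)
console.log(isAllowedInName('«')) // false (punctuation)
```

Note that in this sketch `undefined` means "neither whitespace nor punctuation or symbol", which is slightly broader than "alphanumeric"; combining marks, for example, also pass.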