micromark / micromark

small, safe, and great commonmark (optionally gfm) compliant markdown parser
https://unifiedjs.com
MIT License

Support Eastern Arabic numerals for lists #26

Status: Closed (mustafa0x closed this issue 4 years ago)

mustafa0x commented 4 years ago

See: https://svelte.dev/repl/982673f97faa457692eb4d7bd51998df?version=3.29.0

TL;DR: Some languages use different numerals (e.g., ١, ٢, ٣ instead of 1, 2, 3). Can those numerals also be used to mark lists?

A quick test with babelmark indicated that https://github.com/dotnet/docfx supports this. https://babelmark.github.io/?text=%D9%A1.+%D9%85%D8%B1%D8%AD%D8%A8%D8%A7%0A%D9%A2.+%D8%A8%D8%A7%D9%84%D8%B9%D8%A7%D9%84%D9%85

wooorm commented 4 years ago

markdown is an ASCII format, so it doesn’t make sense from the language format to support non-ASCII syntax.

You can solve this with CSS, list-style-type: arabic-indic.
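As a rough illustration of the CSS suggestion above, the sketch below prepends a style rule to HTML that has already been rendered. The helper name `withArabicIndicLists` is hypothetical and not part of micromark; only the CSS value `list-style-type: arabic-indic` comes from the comment.

```typescript
// Hypothetical helper (not a micromark API): prepends a <style> rule so that
// ordered lists in the rendered HTML display Arabic-Indic counters.
function withArabicIndicLists(html: string): string {
  const style = '<style>ol { list-style-type: arabic-indic; }</style>';
  return `${style}\n${html}`;
}

// Example input: HTML as a Markdown renderer might emit it for a two-item list.
const styled = withArabicIndicLists('<ol>\n<li>مرحبا</li>\n<li>بالعالم</li>\n</ol>');
console.log(styled);
```

This only changes how the list is displayed; the Markdown source still has to use ASCII digits as list markers.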

See also the readme, which covers topics of CommonMark and extensions (Comparison).

mustafa0x commented 4 years ago

Thank you for your reply and suggestion.

so it doesn’t make sense from the language format to support non-ASCII syntax

Numerals are part of the text, so in some sense this equates to telling users not to use their own language. I appreciate that parsing all numeral systems might not be feasible, but I just wanted to share some perspective. =)

I'm taking a look at the code and will patch the necessary files (for my projects). I would of course appreciate any tips on where to look!

Thanks again!

wooorm commented 4 years ago

Numerals are part of the text, so in some sense this equates to telling users not to use their own language. I appreciate that parsing all numeral systems might not be feasible, but I just wanted to share some perspective. =)

I understand. I agree that this is a big problem in Markdown. And programming as a whole. But I’m not going to implement non-standard things. See also https://talk.commonmark.org to discuss changes, which includes 6+ years of discussing non-English, too.

mustafa0x commented 4 years ago

You can solve this with CSS, list-style-type: arabic-indic.

Yes, but the issue arises when authoring. Using a numeral system that differs from the rest of the document causes a myriad of issues. They're mostly minor, but they do have far-reaching implications. For example, on Macs, a user who wants a list has to change the language, write 1, change the language back, write the list item's text, change the language, write 2, change the language back, write the list item's text, and so on.

I agree that this is a big problem in Markdown.

It mostly manifests in this situation, since text (i.e., 1, 2, 3) doubles as syntax, unlike all/most other cases, where you only have syntax (e.g. *, -, [], {}).
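One way to work around the authoring problem without touching the parser is to normalize list markers before parsing. The following is a minimal sketch under that assumption; it is not something proposed in this thread, and the function names `toAsciiDigits` and `normalizeListMarkers` are hypothetical. It rewrites Eastern Arabic digits (U+0660 to U+0669) to ASCII only where they form an ordered-list marker at the start of a line, and leaves numerals in running text alone.

```typescript
// Eastern Arabic (Arabic-Indic) digits occupy U+0660..U+0669 and map to ASCII 0..9.
const ARABIC_INDIC_ZERO = 0x0660;

// Converts every Arabic-Indic digit in the string to its ASCII equivalent.
function toAsciiDigits(text: string): string {
  return text.replace(/[\u0660-\u0669]/g, (d) =>
    String(d.charCodeAt(0) - ARABIC_INDIC_ZERO)
  );
}

// Rewrites only leading list markers: up to 3 spaces of indent, 1 to 9 digits,
// then "." or ")" followed by whitespace or end of line.
// Assumption: the input has no code blocks or deeply nested lists; this
// line-based pass does not detect either.
function normalizeListMarkers(markdown: string): string {
  const marker = /^( {0,3})([\u0660-\u0669]{1,9})([.)])(?=\s|$)/;
  return markdown
    .split('\n')
    .map((line) =>
      line.replace(
        marker,
        (_m: string, indent: string, digits: string, delim: string) =>
          indent + toAsciiDigits(digits) + delim
      )
    )
    .join('\n');
}

// The list markers become "1." and "2."; the item text is untouched.
console.log(normalizeListMarkers('١. مرحبا\n٢. بالعالم'));
```

The normalized string can then be passed to any CommonMark parser. The rendered list would show ASCII numbers unless it is also styled, for example with `list-style-type: arabic-indic`.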

mustafa0x commented 4 years ago

For what it's worth, I was able to accomplish this by patching the following two files:

Thanks again.