wordplaydev / wordplay

An accessible, language-inclusive programming language and IDE for creating interactive typography on the web.

Chinese signs cannot be recognized. #532

Open ShufaW opened 3 months ago

ShufaW commented 3 months ago

What are you trying to do that you can't?

For example, when I type "Stage([Shape(Circle(1m 0m 0m))])" in Chinese, it becomes "舞台([形状(圆圈(1m 0m 0m))])". However, I must use the English (half-width) punctuation marks "(" and ")" instead of the Chinese (full-width) versions "（" and "）", because the Chinese punctuation marks are not recognized. This makes things difficult for users in non-English environments, as they have to spend considerable time switching punctuation marks between languages.

What is your idea?

Would it be possible to improve localization so that the language recognizes and interprets different punctuation marks and symbols based on the user's language settings? This would allow users to use their native signs without issues. Alternatively, could there be shortcuts for the most frequently used signs?

amyjko commented 3 months ago

It is definitely possible to support Chinese parentheses. (I didn't even realize they were different!) Do you have a list of punctuation equivalents, other than parentheses, that need to be supported? Here's the list so far.

（
）

amyjko commented 3 months ago

I see a list of full-width punctuation equivalents here:

https://en.wikipedia.org/wiki/Chinese_punctuation

Is that a good list of Unicode symbols to define as equivalents?
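To be concrete, here's the kind of equivalence table I have in mind, limited to a plausible subset of that page (the name fullWidthEquivalents and the exact character set are just for illustration; we'd trim this to whatever delimiters Wordplay actually tokenizes):

```ts
// Illustrative subset of full-width/ideographic equivalents from that page.
// Keys are ASCII characters the tokenizer already knows; values are candidate
// characters to accept as the very same tokens.
const fullWidthEquivalents: Record<string, string[]> = {
    '(': ['（'], // U+FF08 FULLWIDTH LEFT PARENTHESIS
    ')': ['）'], // U+FF09 FULLWIDTH RIGHT PARENTHESIS
    '[': ['［'], // U+FF3B FULLWIDTH LEFT SQUARE BRACKET
    ']': ['］'], // U+FF3D FULLWIDTH RIGHT SQUARE BRACKET
    '{': ['｛'], // U+FF5B FULLWIDTH LEFT CURLY BRACKET
    '}': ['｝'], // U+FF5D FULLWIDTH RIGHT CURLY BRACKET
    ',': ['，', '、'], // U+FF0C FULLWIDTH COMMA, U+3001 IDEOGRAPHIC COMMA
    '.': ['。'], // U+3002 IDEOGRAPHIC FULL STOP
    ':': ['：'], // U+FF1A FULLWIDTH COLON
    ';': ['；'], // U+FF1B FULLWIDTH SEMICOLON
    '!': ['！'], // U+FF01 FULLWIDTH EXCLAMATION MARK
    '?': ['？'], // U+FF1F FULLWIDTH QUESTION MARK
};
```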

ShufaW commented 3 months ago

Yes, I think that's an authoritative list.

amyjko commented 3 months ago

Great, thanks for confirming! Do you have any interest in helping implement this? This issue isn't too difficult; it mainly involves adding some extra lines to Tokenizer.ts. If that sounds like a fun task, I can guide you through the PR process.

ShufaW commented 3 months ago

Yes sure! I can implement this.

amyjko commented 3 months ago

The key place to look is Tokenizer.ts. There you'll see a list of token definitions, such as the ones for Sym.EvalOpen and Sym.EvalClose. Each entry maps a string or regular expression to a particular token type. You'll want to add entries to this list, likely right next to the corresponding existing tokens (order matters), so that the Unicode full-width versions count as the same token types. The pull request should also add test cases to Tokenizer.test.ts to ensure that they tokenize correctly, at least one for each added symbol.
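To sketch the shape of the change (this is not the actual Tokenizer.ts code; Sym, TokenPattern, and nextToken below are simplified stand-ins for illustration), the idea is that each full-width character gets its own entry right beside its half-width counterpart and maps to the same symbol type:

```ts
// Simplified stand-in for the real token definitions in Tokenizer.ts.
enum Sym {
    EvalOpen,
    EvalClose,
    ListOpen,
    ListClose,
}

type TokenPattern = { pattern: string | RegExp; type: Sym };

// Full-width entries sit immediately after their half-width counterparts,
// preserving the order-sensitive matching and mapping to the same Sym.
const patterns: TokenPattern[] = [
    { pattern: '(', type: Sym.EvalOpen },
    { pattern: '（', type: Sym.EvalOpen }, // U+FF08
    { pattern: ')', type: Sym.EvalClose },
    { pattern: '）', type: Sym.EvalClose }, // U+FF09
    { pattern: '[', type: Sym.ListOpen },
    { pattern: '［', type: Sym.ListOpen }, // U+FF3B
    { pattern: ']', type: Sym.ListClose },
    { pattern: '］', type: Sym.ListClose }, // U+FF3D
];

// A toy matcher, just to show that half- and full-width now tokenize alike.
function nextToken(source: string): { type: Sym; text: string } | undefined {
    for (const { pattern, type } of patterns) {
        if (typeof pattern === 'string') {
            if (source.startsWith(pattern)) return { type, text: pattern };
        } else {
            const match = pattern.exec(source);
            if (match && match.index === 0) return { type, text: match[0] };
        }
    }
    return undefined;
}

console.log(nextToken('(1m)')?.type === Sym.EvalOpen); // true
console.log(nextToken('（1m）')?.type === Sym.EvalOpen); // true
```

For Tokenizer.test.ts, the corresponding tests would be one small assertion per added character, e.g. checking that "（" produces the same token type as "(".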

lpjjj1222 commented 1 week ago

Hi Amy @amyjko, I would like to request to be assigned to this issue; I'll follow the instructions given above to fix this locale problem.

amyjko commented 1 week ago

Wonderful, it's yours! See the pointers above; I'm happy to elaborate.

lpjjj1222 commented 1 week ago

Oh Amy, I think you assigned the wrong person 🤣 @amyjko

amyjko commented 1 week ago

Oops, fixed.