
wenyan-lang / wenyan

文言文編程語言 A programming language for the ancient Chinese.

https://wy-lang.org/

MIT License

19.62k stars, 1.1k forks

On Chinese numerals #338

Open brynne8 opened 4 years ago

brynne8 commented 4 years ago

* Single numbers are output as Chinese numerals, but an array of numbers is output in Arabic numerals. This is a bit inconsistent.
* The myriad-scale unit 穰 (10²⁸) is strangely written in the Japanese Shinjitai form 穣.
* 十二 (12) is output as 一十二, which is not the common form.
* I have looked at some ancient Chinese books, for example 《全晉文》 and 《水滸傳》, and it seems 一百一 should be parsed as 101 rather than 110. However, wenyan-lang seems to parse it as 110.
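For the third point, here is a minimal Python sketch of rendering 0 to 9999 in the common form. It is not wenyan-lang's code: `to_hanzi` is a hypothetical name, larger scales (萬 up to 穰) are left out, and the only special case assumed is that 10 to 19 drop the leading 一 (so 12 is 十二 but 112 stays 一百一十二).

```python
DIGITS = "零一二三四五六七八九"
UNITS = ["", "十", "百", "千"]  # place names, indexed from the right

def to_hanzi(n: int) -> str:
    """Render 0 <= n < 10000 as a Chinese numeral in the common form."""
    if not 0 <= n < 10000:
        raise ValueError("this sketch only handles 0 to 9999")
    if n == 0:
        return DIGITS[0]
    digits = [int(c) for c in str(n)]
    parts = []
    pending_zero = False
    for i, d in enumerate(digits):
        place = len(digits) - 1 - i
        if d == 0:
            pending_zero = True  # write one 零 only if a non-zero digit follows
            continue
        if pending_zero:
            parts.append(DIGITS[0])
            pending_zero = False
        parts.append(DIGITS[d] + UNITS[place])
    text = "".join(parts)
    # Common form: 10 to 19 are written 十, 十一, ..., not 一十, 一十一, ...
    if text.startswith("一十"):
        text = text[1:]
    return text
```

With this sketch, `to_hanzi(12)` gives 十二, `to_hanzi(101)` gives 一百零一, and `to_hanzi(1099)` gives 一千零九十九.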

brynne8 commented 4 years ago

Since parsing Chinese numerals is an interesting task, I wrote a simple parser as a PEG grammar using LPeg.re.

Link: chinese_number.lua

LingDong- commented 4 years ago

Thanks for pointing out the issues! The Chinese numerals have always been the hard part.

* 穣 is a variant form (異體字) of 穰, but I agree it should be changed to the more common form 穰.
* 一十二: should be easy to fix.
* 一百一: 101 was the original behavior, but it was changed at the request of issue #24. 二百五 = 250 does sound more common, though.
* Number rendering: all print statements are in fact translated to console.log, but in the online IDE I hijacked (monkey-patched) console.log to print into a <div>, and that is where I added the feature of rendering numbers as hanzi. For arrays, I could technically traverse the whole data structure and recursively convert everything to hanzi, but that causes display issues when the output array is very long. I'll fix this in the next online IDE update.
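The recursive conversion mentioned in the last point can be sketched as a walk over nested lists, tuples, and dicts that replaces each integer with a caller-supplied formatter. This is only an illustration in Python (the IDE itself is JavaScript); `render_hanzi` and `digits_only` are hypothetical names, and `digits_only` is a digit-by-digit placeholder (12 becomes 一二), not proper numeral rendering.

```python
from typing import Any, Callable

def render_hanzi(value: Any, fmt: Callable[[int], str]) -> Any:
    """Return a copy of value with every integer, at any depth, replaced by fmt(n)."""
    if isinstance(value, bool):  # bool subclasses int in Python; leave it unchanged
        return value
    if isinstance(value, int):
        return fmt(value)
    if isinstance(value, list):
        return [render_hanzi(v, fmt) for v in value]
    if isinstance(value, tuple):
        return tuple(render_hanzi(v, fmt) for v in value)
    if isinstance(value, dict):
        return {k: render_hanzi(v, fmt) for k, v in value.items()}
    return value  # strings and other objects pass through unchanged

def digits_only(n: int) -> str:
    """Placeholder formatter: digit by digit (12 -> 一二), 負 for negatives."""
    sign = "負" if n < 0 else ""
    return sign + "".join("零一二三四五六七八九"[int(c)] for c in str(abs(n)))

print(render_hanzi([1, [2, 30], "a"], digits_only))  # ['一', ['二', '三零'], 'a']
```

Because the formatter is a parameter, the traversal stays the same when the placeholder is swapped for a real numeral renderer.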

Thank you!

antfu commented 4 years ago

I would propose a new approach.

How about we implement a print function in the standard library that outputs numbers (and other values) as hanzi, and have 書之 call that function by default? That way numbers are output as hanzi everywhere, without hijacking anything in the IDE. We may also need to introduce another syntax, such as 記之 or something similar, for raw output in the target language (working like the current 書之).
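The proposal separates two output paths. A minimal Python sketch under assumed names: `shu_zhi` stands for what 書之 would call by default (numbers rendered as hanzi, lists included), and `ji_zhi` for the proposed raw 記之. None of these functions exist in wenyan-lang's standard library, and the number formatter is a digit-by-digit placeholder.

```python
from typing import Any, Optional, TextIO

DIGITS = "零一二三四五六七八九"

def placeholder_num(n: int) -> str:
    # Placeholder: digit by digit (12 -> 一二); a real formatter would give 十二.
    sign = "負" if n < 0 else ""
    return sign + "".join(DIGITS[int(c)] for c in str(abs(n)))

def to_display(value: Any) -> str:
    """Format one value for 書之-style output, rendering numbers as hanzi."""
    if isinstance(value, bool):
        return str(value)
    if isinstance(value, int):
        return placeholder_num(value)
    if isinstance(value, (list, tuple)):
        return "[" + ", ".join(to_display(v) for v in value) + "]"
    return str(value)

def shu_zhi(*values: Any, out: Optional[TextIO] = None) -> None:
    """Hypothetical stdlib print that 書之 would call by default."""
    print(*(to_display(v) for v in values), file=out)  # file=None means stdout

def ji_zhi(*values: Any, out: Optional[TextIO] = None) -> None:
    """Hypothetical raw print for the proposed 記之, like today's 書之."""
    print(*values, file=out)

shu_zhi(12, [3, 4])  # 一二 [三, 四]
ji_zhi(12, [3, 4])   # 12 [3, 4]
```

Since the hanzi formatting lives in an ordinary library function rather than in a patched console.log, it would behave the same in the IDE and on the command line.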

I'm not very good at wenyan, so please feel free to suggest better wording.

SaltfishAmi commented 4 years ago

* 一百一: 101 was the original behavior, but it was changed at the request of issue #24. 二百五 = 250 does sound more common, though.

Sure, it sounds more common, but that is spoken usage, and quite colloquial at that. Formally, 二百五 should be 205.
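The disagreement comes down to which place the trailing digit occupies. A minimal Python sketch of both readings, limited to three-character forms such as 一百一 and 二百五; `read_short_form` and its `colloquial` flag are hypothetical, not wenyan-lang options.

```python
DIGITS = {c: i for i, c in enumerate("零一二三四五六七八九")}
UNITS = {"十": 10, "百": 100, "千": 1000}

def read_short_form(text: str, colloquial: bool) -> int:
    """Read a digit-unit-digit numeral such as 一百一 or 二百五.

    colloquial=True puts the trailing digit one place below the unit
    (二百五 -> 250); colloquial=False adds it as ones (二百五 -> 205).
    """
    if len(text) != 3:
        raise ValueError("expected exactly digit, unit, digit")
    lead, unit, tail = text
    base = DIGITS[lead] * UNITS[unit]
    place = UNITS[unit] // 10 if colloquial else 1
    return base + DIGITS[tail] * place
```

With `colloquial=True`, 一百一 reads as 110 (wenyan-lang's current behavior, per LingDong-); with `colloquial=False`, it reads as 101. After 十 both readings agree (二十五 is 25 either way).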

oovm commented 4 years ago

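I have an algorithm here, though I'm not sure whether it has any holes:

Read from left to right. For each digit read, multiply by ten and add the next digit; but if the character is a multiplier word (such as 十, 百, 千), multiply by the corresponding multiplier.

Then, on reaching <EOS>, do an extra check: if it is not 十, multiply by ten.

This is because only 二百五 exists; there is no 二百五万, which can only be read as 二百五十万.

The advantage of this algorithm is that it supports both readings, 一零九九 and 一千零九十九 (both 1099).

An example implementation in Python: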

https://github.com/GalAster/WenyanLanguage/blob/master/packages/wenyan-parser-py/source/hanzi2num.py
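A minimal Python sketch of the idea, under our own reading of the description (it is not a port of hanzi2num.py): a string made only of digits is read positionally, otherwise units are accumulated. Where the comment says to multiply by ten at <EOS>, this sketch instead scales a trailing digit to one place below the last unit, which gives the same result after 百 and also covers 千 and 萬. `hanzi_to_int` is a hypothetical name, and the range is assumed to be below 10⁸.

```python
DIGITS = {c: i for i, c in enumerate("零一二三四五六七八九")}
UNITS = {"十": 10, "百": 100, "千": 1000}
MYRIAD_CHARS = "萬万"  # accept traditional and simplified 萬

def hanzi_to_int(text: str) -> int:
    """Parse 0 <= n < 10**8 written either positionally or with units."""
    # Only digits (e.g. 一零九九): read positionally, like Arabic numerals.
    if all(ch in DIGITS for ch in text):
        n = 0
        for ch in text:
            n = n * 10 + DIGITS[ch]
        return n
    total = 0           # value already scaled by 萬
    section = 0         # value accumulated below 萬
    digit = 0           # pending digit not yet attached to a unit
    last_unit = 1       # most recent unit seen
    after_zero = False  # was the pending digit introduced by 零?
    for ch in text:
        if ch == "零":
            digit, after_zero = 0, True
        elif ch in DIGITS:
            digit = DIGITS[ch]
        elif ch in UNITS:
            section += (digit or 1) * UNITS[ch]  # bare 十 means 一十
            digit, last_unit, after_zero = 0, UNITS[ch], False
        elif ch in MYRIAD_CHARS:
            total = (section + digit) * 10_000
            section, digit, last_unit, after_zero = 0, 0, 10_000, False
        else:
            raise ValueError(f"unexpected character {ch!r}")
    # End of input: a digit right after a unit larger than 十 is abbreviated,
    # so it takes the place one below that unit (二百五 -> 250, 三萬五 -> 35000).
    if digit and not after_zero and last_unit > 10:
        digit *= last_unit // 10
    return total + section + digit
```

Because the abbreviation is only applied at the end of the input, 二百五十万 parses as 2,500,000, while 二百五万 is read as 205 × 10⁴, consistent with the point that the short form only occurs at the end of a number.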