parlr / ruby-font-creator

Generate rich Unicode open fonts with custom annotations, transliterations, pronunciations.

Data gathering #19

Closed: hugolpz closed this issue 7 years ago

hugolpz commented 7 years ago

We are currently looking for a database with entries such as { "glyph": "西", "phonetic": "xī" } (or xi1, or other notations).
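If a source stores readings in numbered form (xi1), a small conversion step could normalize them to the tone-mark form (xī) before building entries. This is a minimal sketch, not code from this project: the function name numberedToMarked is hypothetical, it applies the usual pinyin placement rule (mark a or e if present, the o of ou, otherwise the last vowel), accepts v as a stand-in for ü, and treats tones 5 and 0 as neutral.

```javascript
// Tone-marked vowels, indexed by tone number minus one (tones 1 to 4).
const TONE_MARKS = {
  a: ["ā", "á", "ǎ", "à"],
  e: ["ē", "é", "ě", "è"],
  i: ["ī", "í", "ǐ", "ì"],
  o: ["ō", "ó", "ǒ", "ò"],
  u: ["ū", "ú", "ǔ", "ù"],
  "ü": ["ǖ", "ǘ", "ǚ", "ǜ"],
};

// Hypothetical helper: "xi1" -> "xī". Input that is not a single
// numbered syllable is returned unchanged.
function numberedToMarked(syllable) {
  // "lv4" is a common ASCII spelling of "lü4".
  const normalized = syllable.toLowerCase().replace(/v/g, "ü");
  const match = /^([a-zü]+)([0-5])$/.exec(normalized);
  if (!match) return syllable;
  const base = match[1];
  const tone = Number(match[2]);
  if (tone === 0 || tone === 5) return base; // neutral tone carries no mark

  // Placement rule: "a" or "e" if present, the "o" of "ou", else the last vowel.
  let index = base.search(/[ae]/);
  if (index === -1) index = base.indexOf("ou");
  if (index === -1) {
    for (let i = base.length - 1; i >= 0; i--) {
      if (base[i] in TONE_MARKS) { index = i; break; }
    }
  }
  if (index === -1) return base; // no vowel to mark
  return base.slice(0, index) + TONE_MARKS[base[index]][tone - 1] + base.slice(index + 1);
}

// An entry in the shape we are looking for.
const entry = { glyph: "西", phonetic: numberedToMarked("xi1") };
```

Keeping the conversion separate from data loading means each source can be fetched in whatever notation it uses and normalized afterwards.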

Possible sources (information still to be completed):

Moedict

CJKlib

edouard-lopez commented 7 years ago

What about Unihan?

Given the hexadecimal code point, we can get the glyph like this in Python:

```python
>>> print(chr(int('0x897F', 16)))
西
```

A JS solution would be better, but since this is outside the scope of the project, we can do it any way we see fit.

hugolpz commented 7 years ago

Please check out:

[Screenshots from 2017-03-16, 17-58-24 and 17-59-36]

edouard-lopez commented 7 years ago

Thanks for the link; cjk-unihan might be useful for other projects.

I think it's better to limit the project to generating fonts and to outsource the data gathering/validation to another project. This way we stay focused and efficient.

I'm closing this issue, as different users might have different needs and will therefore handcraft their own dictionaries.

edouard-lopez commented 7 years ago

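I reckon the JS solution is in the tobei/unihan code:

```javascript
// code is a Unihan code point string such as "U+897F"; substring(2) drops the "U+" prefix
const character = String.fromCodePoint(parseInt(code.substring(2), 16));
```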
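Applied to the Unihan data mentioned earlier, that one-liner could produce the { glyph, phonetic } records directly. A minimal sketch, assuming the layout of Unihan_Readings.txt (tab-separated code point, field name and value per line, e.g. U+897F, kMandarin, xī, with # comment lines); parseUnihanMandarin is a hypothetical name, and keeping only the first listed kMandarin reading is an illustrative choice.

```javascript
// Hypothetical helper: build { glyph, phonetic } records from the text of
// Unihan_Readings.txt, keeping only kMandarin lines.
function parseUnihanMandarin(text) {
  const records = [];
  for (const line of text.split(/\r?\n/)) {
    if (line === "" || line.startsWith("#")) continue; // skip blanks and comments
    const [code, field, value] = line.split("\t");
    if (field !== "kMandarin" || value === undefined) continue;
    // Same conversion as the tobei/unihan line: drop "U+" and parse the hex digits.
    const glyph = String.fromCodePoint(parseInt(code.substring(2), 16));
    // When several readings are listed (space-separated), keep the first one.
    const phonetic = value.trim().split(" ")[0];
    records.push({ glyph, phonetic });
  }
  return records;
}
```

For example, the line "U+897F\tkMandarin\txī" yields { glyph: "西", phonetic: "xī" }, while other fields such as kDefinition are ignored.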
hugolpz commented 7 years ago

Did you gather the data?

edouard-lopez commented 7 years ago

Not yet, could you work on a project to do so?

hugolpz commented 7 years ago

Yup. See also https://github.com/peterolson/pinyinify/issues/1#issuecomment-287167463

[Screenshot from 2017-03-17, 11-16-25]

edouard-lopez commented 7 years ago

@hugolpz I think there is a typo in your comment: it gives a 1:10 ratio between node-pinyin and Unihan character/phonetic pairs. Can you confirm or correct this number?

hugolpz commented 6 years ago

https://github.com/superbiger/pinyin4js/blob/master/src/dict/pinyin.dict.js

edouard-lopez commented 6 years ago

We can get the code point using punycode.
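For the reverse direction (glyph to code point), punycode.ucs2.decode returns an array of code points for a string. The sketch below swaps in the built-in String.prototype.codePointAt instead, which avoids the extra dependency and also treats characters outside the Basic Multilingual Plane as a single code point; toUnihanCode and codePoints are hypothetical helper names.

```javascript
// Hypothetical helper: turn a glyph back into a Unihan-style code point string.
function toUnihanCode(glyph) {
  const codePoint = glyph.codePointAt(0);
  return "U+" + codePoint.toString(16).toUpperCase().padStart(4, "0");
}

// Decode a whole string into code points, similar in spirit to punycode.ucs2.decode.
// Array.from iterates by code point, so surrogate pairs stay together.
function codePoints(text) {
  return Array.from(text, (ch) => ch.codePointAt(0));
}
```

With these, toUnihanCode("西") gives "U+897F", which matches the key format used by the Unihan files.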