libpinyin / libpinyin

Library to deal with pinyin.
GNU General Public License v3.0

Incorporating the pyim-greatdict dictionary #88

Open ChengCat opened 6 years ago

ChengCat commented 6 years ago

Is it possible to incorporate the pyim-greatdict dictionary found here? It looks like a good dictionary, with over 3 million words. Thanks.

epico commented 6 years ago

Actually, we can only merge a dictionary whose license is compatible with GPLv3+ or ASL 2.0, since our code is licensed under GPLv3+.

Please provide license information in English, thanks!

ChengCat commented 6 years ago

Thanks for the reply.

I am not associated with pyim-greatdict, and can't provide additional materials.

I just happen to know both libpinyin and pyim-greatdict, and see an opportunity for improvement. I hope the improvement can happen in some form. I am not a lawyer, but I think the dictionary could be distributed under a license other than the GPL, as long as the dictionary itself can be legally distributed. As a more conservative approach, it should at least be possible to give users instructions on how to use pyim-greatdict with libpinyin.

epico commented 6 years ago

Actually, ibus-libpinyin can import third-party dictionaries from its setup dialog. The dictionary format is described in the setup dialog.

Maybe you could ask the pyim-greatdict maintainers to release a file in a format compatible with ibus-libpinyin.
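Until such a file exists, a user could convert a local copy. The following is a minimal Python sketch, not an official tool of either project. It ASSUMES (unverified here) that a pyim dictionary line looks like "ni-hao 你好 妳好", i.e. hyphen-separated pinyin followed by one or more words sharing that reading, and that lines starting with ";" are comments. It emits lines in the "phrase pinyin" form that the ibus-libpinyin setup dialog describes, e.g. "你好 ni'hao". The function names convert_pyim_line, convert and convert_file are hypothetical.

```python
from typing import Iterable, Iterator, List


def convert_pyim_line(line: str) -> List[str]:
    """Convert one pyim-style line into zero or more "phrase pinyin" lines.

    ASSUMPTION: a pyim line looks like "ni-hao 你好 妳好": hyphen-separated
    pinyin, then one or more space-separated words sharing that reading.
    Blank lines and lines starting with ';' are skipped as comments.
    """
    line = line.strip()
    if not line or line.startswith(";"):
        return []
    pinyin, *words = line.split()
    # The ibus-libpinyin import format joins syllables with apostrophes
    # ("ni'hao"), so swap the assumed pyim hyphens for apostrophes.
    libpinyin_pinyin = pinyin.replace("-", "'")
    return [f"{word} {libpinyin_pinyin}" for word in words]


def convert(lines: Iterable[str]) -> Iterator[str]:
    """Lazily convert an iterable of pyim-style lines (e.g. an open file)."""
    for line in lines:
        yield from convert_pyim_line(line)


def convert_file(src_path: str, dst_path: str) -> None:
    """Hypothetical helper: write a UTF-8 file to try in the import dialog."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for out_line in convert(src):
            dst.write(out_line + "\n")
```

For example, convert_file("pyim-greatdict.pyim", "greatdict-libpinyin.txt") (both file names hypothetical) would produce a file to try in the ibus-libpinyin import dialog. Frequencies are omitted because the assumed pyim line carries none, which the "phrase pinyin" form allows.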

ChengCat commented 6 years ago

I can't find the dictionary format description in the UI. Could you give me more hints? I am using libpinyin via fcitx-libpinyin. I can only see a "Manage Pinyin Dictionary" window without any format description.

epico commented 6 years ago

Dictionary File Format: Each line contains either "phrase pinyin" or "phrase pinyin frequency", e.g. "你好 ni'hao" or "你好 ni'hao 5".
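To check a file against that format before importing it, something like the following minimal Python sketch could be used. It is not part of libpinyin or ibus-libpinyin; Entry, parse_line, format_entry and find_bad_lines are hypothetical names. It ASSUMES fields are separated by whitespace, the frequency is an integer, and blank lines can be ignored.

```python
from typing import Iterable, List, NamedTuple, Optional


class Entry(NamedTuple):
    phrase: str               # e.g. "你好"
    pinyin: str               # syllables joined by apostrophes, e.g. "ni'hao"
    frequency: Optional[int]  # None when the line has no frequency column


def parse_line(line: str) -> Entry:
    """Parse a "phrase pinyin" or "phrase pinyin frequency" line."""
    fields = line.split()
    if len(fields) == 2:
        phrase, pinyin = fields
        return Entry(phrase, pinyin, None)
    if len(fields) == 3:
        phrase, pinyin, freq = fields
        # int() raises ValueError if the frequency is not an integer.
        return Entry(phrase, pinyin, int(freq))
    raise ValueError(f"expected 2 or 3 fields, got {len(fields)}: {line!r}")


def format_entry(entry: Entry) -> str:
    """Serialize an Entry back into the same one-line format."""
    if entry.frequency is None:
        return f"{entry.phrase} {entry.pinyin}"
    return f"{entry.phrase} {entry.pinyin} {entry.frequency}"


def find_bad_lines(lines: Iterable[str]) -> List[int]:
    """Return 1-based numbers of non-blank lines that fail to parse."""
    bad = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ASSUMPTION: blank lines are harmless and skipped
        try:
            parse_line(line)
        except ValueError:
            bad.append(number)
    return bad
```

Running find_bad_lines over an open file gives the line numbers worth fixing before the import; format_entry is there so a converter can write lines that parse_line accepts.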

The above is the format for ibus-libpinyin; I'm not sure whether fcitx-libpinyin uses the same format...

ChengCat commented 6 years ago

In addition to pyim-greatdict, I've found some other resources that could potentially be useful to libpinyin: https://github.com/crownpku/awesome-chinese-nlp#corpus-%E4%B8%AD%E6%96%87%E8%AF%AD%E6%96%99