i3thuan5 / tai5-uan5_gian5-gi2_kang1-ku7

臺灣言語工具
https://i3thuan5.github.io/tai5-uan5_gian5-gi2_kang1-ku7

Type error #567

Open orbxball opened 6 years ago

orbxball commented 6 years ago

What I want to do is take a sentence such as "臺語語言來講古" and convert it into "tai5-gi2 gi2-gian5 lai5 kong7-koo2" with this tool. This is similar to the example you give in the linked 用拼音轉臺羅拼音 passage.

However, I ran into a problem when following the instructions on 語句轉物件. According to the docs, the result should be:

>>> from 臺灣言語工具.解析整理.拆文分析器 import 拆文分析器
>>> 
>>> 拆文分析器.建立章物件('臺語工具')  # 全漢字
章:[句:[集:[組:[詞:[字:臺 ], 詞:[字:語 ], 詞:[字:工 ], 詞:[字:具 ]]]]]

However, this is what I got:

>>> from 臺灣言語工具.解析整理.拆文分析器 import 拆文分析器

>>> 拆文分析器.建立章物件('臺語工具')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-b093e3b8c8b4> in <module>()
----> 1 拆文分析器.建立章物件('臺語工具')

TypeError: 建立章物件() missing 1 required positional argument: '語句'

Can you give me an example of how to translate a sentence such as "臺語語言來講古" into "tai5-gi2 gi2-gian5 lai5 kong7-koo2" via this tool? Thank you!

sih4sing5hong5 commented 6 years ago

You may have installed an old version.
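For readers hitting the same traceback: `missing 1 required positional argument` is the message Python gives when a method that expects an instance is called directly on the class. One plausible reading, which this thread does not confirm, is that the older release defined 建立章物件 as an instance method while the current docs call it on the class. The sketch below reproduces that pattern with two hypothetical stand-in classes (`OldStyleAnalyzer`, `NewStyleAnalyzer`); neither is part of 臺灣言語工具.

```python
# Hypothetical stand-ins, NOT the real 拆文分析器 class.
class OldStyleAnalyzer:
    # Assumed old layout: a plain instance method, so `self` must be bound.
    def build(self, sentence):
        return f"章:[{sentence}]"


class NewStyleAnalyzer:
    # Assumed new layout: a classmethod, callable on the class itself.
    @classmethod
    def build(cls, sentence):
        return f"章:[{sentence}]"


old_error = None
try:
    # The string is bound to `self`, so `sentence` is reported as missing.
    OldStyleAnalyzer.build('臺語工具')
except TypeError as error:
    old_error = str(error)
    print(old_error)

# Calling on the class works when the method is a classmethod...
new_result = NewStyleAnalyzer.build('臺語工具')
# ...and the instance-method layout works once an instance is created.
old_result = OldStyleAnalyzer().build('臺語工具')
```

If that reading is right, instantiating the class first would work around the error on the old release, but upgrading as suggested here is the safer route because the docs describe the current version.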

We recommend using the docker version. The docs for 基本物件 and 常見情境 were updated today. The 查辭典、斷詞、補漢字、補羅馬字 section on the 常見情境 page gives an example of converting between Taiwanese Hanji and Taiwanese Lô-má-jī.

orbxball commented 6 years ago

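I have read through the latest instruction page and I am now using the docker version.

All we need is to convert Han characters (漢字) into the corresponding Tâi-lô romanization (臺羅拼音). Would it be possible to provide pre-trained files for the dictionary (辭典) and the language model (語言模型), so that we can do the conversion quickly by following the method in the 查辭典、斷詞、補漢字、補羅馬字 section? The data we use for our research contains only Mandarin characters (國字), e.g. 獅 子 也 會 表 演 啊

If there is a more convenient method or tool for this conversion, please let us know. Thank you!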

sih4sing5hong5 commented 6 years ago

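The Han-character system used for Taiwanese is different from the one used for Mandarin. Judging from your example, your data is written in Mandarin Han characters.

I will assume that what you want as output is Taiwanese romanization.

The dictionary approach is better suited to converting between Taiwanese Han characters and romanization. Taiwanese and Mandarin are different languages, so a character-by-character dictionary lookup will not work. We suggest using machine translation techniques instead.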
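A minimal sketch of the point about dictionary lookup, assuming a toy dictionary built only from the word/romanization pairs quoted earlier in this thread. The `TOY_DICTIONARY` table and the `romanize` helper are hypothetical illustrations and are not the API of 臺灣言語工具. Greedy longest-match lookup handles the Taiwanese Hanji sentence, while the Mandarin sentence finds no entries at all.

```python
# Toy dictionary: Taiwanese Hanji -> Tâi-lô, taken from the example in this thread.
TOY_DICTIONARY = {
    '臺語': 'tai5-gi2',
    '語言': 'gi2-gian5',
    '來': 'lai5',
    '講古': 'kong7-koo2',
}


def romanize(sentence, dictionary=TOY_DICTIONARY):
    """Greedy longest-match lookup; returns (output, characters with no entry)."""
    longest = max(len(word) for word in dictionary)
    output, missing = [], []
    position = 0
    while position < len(sentence):
        # Try the longest candidate first, then shrink down to one character.
        for length in range(min(longest, len(sentence) - position), 0, -1):
            chunk = sentence[position:position + length]
            if chunk in dictionary:
                output.append(dictionary[chunk])
                position += length
                break
        else:
            # No entry of any length starts here: keep the character as-is.
            missing.append(sentence[position])
            output.append(sentence[position])
            position += 1
    return ' '.join(output), missing


taiwanese_result, taiwanese_missing = romanize('臺語語言來講古')
mandarin_result, mandarin_missing = romanize('獅子也會表演啊')
print(taiwanese_result)   # tai5-gi2 gi2-gian5 lai5 kong7-koo2
print(mandarin_missing)   # every character of the Mandarin sentence
```

A larger dictionary would contain some of these characters, but that would only produce readings for individual characters, not a Taiwanese sentence, because the wording itself differs between the two languages. That is the gap a translation step is meant to close.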

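We provide an interface to Moses. If you are interested, the docker image also includes a corpus, so you could try seq2seq instead; the results may be better.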