jieba tokenizer

twairball / fairseq-zh-en

NMT for Chinese-English using fairseq

Closed lmtoan closed 5 years ago

lmtoan commented 5 years ago

Hello! Appreciate your work on this.

In preprocess/process.py, you mention using Jieba to tokenize the -zh text, but I don't see it implemented there. Could you help clarify?

twairball commented 5 years ago

It's done in preprocess/tokenizer.py.
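
For readers unfamiliar with Jieba, here is a minimal sketch of what segmenting a line of Chinese text usually looks like. It is not taken from preprocess/tokenizer.py; the function name `tokenize_zh` and the optional `cut` parameter are made up for illustration, and the only library call assumed is `jieba.cut`, which yields the segmented words of a string.

```python
from typing import Callable, Iterable, Optional


def tokenize_zh(line: str,
                cut: Optional[Callable[[str], Iterable[str]]] = None) -> str:
    """Segment one line of Chinese text and join the tokens with spaces.

    `cut` is a hypothetical hook for swapping in another segmenter;
    when it is omitted, jieba.cut is used.
    """
    if cut is None:
        import jieba  # third-party package: pip install jieba
        cut = jieba.cut
    # Drop whitespace-only tokens so the output has single-space separators.
    tokens = [tok for tok in cut(line.strip()) if tok.strip()]
    return " ".join(tokens)


if __name__ == "__main__":
    # Requires jieba to be installed; prints the space-separated segmentation.
    print(tokenize_zh("我来到北京清华大学"))
```

Space-separated output like this is the form that fairseq's preprocessing step expects, which is probably why segmentation sits in its own script rather than in process.py.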