lovecambi / UnsupervisedMT-TensorFlow

Unsupervised Machine Translation (Transformer Based UNMT)
BSD 2-Clause "Simplified" License

Non-parallel Chinese-English #1

Open ramdhan1989 opened 4 years ago

ramdhan1989 commented 4 years ago

I am still new to NLP. I have non-parallel Chinese-English data. Can I use this repository? How do I train the model on Windows?

lovecambi commented 4 years ago

I have not tested it on Windows. If your Chinese and English data are non-parallel but from a similar domain, I think you can try this code.

ramdhan1989 commented 4 years ago

I am currently waiting on training, since it is taking a very long time (it is still training fastText). I am using the data from get_data_enfr.sh. What does "similar domain" mean? If I have non-parallel Traditional Chinese and English data, both taken from advertisement titles in an online shop, are they considered a "similar domain"? My data is in CSV files; where should I start in order to train the model? I need to build a model that translates from Traditional Chinese to English. Does any data pre-processing need to be performed to suit this repo?

Please advise. Thank you.
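
On the CSV question above, a possible first step is to pull the text column out of each CSV into a plain-text file with one title per line, one file per language. The sketch below is only an illustration and assumes, without confirmation from the repository, that the training scripts expect monolingual text in that layout. The column name `title`, the helper names, and the output file names are hypothetical.

```python
import csv
import io
import unicodedata


def extract_monolingual_lines(csv_text, column):
    """Return cleaned, de-duplicated lines taken from one CSV column.

    `column` is a hypothetical header name; adjust it to the real CSV.
    """
    seen = set()
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # NFKC folds full-width/half-width variants into one form.
        text = unicodedata.normalize("NFKC", row.get(column) or "")
        # Collapse runs of whitespace and trim both ends.
        text = " ".join(text.split())
        if text and text not in seen:
            seen.add(text)
            lines.append(text)
    return lines


def write_lines(lines, path):
    """Write one line of text per row, UTF-8 encoded."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")


if __name__ == "__main__":
    sample = "id,title\n1,  Hello   world \n2,Hello world\n3,\n"
    cleaned = extract_monolingual_lines(sample, "title")
    write_lines(cleaned, "all.en")  # hypothetical output file name
```

The same function could be run once on the English CSV and once on the Traditional Chinese CSV. Chinese text would usually also need word segmentation before any tokenization or BPE step; that is not shown here.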