lvapeab / nmt-keras

Neural Machine Translation with Keras
http://nmt-keras.readthedocs.io
MIT License

How to build a character sequence to sequence model? #58

Closed: anant10 closed this issue 6 years ago

lvapeab commented 6 years ago

Hi,

you can pre-process your data files and split the words into characters. Or you can use the tokenize/detokenize options in config.py:

TOKENIZATION_METHOD = 'tokenize_none_char'
DETOKENIZATION_METHOD = 'detokenize_none_char'
APPLY_DETOKENIZATION = True
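To make both options concrete, here is a minimal round-trip sketch in plain Python. It is not the library's code. The function names (`char_tokenize`, `char_detokenize`, `preprocess_file`) and the `<space>` marker for word boundaries are hypothetical choices made for illustration. The idea is to split each sentence into space-separated characters while keeping a visible token where the original spaces were, so that detokenization can restore the words.

```python
import re

# Hypothetical placeholder token marking the original word boundaries.
SPACE_TOKEN = "<space>"


def char_tokenize(line: str) -> str:
    """Split a sentence into space-separated characters, marking word boundaries."""
    line = re.sub(r"[\n\t]+", "", line)  # drop stray newlines/tabs
    return " ".join(SPACE_TOKEN if ch == " " else ch for ch in line)


def char_detokenize(line: str) -> str:
    """Invert char_tokenize: remove the separators and restore the original spaces."""
    return "".join(" " if tok == SPACE_TOKEN else tok for tok in line.split(" "))


def preprocess_file(src_path: str, dst_path: str) -> None:
    """Write a character-split copy of a one-sentence-per-line data file."""
    with open(src_path, encoding="utf-8") as fin, \
            open(dst_path, "w", encoding="utf-8") as fout:
        for line in fin:
            fout.write(char_tokenize(line.rstrip("\n")) + "\n")


if __name__ == "__main__":
    tokenized = char_tokenize("hello world")
    print(tokenized)                   # h e l l o <space> w o r l d
    print(char_detokenize(tokenized))  # hello world
```

If you pre-process the files yourself this way, remember to apply the inverse step to the system output before scoring it against word-level references.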

This feature may have some problems, as it has not been thoroughly tested. Moreover, a naive NMT model may perform poorly on character-level seq2seq. If you want to implement an additional model, you should modify the model_zoo.py file.
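One likely reason a plain word-level architecture struggles here is that character-level inputs are several times longer than word-level ones, so the model has to handle much longer-range dependencies. The tiny sketch below (a hypothetical helper, not part of the library) just compares the two lengths for one sentence, counting each space as one token as in a character split that keeps word boundaries.

```python
from typing import Tuple


def sequence_lengths(sentence: str) -> Tuple[int, int]:
    """Return (word-level length, character-level length) of a sentence."""
    words = sentence.split()
    chars = list(sentence)  # every character, spaces included
    return len(words), len(chars)


if __name__ == "__main__":
    print(sequence_lengths("the cat sat on the mat"))  # (6, 22)
```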

Cheers.

anant10 commented 6 years ago

Thanks lvapeab, it's working. The BLEU scores are worse, but the predictions are good enough.