Closed: anant10 closed this issue 6 years ago.
Hi,
You can pre-process your data files and split the words into characters. Alternatively, you can use the tokenization/detokenization options in config.py:
TOKENIZATION_METHOD = 'tokenize_none_char'
DETOKENIZATION_METHOD = 'detokenize_none_char'
APPLY_DETOKENIZATION = True
This feature may have some problems, as it has not been thoroughly tested. Moreover, a naive NMT model used for character-level seq2seq may perform poorly. If you want to implement an additional model, you should modify the model_zoo.py file.
Cheers.
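For the first option (pre-processing the data files yourself), a minimal sketch of character-level tokenization and its inverse might look like the following. It assumes one sentence per line in plain UTF-8 text files and marks the original word boundaries with a hypothetical `<space>` token so they can be restored after translation. The names `char_tokenize`, `char_detokenize`, `preprocess_file` and `SPACE_TOKEN` are illustrative only; this is not the actual code behind `tokenize_none_char`/`detokenize_none_char`, which may differ.

```python
import re

# Hypothetical marker for the original word boundaries; an assumption,
# not necessarily the symbol NMT-Keras itself uses.
SPACE_TOKEN = "<space>"


def char_tokenize(line: str) -> str:
    """Split a sentence into space-separated characters, marking word spaces."""
    # Drop surrounding whitespace and any stray newlines/tabs inside the line.
    line = re.sub(r"[\n\t]+", "", line.strip())
    return " ".join(SPACE_TOKEN if ch == " " else ch for ch in line)


def char_detokenize(line: str) -> str:
    """Undo char_tokenize: glue characters back together and restore the word spaces."""
    return "".join(" " if tok == SPACE_TOKEN else tok for tok in line.split(" "))


def preprocess_file(src_path: str, dst_path: str) -> None:
    """Write a character-tokenized copy of a corpus, one sentence per line."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(char_tokenize(line) + "\n")


if __name__ == "__main__":
    tokenized = char_tokenize("hola mundo")
    print(tokenized)                   # h o l a <space> m u n d o
    print(char_detokenize(tokenized))  # hola mundo
```

Each character then becomes a token for the model, so the word-level sentence has to be rebuilt with the detokenizer before reading (or scoring) the output.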
Thanks, lvapeab. It's working: the BLEU scores are worse, but the predictions are good enough.