rsennrich / subword-nmt

Unsupervised Word Segmentation for Neural Machine Translation and Text Generation
MIT License

apply_bpe.py repeats last character twice (if not EOL symbol) #38

Closed. maksymbevza closed this issue 6 years ago

maksymbevza commented 6 years ago

To reproduce:

$ echo -n hello world | python apply_bpe.py --codes bpe_codes.txt
hel@@ lo worldd

I know that ending a file with an EOL before EOF is good practice and everyone should do it, but nothing guarantees it: input without a trailing newline still shows up, and the script should handle it.
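
A doubled final character like this is what you would see if the code assumes every input line ends in a newline and writes back the line's last character as the terminator. The sketch below only illustrates that failure mode and one way to avoid it; it is not the actual apply_bpe.py code. The names `emit_buggy`, `emit_safe`, and the identity `segment` stand-in are hypothetical.

```python
def segment(text):
    # Hypothetical stand-in for BPE segmentation; returns the text unchanged
    # so that only the line-ending handling is visible.
    return text


def emit_buggy(line):
    # Assumes line[-1] is always "\n". If the final line has no newline,
    # line[-1] is a real character and gets written a second time.
    return segment(line.strip()) + line[-1]


def emit_safe(line):
    # Separate the content from whatever line terminator is actually present
    # (possibly none), then re-attach exactly that terminator.
    content = line.rstrip("\r\n")
    terminator = line[len(content):]
    return segment(content) + terminator


if __name__ == "__main__":
    print(repr(emit_buggy("hello world")))   # last character duplicated
    print(repr(emit_safe("hello world")))    # unchanged, no newline added
    print(repr(emit_safe("hello world\n")))  # newline preserved
```

With input from `echo -n`, the last line reaches the loop without a trailing newline, which is the case that `emit_safe` handles and `emit_buggy` does not.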

rsennrich commented 6 years ago

Thank you for reporting this; it is fixed now.

maksymbevza commented 6 years ago

I have commented on a potential bug here... https://github.com/rsennrich/subword-nmt/commit/1d4c3ca

maksymbevza commented 6 years ago

Thanks for fixing!