rsennrich / subword-nmt

Unsupervised Word Segmentation for Neural Machine Translation and Text Generation
MIT License
2.18k stars 464 forks

How to decode BPE when applied to machine translation #115

Closed ShaoDonCui closed 2 years ago

ShaoDonCui commented 2 years ago

When BPE is applied to machine translation, the text can be encoded into integer IDs using a vocabulary.txt. But at the prediction stage, how do I convert the numbers back to text? Thank you.

rsennrich commented 2 years ago

subword-nmt doesn't include functionality to map text to a sequence of integers and back; it leaves this to the sequence-to-sequence toolkit (which might also reserve some IDs for special symbols such as EOS, MASK, UNK, etc.).

If you know the text-to-integer mapping used by your sequence-to-sequence toolkit, just reverse it to get the text back.
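
A minimal sketch of that reversal in Python, assuming a hypothetical `token_to_id` vocabulary (in practice this mapping comes from your toolkit, which may reserve low IDs for special symbols). After mapping IDs back to tokens, the BPE segmentation itself is undone by stripping the `@@ ` continuation markers, the same postprocessing the subword-nmt README performs with `sed -r 's/(@@ )|(@@ ?$)//g'`:

```python
import re

# Hypothetical vocabulary for illustration only; your toolkit's
# actual mapping (and its reserved special symbols) will differ.
token_to_id = {"<unk>": 0, "<eos>": 1, "lower@@": 2, "est": 3, "new@@": 4}
id_to_token = {i: t for t, i in token_to_id.items()}

def decode(ids):
    """Map predicted IDs back to tokens, then undo BPE segmentation."""
    tokens = [id_to_token.get(i, "<unk>") for i in ids]
    text = " ".join(tokens)
    # Remove "@@ " joiners, as in subword-nmt's suggested sed command.
    return re.sub(r"(@@ )|(@@ ?$)", "", text)

print(decode([4, 3, 2, 3]))  # -> "newest lowerest"
```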