rsennrich / subword-nmt

Unsupervised Word Segmentation for Neural Machine Translation and Text Generation
MIT License

Meaning of the output file of learn_bpe.py #84

Closed by Gromy1211 4 years ago

Gromy1211 commented 4 years ago

Every line of the output file of learn_bpe.py consists of two subword units (for instance, o f</w>).

I am a little confused by these two units. Does this mean that o and f</w> can be merged into of</w> when applying BPE?

Thanks for answering!

rsennrich commented 4 years ago

Yes, this is correct. More specifically, the file lists all merge operations in the order in which they will be applied. To apply the merge operations to a new file, use apply_bpe.py.
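To illustrate what "applied in order" means, here is a minimal Python sketch. It is not the code of apply_bpe.py, only a simplified illustration. The function names `parse_merges` and `apply_merges` are hypothetical, the sketch assumes non-empty words, and skipping a leading `#version` header line is an assumption about the file layout.

```python
def parse_merges(lines):
    """Turn lines such as 'o f</w>' into (left, right) pairs, keeping file order."""
    merges = []
    for line in lines:
        line = line.strip()
        # Assumption: the file may start with a '#version' header line; skip it.
        if not line or line.startswith("#version"):
            continue
        left, right = line.split(" ")
        merges.append((left, right))
    return merges


def apply_merges(word, merges):
    """Segment one non-empty word by applying each merge operation in listed order."""
    # Start from single characters; the end-of-word marker is attached to the last one.
    symbols = list(word[:-1]) + [word[-1] + "</w>"]
    for left, right in merges:
        merged = []
        i = 0
        while i < len(symbols):
            # Join every adjacent occurrence of the pair (left, right).
            if i + 1 < len(symbols) and symbols[i] == left and symbols[i + 1] == right:
                merged.append(left + right)
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols


# Hypothetical merge file contents, in the two-units-per-line format discussed above.
merges = parse_merges(["o f</w>", "t h", "th e</w>"])
print(apply_merges("of", merges))   # ['of</w>']
print(apply_merges("the", merges))  # ['the</w>']
print(apply_merges("oft", merges))  # ['o', 'f', 't</w>']
```

Note that in "oft" the pair o f</w> does not fire, because the f there is not word-final, so it carries no </w> marker.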

Gromy1211 commented 4 years ago

Thanks for the quick response!! I think I understand how the BPE algorithm works now ;)