rsennrich / subword-nmt

Unsupervised Word Segmentation for Neural Machine Translation and Text Generation
MIT License
2.18k stars, 464 forks

How to recover the BPE #80

Closed (QAQ-v closed 4 years ago)

QAQ-v commented 4 years ago

Hi, thanks for your great work!

I am a beginner in NLP, and I am confused: after translation I get the prediction results, but how do I recover the original text from the BPE output? That is, how do I go from

"how long do we t@@ o@@ l@@ er@@ at@@ e it?"

to

"how long are we going to tolerate it?"

I saw some other issues asking the same question, but they were closed without an answer. Maybe this is a silly question. Thanks for your patience!

tnq177 commented 4 years ago

Simply replace all "@@ " with "". The README has a command for that:

The original segmentation can be restored with a simple replacement:

sed -r 's/(@@ )|(@@ ?$)//g'
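
For anyone post-processing in Python rather than the shell, the same replacement can be sketched with the standard `re` module. This is a minimal sketch, not part of subword-nmt: the helper name `remove_bpe` is hypothetical, and it assumes the separator is the "@@" shown in this thread.

```python
import re

# Same pattern as the sed command above: "@@ " anywhere in the line,
# or a dangling "@@" (optionally followed by a space) at the end of the line.
BPE_SEPARATOR = re.compile(r"(@@ )|(@@ ?$)")


def remove_bpe(line: str) -> str:
    """Merge BPE subword units back into whole words (hypothetical helper)."""
    return BPE_SEPARATOR.sub("", line)


segmented = "how long do we t@@ o@@ l@@ er@@ at@@ e it?"
restored = remove_bpe(segmented)
print(restored)  # how long do we tolerate it?
```

Note that this only glues subword pieces back together. It does not change the wording of the prediction, so any remaining difference from a reference translation comes from the model itself, not from the BPE step.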

QAQ-v commented 4 years ago

Thanks! I figured it out right as I submitted the issue =.= It really was a silly question.