rsennrich / subword-nmt

Unsupervised Word Segmentation for Neural Machine Translation and Text Generation
MIT License

About how to use BPE in NMT. #98

Closed: TomasAndersonFang closed this issue 3 years ago

TomasAndersonFang commented 3 years ago

Hello, thank you for your work!

I'm confused about how to use BPE. For example, I followed your tutorial and generated the corresponding training, validation, and test data. Then I train my model on <src, trg> pairs with batch_size=64 (i.e. 64 sentence pairs per batch). But in some papers, e.g. "Attention is all you need", the batch size is given in tokens (e.g. batch_size=2048 tokens). Could you tell me how to achieve that, and what the difference between these two ways is?

rsennrich commented 3 years ago

Hi Tomas,

The preprocessing of the training data (including BPE) is the same in both cases (typically your file will contain one sentence per line).
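For reference, a minimal sketch of that preprocessing with the subword_nmt Python API (the file names `train.tok.src` and `bpe.codes` are just placeholders; the `subword-nmt learn-bpe` / `apply-bpe` command-line tools do the same thing):

```python
# Sketch: learn BPE merge operations on the training data and apply them to a file.
# File names are placeholders; run the same apply step for train/valid/test on both languages.
import codecs
from subword_nmt.learn_bpe import learn_bpe
from subword_nmt.apply_bpe import BPE

# Learn merge operations from the tokenized training data.
with codecs.open('train.tok.src', encoding='utf-8') as infile, \
     codecs.open('bpe.codes', 'w', encoding='utf-8') as outfile:
    learn_bpe(infile, outfile, num_symbols=32000)

# Apply the learned codes; the output stays one sentence per line.
with codecs.open('bpe.codes', encoding='utf-8') as codes:
    bpe = BPE(codes)

with codecs.open('train.tok.src', encoding='utf-8') as fin, \
     codecs.open('train.bpe.src', 'w', encoding='utf-8') as fout:
    for line in fin:
        fout.write(bpe.process_line(line))
```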

There are two different ways you can count the batch size: count tokens or count lines (sentences). Defining the batch size in terms of tokens is becoming more popular because it makes slightly better use of the available GPU memory; how you configure your NMT model to use token-level batch sizes depends on your NMT toolkit. A random example: in Nematus, --batch_size is sentence-level and --token_batch_size is token-level.
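To make the difference concrete, here is a toy illustration (my own sketch, not tied to any particular toolkit): sentence-level batching fixes the number of lines per batch, while token-level batching keeps adding sentences until a token budget is reached, so batches of short sentences contain more lines.

```python
# Toy sketch of the two batching strategies over a list of (BPE-segmented) sentences.
def sentence_batches(sentences, batch_size=64):
    """Fixed number of sentences per batch."""
    for i in range(0, len(sentences), batch_size):
        yield sentences[i:i + batch_size]

def token_batches(sentences, token_budget=2048):
    """Fill each batch up to a budget of tokens instead of a fixed sentence count."""
    batch, tokens = [], 0
    for sent in sentences:
        n = len(sent.split())  # number of (subword) tokens in this sentence
        if batch and tokens + n > token_budget:
            yield batch
            batch, tokens = [], 0
        batch.append(sent)
        tokens += n
    if batch:
        yield batch
```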

TomasAndersonFang commented 3 years ago

Thank you very much. I think I know how to achieve it~