Closed TomasAndersonFang closed 3 years ago
Hi Tomas,
The preprocessing of the training data (including BPE) is the same in both cases (typically your file will contain one sentence per line).
There are two different ways to count the batch size: by tokens or by lines (sentences). Defining batch size in terms of tokens is becoming more popular because it makes slightly better use of available GPU memory. How you configure your NMT model to use token-level batch sizes depends on your NMT toolkit. A random example: in Nematus, --batch_size is sentence-level, while --token_batch_size is token-level.
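The difference between the two can be sketched with a toy example. These helpers are hypothetical, not from any particular toolkit; real trainers typically also sort by length and pad, which is omitted here:

```python
def sentence_batches(sentences, batch_size):
    """Sentence-level batching: a fixed number of sentences per batch."""
    return [sentences[i:i + batch_size]
            for i in range(0, len(sentences), batch_size)]

def token_batches(sentences, token_batch_size):
    """Token-level batching: greedily fill each batch until adding the
    next sentence would exceed the token budget."""
    batches, current, current_tokens = [], [], 0
    for sent in sentences:
        n = len(sent.split())  # after BPE, these would be subword units
        if current and current_tokens + n > token_batch_size:
            batches.append(current)
            current, current_tokens = [], 0
        current.append(sent)
        current_tokens += n
    if current:
        batches.append(current)
    return batches

corpus = ["a b c", "d e", "f g h i", "j"]
print(sentence_batches(corpus, 2))  # [['a b c', 'd e'], ['f g h i', 'j']]
print(token_batches(corpus, 5))     # [['a b c', 'd e'], ['f g h i', 'j']]
```

With sentence-level batching the memory cost per batch varies with sentence length; with token-level batching each batch has a roughly constant number of tokens, so GPU memory use is more predictable.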
Thank you very much. I think I know how to achieve it now~
Hello, thank you for your work!
I'm confused about how to use BPE. For example, I follow your tutorial and generate the corresponding train, validation, and test data. Then I use <src, trg> pairs (batch_size=64) to train my model. But in some papers, e.g. "Attention is all you need", they define the batch size in tokens (e.g. batch_size=2048) to train their model. Could you tell me how to achieve this, and what the difference between these two ways is?