Open sianvolta opened 3 years ago
Marian has no built-in option for this, but I think you can prepare the batches yourself and guide Marian to just consume them: disable data shuffling with `--no-shuffle`, disable batch sorting with `--maxi-batch-sort none`, and specify the size of your batches with `--mini-batch <NUMBER>`.
You can also generate batches on the fly, as Marian can read training data from STDIN (read more about this here: https://groups.google.com/g/marian-nmt/c/zSb7MT4kZ6M). If you train from STDIN, consider redefining an epoch as a specific number of updates with `--logical-epoch <NUMBER>u`.
Is there a way to alternate between training corpora for different batches?
E.g. I have two sets of files, `train-1.src train-1.trg` and `train-2.src train-2.trg`, and for every batch training alternates between them.