oaarnikoivu closed this issue 3 years ago.
You closed this issue, so you may have found the answer yourself, but in case anybody else finds this question:
the training part of BPE (learn_bpe) is determined only by which pair of symbols is most frequent at any given time (with alphabetical order used to break ties), so it doesn't matter whether you run it on the original text or on a dataset that consists of 5 copies of the original text. Concatenating k copies multiplies every pair count by k, which leaves the ranking of the pairs, and therefore the sequence of merges, unchanged.
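To make the argument concrete, here is a minimal sketch of a BPE learner that picks the most frequent pair at each step and breaks ties alphabetically, run once on a small corpus and once on 5 concatenated copies of it. The helper `learn_merges` and the separate `</w>` end-of-word symbol are illustrative assumptions, not subword-nmt's actual code. The sketch also has no minimum-frequency stopping rule; a learner that stops once the best pair falls below a fixed count threshold could stop at a different point on the duplicated data.

```python
from collections import Counter


def learn_merges(corpus, num_merges):
    """Hypothetical minimal BPE learner (not subword-nmt's implementation).

    Each step merges the most frequent adjacent symbol pair; ties are
    broken by choosing the alphabetically smallest pair.
    """
    # Word frequencies; each word is a tuple of characters plus an end-of-word marker.
    vocab = Counter()
    for line in corpus:
        for word in line.split():
            vocab[tuple(word) + ("</w>",)] += 1

    merges = []
    for _ in range(num_merges):
        # Count every adjacent pair, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Highest count first; among equal counts, the alphabetically smallest pair.
        best = min(pairs, key=lambda p: (-pairs[p], p))
        merges.append(best)
        # Rewrite each word, replacing occurrences of the best pair left to right.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] += freq
        vocab = new_vocab
    return merges


original = ["the cat sat on the mat", "the hat is on the cat"]
concatenated = original * 5  # 5 copies of the original text

# Every pair count is scaled by 5, so the chosen merges are identical.
assert learn_merges(original, 10) == learn_merges(concatenated, 10)
```

By induction, each step sees the same pair ranking on both inputs, so the merged vocabularies stay identical up to the factor of 5 in their counts.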
Hi,
I'm trying to implement BPE dropout using the technique you mention in the README: creating an augmented training dataset by concatenating the original training dataset (5K sentences) multiple times, and then applying BPE dropout to it. I'm wondering: do I have to run the "learn BPE" step on the concatenated dataset, or does it suffice to learn BPE on the original 5K dataset and then simply apply BPE with the dropout probability to the concatenated dataset, using the vocabulary learned on the original dataset?
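The workflow described in the question (learn merges once, then apply them with dropout to every copy) can be sketched as follows. A hand-written, hypothetical merge list stands in for the codes learned on the original 5K sentences, and the helper `segment` is an illustrative assumption, not subword-nmt's API. At each step, every applicable merge is skipped with probability `dropout`, and the earliest-learned surviving merge is applied; segmentation stops when no merge survives.

```python
import random


def segment(word, ranks, dropout, rng):
    """Hypothetical BPE-dropout segmenter for a single word.

    ranks maps a symbol pair to its merge priority (lower = learned earlier).
    dropout is the probability of skipping each applicable merge at each step.
    """
    symbols = list(word) + ["</w>"]  # simplified end-of-word marker
    while True:
        # Applicable merges that survive dropout at this step: (rank, position).
        candidates = [
            (ranks[(a, b)], i)
            for i, (a, b) in enumerate(zip(symbols, symbols[1:]))
            if (a, b) in ranks and rng.random() >= dropout
        ]
        if not candidates:
            return symbols
        # Apply the earliest-learned surviving merge (leftmost on ties).
        _, i = min(candidates)
        symbols[i:i + 2] = [symbols[i] + symbols[i + 1]]


# Hypothetical merges, standing in for codes learned ONCE on the original data.
merges = [("t", "h"), ("th", "e"), ("the", "</w>"), ("a", "t"), ("at", "</w>")]
ranks = {pair: rank for rank, pair in enumerate(merges)}

original = ["the cat sat", "the hat"]
augmented = original * 5  # the concatenated dataset: 5 copies

rng = random.Random(0)
# Each copy is segmented independently, so copies can be segmented differently.
segmented = [
    [segment(word, ranks, 0.1, rng) for word in line.split()]
    for line in augmented
]
for line in segmented[:4]:
    print(line)
```

With `dropout=0.0` this reduces to ordinary deterministic BPE, so the same merge list also serves for segmenting test data without dropout.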