Closed: xixiddd closed this issue 5 years ago.
Hi Shamil Chollampatt. In the Model and Training Details section of your paper, you say that "Each of the source and target vocabularies consists of 30K most frequent BPE tokens from the source and target side of the parallel data, respectively." However, according to this line in the preprocessing script (i.e., training/preprocess.sh), it seems that you only use the target-side data to learn the BPE codes, and then apply them to both the source and target data.

The BPE model is trained with 30,000 merge operations on the target side of the training data, as in the line that you pointed to. The source and target vocabularies for the encoder-decoder model consist of the 30,000 most frequent subwords (i.e., BPE-segmented tokens) from the source and target sides of the parallel data, respectively (see line).
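For anyone else landing on this thread, here is a minimal sketch of the preprocessing described above, assuming the subword-nmt command-line tool; the file names (train.src, train.tgt, bpe.model) are placeholders for illustration, not the actual paths used in training/preprocess.sh.

```bash
# Learn 30,000 BPE merge operations on the TARGET side only.
subword-nmt learn-bpe -s 30000 < train.tgt > bpe.model

# Apply the same BPE model to both the source and target sides.
subword-nmt apply-bpe -c bpe.model < train.src > train.src.bpe
subword-nmt apply-bpe -c bpe.model < train.tgt > train.tgt.bpe

# The source and target vocabularies are then built separately,
# each from the 30K most frequent BPE tokens on its own side.
```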