Open kamille-hand opened 3 years ago
Why send the features to the TransformerModel and get the next word instead of getting the whole sentence in parallel like BERT?