blazerye closed this issue 3 years ago
It is not recommended to use RL training with a Transformer.
Even at inference time, the Transformer already uses a lot of memory because of its architecture. During RL training, instead of teacher forcing with an MLE objective, the model has to generate sequences by sampling, which is the same step-by-step procedure as inference. On top of that, sampling during training must retain the computation graph for back-propagation, which requires far more memory than inference alone.
Thanks.
How can this be fixed? Thank you.