Hi, @lucidrains.
I recently read the NeuralPlexer paper, and one thing that stood out to me was how extensive the authors' pretraining routine for the model is. Do you have any ideas for how one might replicate this pretraining scheme, ideally by reusing existing code repositories?
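For concreteness, below is a rough sketch of the kind of pretraining loop I have in mind. `NeuralPlexerModel` and `ProteinLigandDataset` are hypothetical placeholders for illustration only, not classes from the paper's code or any existing repository.

```python
# Hypothetical sketch of a pretraining loop over protein-ligand complexes.
# NeuralPlexerModel and ProteinLigandDataset are placeholders, not real APIs.
import torch
from torch.utils.data import DataLoader, Dataset


class ProteinLigandDataset(Dataset):
    """Placeholder: would wrap PDBBind/CSD-style complexes for pretraining."""

    def __init__(self, examples):
        self.examples = examples

    def __len__(self):
        return len(self.examples)

    def __getitem__(self, idx):
        return self.examples[idx]


def pretrain(model, dataset, epochs=10, lr=1e-4, batch_size=8):
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for epoch in range(epochs):
        for batch in loader:
            # assumes the model's forward pass returns its own training loss,
            # e.g. a denoising / structure-prediction objective
            loss = model(batch)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```

Mainly I am wondering which parts of such a loop (the data pipeline, the objectives, the model itself) could be lifted from repositories that already exist, rather than reimplemented from scratch.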