Closed JeremieMelo closed 2 years ago
Adds support for switching the implementation between factorized and reconstructed, with gradient checkpointing for a memory-efficient training-mode forward function.
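For readers skimming the thread, here is a rough, framework-free sketch of the idea, not the code in this PR. It assumes a low-rank weight `W = U @ V`. The class `LowRankLinear`, its `implementation` and `checkpoint` arguments, and the helper functions are all hypothetical names chosen for illustration. The "factorized" path multiplies by `U` and then `V`; the "reconstructed" path builds `W` first. With checkpointing on, the intermediate `x @ U` is not kept after the forward pass and is recomputed in the backward pass, trading extra compute for lower memory.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


class LowRankLinear:
    """Hypothetical layer whose weight is W = U @ V (U: in x r, V: r x out)."""

    def __init__(self, U, V, implementation="factorized", checkpoint=True):
        if implementation not in ("factorized", "reconstructed"):
            raise ValueError("implementation must be 'factorized' or 'reconstructed'")
        self.U, self.V = U, V
        self.implementation = implementation
        self.checkpoint = checkpoint
        self.saved = None  # what the forward pass keeps for the backward pass

    def forward(self, x):
        if self.implementation == "reconstructed":
            # Build the full weight, then apply it in one product.
            self.saved = (x, None)
            return matmul(x, matmul(self.U, self.V))
        # Factorized path: never materialize the full weight.
        h = matmul(x, self.U)
        # With checkpointing, drop the intermediate h and keep only the input.
        self.saved = (x, None if self.checkpoint else h)
        return matmul(h, self.V)

    def backward(self, grad_out):
        """Return (grad_U, grad_V) for the upstream gradient grad_out."""
        x, h = self.saved
        if h is None:
            # Recompute the intermediate instead of having stored it.
            h = matmul(x, self.U)
        grad_V = matmul(transpose(h), grad_out)
        grad_h = matmul(grad_out, transpose(self.V))
        grad_U = matmul(transpose(x), grad_h)
        return grad_U, grad_V
```

Both implementations should give the same outputs and gradients; only the memory held between forward and backward differs. In a real PyTorch module the recomputation would normally be delegated to the framework's checkpointing utility rather than written by hand.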
Looks good, thanks @JeremieMelo, merging!