Closed by lalalune 3 months ago
This PR removes grokfast, which was causing NaNs, and gradient checkpointing, which reduces memory use at the cost of extra compute but isn't necessary here.