Open heiidii opened 2 years ago
I see a drop in performance (higher loss) when I update checkpointing from checkpoint_sequential(self.layers, 1, inp) to checkpoint_sequential(self.layers, len(self.layers), inp). Is this expected?
I see a drop in performance (higher loss) when I update checkpointing from checkpoint_sequential(self.layers, 1, inp) to checkpoint_sequential(self.layers, len(self.layers), inp). Is this expected?