lucidrains / En-transformer

Implementation of E(n)-Transformer, which incorporates attention mechanisms into Welling's E(n)-Equivariant Graph Neural Network
MIT License

Performance drop with checkpointing update #9

Open heiidii opened 2 years ago

heiidii commented 2 years ago

I see a drop in performance (higher loss) when I change the checkpointing call from checkpoint_sequential(self.layers, 1, inp) to checkpoint_sequential(self.layers, len(self.layers), inp), i.e. from a single segment to one segment per layer. Is this expected?
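
One way to narrow this down might be to compare the two segment settings on a small deterministic model and check whether the loss and gradients agree. The following is a minimal sketch, not the repository's code. It assumes PyTorch 2.x, where checkpoint_sequential accepts use_reentrant. The layer stack, the input tensor and the helper loss_and_grads are hypothetical stand-ins for self.layers and inp.

```python
import copy

import torch
from torch import nn
from torch.utils.checkpoint import checkpoint_sequential


def loss_and_grads(layers, inp, segments):
    """Run one forward/backward pass; return the loss and the parameter gradients."""
    layers = copy.deepcopy(layers)  # fresh copy so gradients do not accumulate across runs
    # use_reentrant=False is an assumption here; the issue does not say which mode is used.
    out = checkpoint_sequential(layers, segments, inp, use_reentrant=False)
    loss = out.pow(2).mean()  # arbitrary scalar loss, for illustration only
    loss.backward()
    return loss.item(), [p.grad.clone() for p in layers.parameters()]


torch.manual_seed(0)
# Hypothetical stand-in for self.layers: a small deterministic stack (no dropout).
layers = nn.Sequential(*[nn.Sequential(nn.Linear(16, 16), nn.Tanh()) for _ in range(4)])
inp = torch.randn(8, 16)

loss_one, grads_one = loss_and_grads(layers, inp, 1)
loss_all, grads_all = loss_and_grads(layers, inp, len(layers))

same_loss = abs(loss_one - loss_all) < 1e-6
same_grads = all(torch.allclose(a, b, atol=1e-6) for a, b in zip(grads_one, grads_all))
print(f"loss matches: {same_loss}, gradients match: {same_grads}")
```

If both checks pass on a toy stack like this but the real model still trains differently, the cause may lie outside the segment count itself, for example in stochastic layers or in the checkpointing mode being used. Running the same comparison on the actual self.layers would be the more direct test.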