Combining Gradient Caching with Gradient Accumulation/Checkpointing

luyug / GradCache

Run Effective Large Batch Contrastive Learning Beyond GPU/TPU Memory Constraint

Apache License 2.0


Issue #20 (open), opened by aaprasad 1 year ago

Thank you for the amazing package! I was wondering whether it's possible to combine gradient caching with gradient accumulation and/or gradient checkpointing, and, if so, whether it even makes sense to do so. If you could provide an example of combining them in PyTorch, that would be a huge help!
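A minimal sketch of how the three pieces could fit together, assuming plain PyTorch rather than the GradCache package's own API (none of its functions are used here). It re-implements the gradient-cache idea by hand: encode every chunk without autograd, compute the full-batch contrastive loss on the detached representations to cache their gradients, then re-encode each chunk and backpropagate the cached gradients. Gradient accumulation is layered on top by summing several cached big batches into `.grad` before a single `optimizer.step()`, and gradient checkpointing optionally wraps the re-encoding pass via `torch.utils.checkpoint.checkpoint` (the `use_reentrant=False` argument needs a reasonably recent PyTorch). The function names `contrastive_loss`, `grad_cache_step`, and `accumulated_step`, the in-batch-negatives loss, and the toy `Linear` encoders are all illustrative assumptions.

```python
import copy

import torch
import torch.nn.functional as F
from torch.utils.checkpoint import checkpoint


def contrastive_loss(q, p, temperature=0.05):
    # In-batch negatives (assumed loss): query i should score highest against passage i.
    scores = q @ p.t() / temperature
    labels = torch.arange(q.size(0), device=q.device)
    return F.cross_entropy(scores, labels)


def grad_cache_step(q_model, p_model, queries, passages, chunk_size,
                    loss_scale=1.0, use_checkpoint=False):
    """Add d(loss_scale * loss)/d(params) to .grad while holding only one
    chunk's activations at a time. Does not call optimizer.step()."""
    q_chunks, p_chunks = queries.split(chunk_size), passages.split(chunk_size)

    # Pass 1: encode every chunk without building autograd graphs.
    with torch.no_grad():
        q_reps = torch.cat([q_model(c) for c in q_chunks])
        p_reps = torch.cat([p_model(c) for c in p_chunks])

    # Pass 2: full-batch loss on the detached representations; cache d(loss)/d(reps).
    q_reps.requires_grad_(True)
    p_reps.requires_grad_(True)
    loss = loss_scale * contrastive_loss(q_reps, p_reps)
    loss.backward()
    q_cache, p_cache = q_reps.grad.split(chunk_size), p_reps.grad.split(chunk_size)

    # Pass 3: re-encode each chunk with autograd and push its cached gradient
    # into the encoder parameters. Checkpointing, if enabled, only affects this
    # pass. Assumes deterministic encoders (no dropout); otherwise the RNG state
    # would have to be replayed so pass 3 matches pass 1.
    for model, chunks, cache in ((q_model, q_chunks, q_cache),
                                 (p_model, p_chunks, p_cache)):
        for chunk, grad in zip(chunks, cache):
            if use_checkpoint:
                reps = checkpoint(model, chunk, use_reentrant=False)
            else:
                reps = model(chunk)
            reps.backward(gradient=grad)
    return loss.detach()


def accumulated_step(q_model, p_model, optimizer, big_batches, chunk_size,
                     use_checkpoint=False):
    # Gradient accumulation: several cached big batches, one optimizer step.
    optimizer.zero_grad()
    for queries, passages in big_batches:
        grad_cache_step(q_model, p_model, queries, passages, chunk_size,
                        loss_scale=1.0 / len(big_batches),
                        use_checkpoint=use_checkpoint)
    optimizer.step()


def all_params(*modules):
    return [p for m in modules for p in m.parameters()]


torch.manual_seed(0)
q_enc, p_enc = torch.nn.Linear(8, 4), torch.nn.Linear(8, 4)
batches = [(torch.randn(16, 8), torch.randn(16, 8)) for _ in range(2)]

# Reference: ordinary backprop over each full big batch, averaged.
ref_q, ref_p = copy.deepcopy(q_enc), copy.deepcopy(p_enc)
for qs, ps in batches:
    (contrastive_loss(ref_q(qs), ref_p(ps)) / len(batches)).backward()

# lr=0.0 keeps the weights fixed so the accumulated .grad can be compared.
opt = torch.optim.SGD(all_params(q_enc, p_enc), lr=0.0)
accumulated_step(q_enc, p_enc, opt, batches, chunk_size=4, use_checkpoint=True)

pairs = list(zip(all_params(q_enc, p_enc), all_params(ref_q, ref_p)))
max_diff = max((a.grad - b.grad).abs().max().item() for a, b in pairs)
grads_match = all(torch.allclose(a.grad, b.grad, rtol=1e-4, atol=1e-5) for a, b in pairs)
print(f"max |grad difference| = {max_diff:.2e}, match = {grads_match}")
```

One design point worth noting in this sketch: accumulation sums the gradients of separate per-batch losses, so in-batch negatives are shared only within each cached big batch, not across the accumulated batches. Checkpointing only trims the memory of the pass-3 re-encoding, since pass 1 already runs under `torch.no_grad()` and builds no graph.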