Add Support for GradCache in PyTorch Lightning for Multi-GPU and Mixed-Precision Training

luyug / GradCache

Run Effective Large Batch Contrastive Learning Beyond GPU/TPU Memory Constraint

Apache License 2.0

327 stars 19 forks source link

Add Support for GradCache in PyTorch Lightning for Multi-GPU and Mixed-Precision Training #27

Closed yang-su2000 closed 3 months ago

yang-su2000 commented 4 months ago

I developed an integration for GradCache with PyTorch Lightning, supporting Multi-GPU and Mixed-Precision training. The readme.md can be used as a reference to run the example code.

The integration of GradCache with PyTorch Lightning is very non-trivial and requires quite a few changes to the original framework. I looked into this as one of my research projects relies heavily on PyTorch Lightning and memory-efficient contrastive training, and I hope that this can serve as a valuable reference for future users and researchers.

I'm open to making any necessary adjustments to this PR to align with the standards. I also extend my gratitude to the original authors of GradCache for their foundational work. Your feedback for improvement are welcome and appreciated!

luyug commented 4 months ago

Thanks for the PR! I will try to take a look at it this week and provide comments.

yang-su2000 commented 4 months ago

Thank you, great that it could help!

luyug commented 3 months ago

Generally LGTM! Thanks again for the PR.

Not super necessary but have you considered sub-classing the original GradCache class to reduce code duplication (in the name of software engineering best practice) ? If this is too convoluted / too much work, we can pull the code as is.

nit: pl_gradcache.py:360 the nullcontext probably is not necessary

yang-su2000 commented 3 months ago

Thanks for the advice!

Not super necessary but have you considered sub-classing the original GradCache class to reduce code duplication (in the name of software engineering best practice) ? If this is too convoluted / too much work, we can pull the code as is.

Good suggestion, I did some work to make PLGradCache a subclass of GradCache now, and they passed sanity tests.

nit: pl_gradcache.py:360 the nullcontext probably is not necessary

I suppose you mean line 306, yes, it is no longer needed and has been removed.

luyug commented 3 months ago

Nice! If you don't feel like there's anything else to add, I'll go ahead and merge.

yang-su2000 commented 3 months ago

Yep, I think it is good to merge!