luyug / GradCache

Run Effective Large Batch Contrastive Learning Beyond GPU/TPU Memory Constraint
Apache License 2.0

Documentation about autocast #17

Open jxmorris12 opened 1 year ago

jxmorris12 commented 1 year ago

This isn't really an issue per se, but I found that if you wrap the entire call to grad_cache.GradCache(...) in a torch autocast context, you will run into weird errors. This happens by default with the Hugging Face trainer, whose training_step wraps the call to self.compute_loss() in an autocast context manager.
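To illustrate the safe pattern without depending on grad_cache itself, here is a minimal sketch of the idea: keep the outer training step (loss computation and backward) in full precision, and scope autocast to the sub-batch forward passes only. The `forward_chunk` helper and the CPU/bfloat16 autocast settings are assumptions chosen for portability, not GradCache's actual internals; on GPU this would typically be `device_type="cuda"` with fp16.

```python
import torch

# A stand-in encoder; in GradCache this would be one of the towers.
model = torch.nn.Linear(8, 8)

def forward_chunk(x):
    # Hypothetical helper: autocast covers only this forward pass,
    # mirroring the recommendation to NOT wrap the whole GradCache
    # call in an autocast context.
    with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
        return model(x)

x = torch.randn(4, 8)
reps = forward_chunk(x)

# Loss and backward happen outside autocast, in full precision.
loss = reps.float().pow(2).mean()
loss.backward()
```

The contrast with the gotcha above: if the `loss.backward()` bookkeeping (or GradCache's representation caching) runs inside the autocast region, dtypes of cached tensors and their gradients can disagree, which is where the "weird errors" come from.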

Maybe you can add a note about this gotcha in the readme?