This isn't really an issue per se, but I found that if you wrap the entire `grad_cache.GradCache(...)` call in torch autocast, you run into strange errors. The Hugging Face `Trainer` triggers this by default: its `training_step` wraps the call to `self.compute_loss()` in an autocast context manager.
Maybe you can add a note about this gotcha in the readme?
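For anyone else hitting this, here is a minimal sketch of the workaround I ended up with: exit the autocast region inside `compute_loss` before invoking GradCache, and let GradCache handle mixed precision itself (it has its own `fp16`/`scaler` arguments, per the README). The `GradCacheTrainer` name, the `self.gc` attribute, and the `query`/`passage` batch keys are placeholders for illustration, and the `compute_loss` signature assumed here is the classic one (it differs across `transformers` versions):

```python
import torch
from transformers import Trainer

class GradCacheTrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False):
        # By the time compute_loss runs, Trainer.training_step has already
        # entered an autocast context. Explicitly disable it here so GradCache
        # can manage mixed precision on its own (via its fp16/scaler args).
        with torch.autocast(device_type="cuda", enabled=False):
            # self.gc is assumed to be a GradCache instance constructed
            # elsewhere; calling it runs the chunked forward/backward passes
            # and returns the loss.
            loss = self.gc(inputs["query"], inputs["passage"])
        return loss
```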