This isn't really an issue per se, but I found that if you wrap the entire `grad_cache.GradCache(...)` call in torch autocast, you run into strange errors. The Hugging Face `Trainer` triggers this by default: its `training_step` wraps the call to `self.compute_loss()` in an autocast context manager.
Maybe you can add a note about this gotcha in the readme?
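For anyone else hitting this, here is a minimal sketch of the workaround I ended up with: exit the autocast region inside `compute_loss` before invoking GradCache, and let GradCache handle mixed precision itself (it has its own `fp16`/`scaler` arguments, per the README). The `GradCacheTrainer` name, the `self.gc` attribute, and the `query`/`passage` batch keys are placeholders for illustration, and the `compute_loss` signature assumed here is the classic one (it differs across `transformers` versions):

```python
import torch
from transformers import Trainer

class GradCacheTrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False):
        # By the time compute_loss runs, Trainer.training_step has already
        # entered an autocast context. Explicitly disable it here so GradCache
        # can manage mixed precision on its own (via its fp16/scaler args).
        with torch.autocast(device_type="cuda", enabled=False):
            # self.gc is assumed to be a GradCache instance constructed
            # elsewhere; calling it runs the chunked forward/backward passes
            # and returns the loss.
            loss = self.gc(inputs["query"], inputs["passage"])
        return loss
```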