Open · bghira opened this issue 13 hours ago
This impacted every model I looked at while implementing layer-skipping for checkpointing.
Describe the bug
Gradient checkpointing is gated on self.training, and the model is still in training mode when log_validation runs, so the checkpointing logic also runs during validation, unnecessarily. I found that I have to disable it explicitly when running validation, even under torch.no_grad(); I looked and the official examples do not disable it either.
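Here is a minimal sketch of the pattern in question (SketchBlock is a hypothetical module, not actual diffusers code): the branch is keyed only on self.training, and since log_validation does not switch the model to eval mode and torch.no_grad() does not change self.training, checkpointing still runs during validation.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class SketchBlock(nn.Module):
    """Stand-in for a transformer/UNet block that checkpoints its forward."""

    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(8, 8)
        self.gradient_checkpointing = True  # as if enable_gradient_checkpointing() was called

    def forward(self, hidden_states):
        # The common gate: only self.training is checked, not torch.is_grad_enabled().
        if self.training and self.gradient_checkpointing:
            print("checkpointing branch taken, grad enabled:", torch.is_grad_enabled())
            return checkpoint(self.proj, hidden_states, use_reentrant=False)
        return self.proj(hidden_states)


block = SketchBlock()
block.train()  # training scripts typically leave the model in train mode for validation

with torch.no_grad():
    # Prints "checkpointing branch taken, grad enabled: False": no_grad does not
    # flip self.training, so the branch still runs. Gating on
    # torch.is_grad_enabled() as well would skip it here.
    block(torch.randn(2, 8))
```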
Reproduction
Add print statements to the checkpointing function and run a training script's log_validation step; a hedged repro sketch follows.
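A reproduction sketch under stated assumptions: UNet2DConditionModel is used only as an example of an affected diffusers model (the tiny config mirrors the shapes diffusers uses in its unit tests so the script runs quickly), and the print statement stands in for one patched into the model's checkpointing branch; whether that branch fires during validation depends on how the installed diffusers version gates it.

```python
import torch
from diffusers import UNet2DConditionModel

# Tiny example config so the forward pass runs quickly; any model that
# supports enable_gradient_checkpointing() can be substituted.
model = UNet2DConditionModel(
    sample_size=16,
    in_channels=4,
    out_channels=4,
    block_out_channels=(32, 64),
    down_block_types=("CrossAttnDownBlock2D", "DownBlock2D"),
    up_block_types=("UpBlock2D", "CrossAttnUpBlock2D"),
    cross_attention_dim=32,
    attention_head_dim=8,
    layers_per_block=1,
)
model.enable_gradient_checkpointing()
model.train()  # training scripts generally do not call model.eval() before log_validation

sample = torch.randn(1, model.config.in_channels, 16, 16)
timestep = torch.tensor([1])
encoder_hidden_states = torch.randn(1, 77, model.config.cross_attention_dim)

with torch.no_grad():
    # With a print added inside the model's checkpointing branch
    # (the `if self.training and self.gradient_checkpointing:` gate),
    # it fires here even though no gradients are needed.
    model(sample, timestep, encoder_hidden_states)
```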
Logs
No response
System Info
-
Who can help?
@linoytsaban @yiyixuxu