Hello, thanks for opening an issue! We try to keep the GitHub issues for bugs/feature requests. Could you ask your question on the forum instead?
Thanks!
cc @sgugger
Sorry for my misplaced post. I think the problem is solved. To be honest, I just re-applied the modifications to the original code more carefully, and now it seems to be working. You can delete my post or move it to the forum if you find that more appropriate. Sorry again for the inconvenience.
System Info
- torch 1.12.1+cu113
- transformers 4.23.0.dev0
Information
Tasks
- An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)

Reproduction
Hi, I want to use the AdaHessian optimizer in the text-classification example `run_glue_no_trainer.py`. To do so, I modified the part of the code where the optimizer is selected, replacing the original optimizer construction and the original backward call with their AdaHessian counterparts (a sketch of the change is given below). The AdaHessian implementation is given here.
Expected behavior
Normally, training should continue, but an error is raised by `torch.autograd.grad` inside one of the optimizer's functions. The error occurs because the gradients (`grads`) created from the parameter list (`params`) do not have a `_grad_fn`. I suspect the problem is related to the input of the optimizer (e.g., the loss in the backward call). Following this post, I tried a modification before the backward call in the closure, which makes the script start running, but the accuracy is around 45% and does not improve. Could you please take a look at the problem and make a suggestion to overcome it?
EDIT:
I just noticed in the traceback that `lr_scheduler` is mentioned before the error in `torch.autograd.grad`. I suspected that something happens to the `_grad_fn` inside the accelerator. Indeed, commenting out the relevant line makes the optimization procedure start running, which indicates that the `_grad_fn` is somehow dropped inside the accelerator. Could someone please suggest a way to overcome this problem?
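One workaround that might be worth trying (an assumption on my side from reading Accelerate's source, not a confirmed fix): `Accelerator.backward` forwards extra keyword arguments to the underlying `loss.backward(...)`, so the graph could be kept without bypassing the accelerator:

```python
# Assumption: accelerator.backward(loss, **kwargs) passes kwargs through to
# loss.backward(...), keeping the autograd graph that AdaHessian needs for
# its Hessian-vector products.
accelerator.backward(loss, create_graph=True)
```

Note that with mixed precision enabled, the gradient scaler may still interact badly with a second-order optimizer, so testing with mixed precision disabled (plain fp32) seems like a safer first step.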