brando90 closed this issue 2 years ago.
@LysandreJik can you help me ping the right person for this issue?
The summary is:
Hi @brando90, transformers is meant as a library of model architectures more than a library of optimizers, and we're actively moving away from maintaining optimizers. We'd rather you rely on a library that actively maintains them, as the support should be both broader (not tested only on transformers, like it is here) and more complete (not limited to the two optimizers that we support here).
Some that come to mind are pytorch-optimizer or Fairseq.
@LysandreJik thank you! I will try that! That comment would be useful in the docs :)
I will close the issue with closing remarks of the solution I ended up using. Appreciate the response.
@LysandreJik I was reading the Adafactor scheduler and it seems that it multiplies the lr by 0, which seems odd to me:
https://github.com/huggingface/transformers/blob/master/src/transformers/optimization.py#L604
https://huggingface.co/docs/transformers/master/main_classes/optimizer_schedules
https://pytorch.org/docs/stable/generated/torch.optim.lr_scheduler.LambdaLR.html
```python
class AdafactorSchedule(LambdaLR):
    """
    Since :class:`~transformers.optimization.Adafactor` performs its own scheduling, if the training loop relies on a
    scheduler (e.g., for logging), this class creates a proxy object that retrieves the current lr values from the
    optimizer.

    It returns ``initial_lr`` during startup and the actual ``lr`` during stepping.
    """

    def __init__(self, optimizer, initial_lr=0.0):
        def lr_lambda(_):
            return initial_lr

        for group in optimizer.param_groups:
            group["initial_lr"] = initial_lr
        super().__init__(optimizer, lr_lambda)
        for group in optimizer.param_groups:
            del group["initial_lr"]

    def get_lr(self):
        opt = self.optimizer
        lrs = [
            opt._get_lr(group, opt.state[group["params"][0]])
            for group in opt.param_groups
            if group["params"][0].grad is not None
        ]
        if len(lrs) == 0:
            lrs = self.base_lrs  # if called before stepping
        return lrs
```
Can you help me figure out what the scheduler for Adafactor is doing?
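Here is the small experiment I used to poke at it (a minimal sketch with a toy linear model; it assumes the AdafactorSchedule shown above is importable from the installed transformers version). Per the docstring, it hands back initial_lr (0.0 by default) until the optimizer has actually stepped, and afterwards pulls the real lr out of the optimizer state:

```python
# Minimal sketch (toy model/data made up for illustration); assumes Adafactor and the
# AdafactorSchedule shown above are importable from transformers.optimization.
import torch
from transformers.optimization import Adafactor, AdafactorSchedule

model = torch.nn.Linear(4, 2)
# lr=None + relative_step=True lets Adafactor compute its own time-dependent lr.
optimizer = Adafactor(
    model.parameters(), lr=None, scale_parameter=True, relative_step=True, warmup_init=True
)
scheduler = AdafactorSchedule(optimizer)  # proxy scheduler; initial_lr defaults to 0.0

print(scheduler.get_lr())  # before any step: [0.0], i.e. the initial_lr placeholder

loss = model(torch.randn(8, 4)).pow(2).mean()
loss.backward()
optimizer.step()
scheduler.step()

print(scheduler.get_lr())  # now the lr Adafactor computed internally for this step
```

So, as far as I can tell, the 0.0 only shows up before the first optimizer step; after that, get_lr() just reads the lr back from the Adafactor state rather than multiplying anything.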
Seems like the fairseq one has run without errors so far; the other one had a bug.
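For reference, this is roughly how I am constructing the fairseq one (a minimal sketch, not my exact setup; it assumes fairseq is installed and that fairseq.optim.adafactor.Adafactor accepts these arguments, which may differ across fairseq versions):

```python
# Rough sketch of the fairseq variant; the argument names are my assumption based on
# fairseq.optim.adafactor.Adafactor and may differ across versions.
import torch
from fairseq.optim.adafactor import Adafactor

model = torch.nn.Linear(4, 2)  # stand-in for the real model
optimizer = Adafactor(
    model.parameters(),
    lr=None,               # let Adafactor choose its own relative step size
    scale_parameter=True,
    relative_step=True,
    warmup_init=True,
)

loss = model(torch.randn(8, 4)).pow(2).mean()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```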
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Environment info
transformers version: 4.10.3
Information
Model I am using (Bert, XLNet ...):
The problem arises when using:
The tasks I am working on are:
To reproduce
I am running the MAML (with higher) meta-learning algorithm with a ResNet. I see this gives issues in my script (error message pasted below). Is Adafactor not supposed to work with ResNets or other models?
Steps to reproduce the behavior:
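My real script wraps the model in the higher/MAML loop, but a stripped-down sketch of how the model and optimizer are set up looks roughly like this (not the exact script, and without higher it may not trigger the error on its own; resnet18 and the random batch are stand-ins):

```python
# Stripped-down sketch only: torchvision's resnet18 and a random batch stand in for
# my actual model/data, and the higher/MAML inner loop is omitted.
import torch
import torch.nn.functional as F
from torchvision.models import resnet18
from transformers.optimization import Adafactor

model = resnet18(num_classes=5)
optimizer = Adafactor(
    model.parameters(), lr=None, scale_parameter=True, relative_step=True, warmup_init=True
)

x = torch.randn(4, 3, 84, 84)        # e.g. mini-ImageNet-sized images
y = torch.randint(0, 5, (4,))
loss = F.cross_entropy(model(x), y)
loss.backward()
optimizer.step()
optimizer.zero_grad()
```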
Expected behavior
I expect training to go smoothly but instead get:
full error output:
related: