Feature request

The WSD scheduler should calculate the number of stable steps in trainer.py. And if num_warmup_steps is provided in kwargs, schedule_func should respect the kwargs.

My guess is that the intention is for the schedule to decay to the minimum learning rate and then stay there until the end of training, but since min_lr_ratio defaults to 0, wouldn't the learning rate then always be 0 once the decay phase finishes? I would appreciate some insight on this if possible.

Additionally, trying to pass num_warmup_steps in lr_scheduler_kwargs results in duplicate keyword arguments.
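For context, here is a minimal sketch of the configuration I am aiming for; the field names are existing TrainingArguments ones, and the step counts are purely illustrative:

from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    max_steps=1000,
    warmup_steps=100,
    lr_scheduler_type="warmup_stable_decay",
    # Today num_stable_steps has to be computed by hand even though max_steps is known,
    # and adding "num_warmup_steps" to this dict clashes with the value the Trainer
    # already forwards to the scheduler.
    lr_scheduler_kwargs={"num_stable_steps": 700, "num_decay_steps": 200},
)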
Motivation

I want to run the WSD scheduler for my training, but I do not want to have to calculate the stable steps myself.
Your contribution
I can contribute to this, but I would like to better understand from the maintainers the edge cases or possible scenarios I might have missed. However, here is my current workaround:
# Proposed change to get_wsd_schedule: accept the total step budget and derive the stable phase from it
def get_wsd_schedule(
    ...,
    num_training_steps: int = 0,
):
    ...
    assert num_stable_steps or num_training_steps, "One of either stable steps or training steps must be provided"
    if not num_stable_steps:
        num_stable_steps = num_training_steps - num_warmup_steps - num_decay_steps

# Proposed change to get_scheduler: also forward num_training_steps to the WSD schedule
if name == SchedulerType.WARMUP_STABLE_DECAY:
    return schedule_func(optimizer, num_warmup_steps=num_warmup_steps, num_training_steps=num_training_steps, **scheduler_specific_kwargs)
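Until something like this lands, the stable phase can also be derived on the caller's side. Below is a minimal, self-contained sketch under the assumption that get_wsd_schedule still takes num_warmup_steps, num_stable_steps, num_decay_steps and min_lr_ratio explicitly; the optimizer and step counts are placeholders:

import torch
from transformers.optimization import get_wsd_schedule

model = torch.nn.Linear(8, 8)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

num_training_steps = 1000
num_warmup_steps = 100
num_decay_steps = 200
# The piece I would like trainer.py to do for me:
num_stable_steps = num_training_steps - num_warmup_steps - num_decay_steps  # 700

scheduler = get_wsd_schedule(
    optimizer,
    num_warmup_steps=num_warmup_steps,
    num_stable_steps=num_stable_steps,
    num_decay_steps=num_decay_steps,
    min_lr_ratio=0.1,  # keep 10% of the peak LR after decay instead of dropping to 0
)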