Open L-hongbin opened 6 months ago
In the get_param_groups function in optimizer/__init__.py, why is the "wd_mult" key used instead of "weight_decay"? Will the "wd_mult" parameter actually take effect in the optimizer?