Open betterftr opened 11 months ago
Thank you for letting me know! CAME seems to be interesting.
I think we can use any optimizer with the `optimizer_type` and `optimizer_args` arguments, like `--optimizer_type=came_pytorch.CAME --optimizer_args "weight_decay=1e-2" "betas=(0.9, 0.999, 0.9999)" "eps=(1e-30, 1e-16)"`.
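For anyone wondering how a dotted name like `came_pytorch.CAME` and a list of `key=value` strings could become an optimizer instance, here is a minimal sketch. It is not taken from the training script's source; the helper names `resolve_optimizer_class` and `parse_optimizer_args` are hypothetical, and it assumes each value is a plain Python literal.

```python
import ast
import importlib


def resolve_optimizer_class(optimizer_type):
    """Import 'package.module.ClassName' and return the class object.

    Hypothetical helper: it only illustrates the dotted-path idea.
    """
    module_name, _, class_name = optimizer_type.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.ClassName', got {optimizer_type!r}")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


def parse_optimizer_args(pairs):
    """Turn ['key=value', ...] into a kwargs dict of Python literals."""
    kwargs = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"expected 'key=value', got {pair!r}")
        # literal_eval accepts numbers, tuples, strings, etc., but never runs code.
        kwargs[key.strip()] = ast.literal_eval(value.strip())
    return kwargs


kwargs = parse_optimizer_args(
    ["weight_decay=1e-2", "betas=(0.9, 0.999, 0.9999)", "eps=(1e-30, 1e-16)"]
)

# With came_pytorch installed, the class could then be built roughly like this
# (assumption: the constructor accepts these keyword arguments, as the command
# in the comment above suggests):
# optimizer_class = resolve_optimizer_class("came_pytorch.CAME")
# optimizer = optimizer_class(model.parameters(), **kwargs)
```

The values are parsed with `ast.literal_eval` rather than `eval`, so tuples such as `betas` come through as real tuples without executing arbitrary command-line input.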
I'd like to check this optimizer if it's not too hard to implement; it should use less memory than AdamW8bit: https://github.com/yangluo7/CAME