Closed Howuhh closed 1 year ago
Ahh thank you very much! I'll swap yours in, or it would be great if you could make a PR.
Hmm, actually I think it depends on what we mean by "update".
I believe mine is based on CleanRL's implementation, which updates the learning rate every "PPO Update", not every "Gradient Update".
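To make the distinction concrete, here is a minimal Python sketch of the two readings of "update". The function names, the `steps_per_update` parameter (meant as minibatches times epochs per PPO update), and the toy values are assumptions for illustration, not code taken from either repository.

```python
def lr_per_ppo_update(grad_step, base_lr, num_updates, steps_per_update):
    """Anneal once per PPO update: the rate is constant within an update."""
    # Index of the PPO update this gradient step belongs to (hypothetical layout).
    update_idx = grad_step // steps_per_update
    frac = 1.0 - update_idx / num_updates
    return base_lr * frac


def lr_per_gradient_step(grad_step, base_lr, num_updates, steps_per_update):
    """Anneal on every gradient step: the rate decays on a straight line."""
    total_steps = num_updates * steps_per_update
    frac = 1.0 - grad_step / total_steps
    return base_lr * frac


# Toy values (assumed): 4 PPO updates, 2 gradient steps per update.
base_lr, num_updates, steps_per_update = 1.0, 4, 2
total = num_updates * steps_per_update

per_update = [lr_per_ppo_update(s, base_lr, num_updates, steps_per_update)
              for s in range(total)]
per_step = [lr_per_gradient_step(s, base_lr, num_updates, steps_per_update)
            for s in range(total)]

print(per_update)  # staircase: one value per PPO update
print(per_step)    # straight line: a new value on every gradient step
```

With only a few PPO updates, the first schedule looks like a staircase when plotted against gradient steps, which would explain a "linear" schedule that does not look linear over a small number of steps.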
Yeah, seems like it is indeed. Sorry for the confusion. Thanks for the implementation anyway!
Hi! I noticed that, for a small number of steps, the linear learning rate schedule is not actually linear. I doubt it makes much difference to the final results, but I thought I'd show it anyway. Maybe I made a mistake somewhere?
Result: