redknightlois opened this issue 5 years ago
Thanks for the post! I'm a bit unsure what you are asking. Are you asking that I or others try this out, or that this code be merged in? Or were you asking for feedback?
Also, if you have example plots for the performance of this on specific environments, it would help.
I don't do research, and there are probably a lot of things left to do to achieve something worth publishing. I'm just letting you know that it shows promising results in my limited trials. So this is more of an observation.
I see. Thanks for sharing! Would you mind posting your results here?
Sorry, I would really like to, but I am under NDA for this stuff. What I can say (which is general enough) is that even though the source data is very difficult to get to converge using general methods (in the supervised case too), with super-convergence effects I was able to steer the policy quite rapidly (in the same way I can in the supervised case). I am now training supervised neural networks to the same accuracy in under 100 minutes that took multiple days six months ago.
I have been using these two routines to figure out the best learning rate to apply, with awesome results on SAC. However, the changes in the temperature alter those values along the way. It would probably be a good idea to extend this further to do some sort of 'automatic' rediscovery of the LR after x epochs. This version will also mess up the gradients, so you cannot use the policy after you run it.