Hi, and thanks for sharing the code.
I've tried running the training process on a different environment, `BipedalWalkerHardcore-v2`, but it doesn't seem to learn anything. I also tried different `shift` values, as suggested in the code comments, but I still end up with a negative reward. Should we train for longer, or are there any hyperparameters we are missing?