jc-bao / policy-adaptation-survey

This repository compares the prevailing adaptive control methods from both the control and learning communities.
Apache License 2.0

Performance drop during training... #1

Closed jc-bao closed 1 year ago

jc-bao commented 1 year ago

The rewards slowly decreased during training.

[Figures: overall training metrics; zoomed-in view of the reward decay]

Some weird things to notice: a. the actor-critic loss never goes down; b. the reward gradually decays.

  1. Is this normal? (Please compare with other libraries' performance.)
  2. How can it be solved?
jc-bao commented 1 year ago

The weird behavior mainly shows up in actor_std, which stops converging. Originally, the learning rate was set to 1e-3 and the data reuse was 32, which is too high for an on-policy algorithm. Changing the data reuse (repeat times) to < 10 and choosing a smaller learning rate (1e-4) solves the problem.
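
For concreteness, here is a minimal sketch of the adjusted hyperparameters, assuming a PPO-style on-policy trainer; the key names (`learning_rate`, `repeat_times`) are illustrative and may not match the actual config in this repo:

```python
# Hypothetical hyperparameter config for an on-policy (PPO-style) trainer.
# Key names are illustrative, not the exact ones used in this repository.
config = {
    "learning_rate": 1e-4,  # reduced from 1e-3, which destabilized actor_std
    "repeat_times": 8,      # gradient epochs per rollout batch; keep < 10 for on-policy data
}
```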

Also, if you notice that the action std is not converging, consider choosing a smaller entropy regularization lambda value.
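
As a sketch of where that lambda enters the objective, assuming a PPO-style actor loss (the names `surrogate`, `dist`, and `lambda_entropy` are illustrative, not this repo's API):

```python
import torch
from torch.distributions import Normal

def actor_loss(surrogate: torch.Tensor, dist: Normal,
               lambda_entropy: float = 0.01) -> torch.Tensor:
    """PPO-style actor loss with an entropy bonus.

    A larger lambda_entropy rewards high-entropy policies, which keeps the
    action std large and can stop it from converging; shrink it if the std
    plateaus instead of decreasing.
    """
    # Maximize the clipped surrogate plus the entropy bonus,
    # i.e. minimize the negated sum.
    return -(surrogate.mean() + lambda_entropy * dist.entropy().mean())

# Example with dummy values:
dist = Normal(torch.zeros(4), torch.ones(4))
surrogate = torch.randn(32)
loss = actor_loss(surrogate, dist, lambda_entropy=0.005)
```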

Here are the new results, which I think resolved the issue.

[Figure: updated training curves after the fix]
