jc-bao / policy-adaptation-survey

This repository compares the prevailing adaptive control methods from both the control and learning communities.
Apache License 2.0

Performance drop during training... #1

Closed jc-bao closed 1 year ago

jc-bao commented 1 year ago

The rewards slowly decreased during training.

[Figures: overall training metrics; zoomed-in view of the reward decay]

Some weird things to notice: a. the actor-critic loss never goes down; b. the reward gradually decays.

  1. Is this normal? (Please compare with other libraries' performance.)
  2. How can it be solved?
jc-bao commented 1 year ago

The weird behavior mainly shows up in actor_std, which stops converging. Originally, the learning rate was set to 1e-3 and the data reuse was 32, which is too high for an on-policy algorithm. Changing the data reuse (repeat times) to < 10 and choosing a smaller learning rate (1e-4) solves the problem.
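
For concreteness, here is a minimal sketch of the adjusted hyperparameters, assuming a PPO-style on-policy trainer; the key names (`learning_rate`, `repeat_times`) are illustrative and may not match the actual config in this repo:

```python
# Hypothetical hyperparameter config for an on-policy (PPO-style) trainer.
# Key names are illustrative, not the exact ones used in this repository.
config = {
    "learning_rate": 1e-4,  # reduced from 1e-3, which destabilized actor_std
    "repeat_times": 8,      # gradient epochs per rollout batch; keep < 10 for on-policy data
}
```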

Also, if you notice that the action std is not converging, consider choosing a smaller entropy regularization lambda value.
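
As a sketch of where that lambda enters the objective, assuming a PPO-style actor loss (the names `surrogate`, `dist`, and `lambda_entropy` are illustrative, not this repo's API):

```python
import torch
from torch.distributions import Normal

def actor_loss(surrogate: torch.Tensor, dist: Normal,
               lambda_entropy: float = 0.01) -> torch.Tensor:
    """PPO-style actor loss with an entropy bonus.

    A larger lambda_entropy rewards high-entropy policies, which keeps the
    action std large and can stop it from converging; shrink it if the std
    plateaus instead of decreasing.
    """
    # Maximize the clipped surrogate plus the entropy bonus,
    # i.e. minimize the negated sum.
    return -(surrogate.mean() + lambda_entropy * dist.entropy().mean())

# Example with dummy values:
dist = Normal(torch.zeros(4), torch.ones(4))
surrogate = torch.randn(32)
loss = actor_loss(surrogate, dist, lambda_entropy=0.005)
```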

Here are the new results, which I think resolved the issue.

[Figure: updated training curves after the fix]
