Closed by leogaube 1 year ago
Advantage Actor-Critic (A2C) might be interesting to look into as well. It adds an advantage function to the algorithm. I'm not quite sure yet what exactly it does, but I encountered it pretty often during my research.
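For reference, the advantage is usually defined as A(s, a) = Q(s, a) - V(s), i.e. how much better an action is than what the critic expects on average for that state. Below is a minimal sketch of the common one-step estimate, assuming a critic that returns scalar state values. The function name and the `gamma` default are made up for illustration and are not from our code.

```python
def one_step_advantage(reward, value, next_value, gamma=0.99, done=False):
    """One-step advantage estimate: r + gamma * V(s') - V(s).

    `value` and `next_value` are the critic's estimates for the current
    and the next state (hypothetical scalar inputs for this sketch).
    """
    # No bootstrapping from the next state if the episode has ended.
    bootstrap = 0.0 if done else gamma * next_value
    return reward + bootstrap - value


# Positive result: the action turned out better than the critic expected.
advantage = one_step_advantage(reward=1.0, value=0.5, next_value=2.0, gamma=0.5)
```

In A2C the actor's gradient is weighted by this value instead of the raw return, which typically reduces variance.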
I can look into it when I'm done with the Monte Carlo search tree.
This issue has been dropped.
Our Actor-Critic learns bad policies. Soft Actor-Critic (SAC) maximizes not only the reward but also the entropy of the policy (the randomness of its actions). The hope is that SAC can help stabilize both the actor and the critic.
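To make the entropy part concrete, here is a rough sketch of an entropy-regularized objective for a discrete action distribution. It only illustrates the idea of "reward plus entropy bonus" and is not a full SAC implementation. The function names and the temperature `alpha` are assumptions for this example.

```python
import math


def entropy(probs):
    """Shannon entropy of a discrete action distribution (in nats)."""
    # Skip zero-probability actions, since p * log(p) tends to 0 there.
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def soft_objective(expected_reward, probs, alpha=0.2):
    """Expected reward plus an entropy bonus weighted by alpha."""
    return expected_reward + alpha * entropy(probs)


# With equal expected reward, the more random policy scores higher.
uniform_score = soft_objective(1.0, [0.5, 0.5])
greedy_score = soft_objective(1.0, [1.0, 0.0])
```

A larger `alpha` pushes the policy toward more exploration, while `alpha = 0` reduces this to plain reward maximization.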