tarexo / informaticup-profit

Our solution to the InformatiCup 2023 challenge
MIT License

Implement Soft Actor Critic (SAC) #46

leogaube closed this issue 1 year ago

leogaube commented 1 year ago

Our Actor-Critic learns bad policies. A Soft Actor Critic (SAC) maximizes not only the expected reward but also the entropy of the policy (i.e. the randomness of its actions). The hope is that SAC can help stabilize both the actor and the critic.
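For reference, SAC's objective adds an entropy bonus to the return, J(π) = Σ_t E[r_t + α·H(π(·|s_t))], where the temperature α weighs entropy against reward. The following is a minimal sketch of the resulting critic target and actor loss, assuming a discrete action space (the discrete-action variant of SAC) and plain Python lists standing in for network outputs. All function names, the default `gamma`/`alpha`, and the use of two critics with a minimum are illustrative assumptions, not code from this repository.

```python
# Hypothetical helpers illustrating discrete-action SAC; not from this repository.
import math
from typing import Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    """Turn actor logits into action probabilities pi(a|s)."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy H(pi(.|s)) = -sum_a pi(a|s) * log pi(a|s)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def soft_state_value(q_values: Sequence[float], probs: Sequence[float], alpha: float) -> float:
    """Soft value V(s) = sum_a pi(a|s) * (Q(s,a) - alpha * log pi(a|s)),
    i.e. expected Q plus alpha times the policy entropy."""
    return sum(p * (q - alpha * math.log(p)) for q, p in zip(q_values, probs) if p > 0)


def soft_td_target(reward: float, done: bool,
                   next_q1: Sequence[float], next_q2: Sequence[float],
                   next_probs: Sequence[float],
                   gamma: float = 0.99, alpha: float = 0.2) -> float:
    """Critic target r + gamma * (1 - done) * V(s'), taking the element-wise
    minimum of two target critics to curb Q overestimation."""
    next_q = [min(a, b) for a, b in zip(next_q1, next_q2)]
    not_done = 0.0 if done else 1.0
    return reward + gamma * not_done * soft_state_value(next_q, next_probs, alpha)


def actor_loss(q_values: Sequence[float], probs: Sequence[float], alpha: float) -> float:
    """Policy loss sum_a pi(a|s) * (alpha * log pi(a|s) - Q(s,a)).
    Minimizing it raises expected Q while rewarding high entropy."""
    return -soft_state_value(q_values, probs, alpha)
```

With `alpha = 0` the target reduces to an ordinary expected-value (Expected SARSA style) target; a larger `alpha` keeps the policy more random, which is the stabilizing effect hoped for above.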

tarexo commented 1 year ago

Advantage Actor Critic (A2C) might be interesting to look into as well. It adds an advantage function to the algorithm. I'm not quite sure yet what exactly it does, but I came across it pretty often during my research.
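For context, the advantage is usually defined as A(s, a) = Q(s, a) - V(s): how much better action a turned out than the critic's average expectation for state s. A minimal sketch, assuming n-step bootstrapped returns from a single rollout (a common choice, not necessarily what this project would use); the function names are hypothetical.

```python
# Hypothetical A2C helpers for illustration; not from this repository.
from typing import Sequence


def n_step_returns(rewards: Sequence[float], dones: Sequence[bool],
                   bootstrap_value: float, gamma: float = 0.99) -> list[float]:
    """Discounted returns G_t = r_t + gamma * G_{t+1}, bootstrapped from the
    critic's V(s_T) after the last step. dones[t] means step t ended an episode."""
    returns = [0.0] * len(rewards)
    running = bootstrap_value
    for t in reversed(range(len(rewards))):
        if dones[t]:
            running = 0.0  # no value flows back across an episode boundary
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def advantages(returns: Sequence[float], values: Sequence[float]) -> list[float]:
    """A(s_t, a_t) ~= G_t - V(s_t); positive means the action beat the critic's estimate."""
    return [g - v for g, v in zip(returns, values)]


def a2c_losses(log_probs: Sequence[float], returns: Sequence[float],
               values: Sequence[float]) -> tuple[float, float]:
    """Actor loss -mean(A_t * log pi(a_t|s_t)), with A_t treated as a constant,
    and critic loss mean((G_t - V(s_t))^2)."""
    adv = advantages(returns, values)
    n = len(adv)
    actor = -sum(a * lp for a, lp in zip(adv, log_probs)) / n
    critic = sum(a * a for a in adv) / n
    return actor, critic
```

Subtracting V(s) as a baseline leaves the expected policy gradient unchanged but lowers its variance, which is the usual motivation for A2C over a plain actor-critic that scales the gradient by the raw return.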

I can look into it once I'm done with the Monte Carlo tree search.

leogaube commented 1 year ago

This issue has been dropped.