inoryy / tensorflow2-deep-reinforcement-learning

Code accompanying the blog post "Deep Reinforcement Learning with TensorFlow 2.1"
http://inoryy.com/post/tensorflow2-deep-reinforcement-learning/
MIT License

The members value_c and entropy_c in A2CAgent #8

Closed RuralHunter closed 4 years ago

RuralHunter commented 4 years ago

I can see from the comment on these two members that they are coefficients used for the loss terms, and I can see where they are used when the loss values are calculated. What is the purpose of the two values, and how are they set? The blog article doesn't seem to mention them.

inoryy commented 4 years ago

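Hello, they are scaling coefficients and can be treated as hyperparameters. The value coefficient is often set to 0.5 to match the MSE loss derivative: the 0.5 cancels the factor of 2 that comes from differentiating the squared error. The entropy coefficient should be low enough to only slightly nudge the policy toward the uniform distribution without interfering with it.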
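To make the role of the two coefficients concrete, here is a minimal single-step sketch in plain Python of how such terms are commonly combined in an advantage actor-critic loss. It is not the repository's code: the function names, the argument layout, and the default values `value_c=0.5` and `entropy_c=1e-4` are assumptions chosen for illustration.

```python
import math


def softmax(logits):
    """Convert raw action scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def a2c_loss(logits, action, advantage, value, ret,
             value_c=0.5, entropy_c=1e-4):
    """Single-step actor-critic loss (illustrative sketch, assumed defaults).

    Returns (total, policy_loss, value_loss, entropy).
    """
    probs = softmax(logits)

    # Policy gradient term: raise the log-probability of actions
    # that had a positive advantage.
    policy_loss = -advantage * math.log(probs[action])

    # Critic term: squared error scaled by value_c. With value_c = 0.5
    # the derivative with respect to `value` is simply (value - ret).
    value_loss = value_c * (ret - value) ** 2

    # Entropy of the policy; it is largest for a uniform distribution.
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)

    # Subtracting the scaled entropy rewards more uniform policies, which
    # encourages exploration. A small entropy_c keeps this a gentle nudge.
    total = policy_loss + value_loss - entropy_c * entropy
    return total, policy_loss, value_loss, entropy
```

With a small `entropy_c`, the entropy bonus only matters when the other terms are nearly flat, which is the "slight nudge" behaviour described above. Raising it makes the policy stay closer to uniform for longer.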

RuralHunter commented 4 years ago

Thanks for the explanation; I have a rough understanding now. Is there any recommended documentation about them? It looks like they are not widely used, and this is the first time I've seen someone mention them.

inoryy commented 4 years ago

I don't know of a resource where it's explicitly described. Hyperparameter choice is often more art than science: people usually take values that others have used in the past as a baseline and then iterate on them with a sweep or even just manual perturbations.
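As a rough illustration of the "baseline plus sweep" approach, the sketch below builds a small grid around a baseline by scaling each coefficient. Everything here is hypothetical: the baseline values, the scaling factors, and the `evaluate` callable, which stands in for whatever routine trains an agent with the given coefficients and returns a score such as mean episode reward.

```python
import itertools


def sweep_grid(baseline, factors=(0.5, 1.0, 2.0)):
    """Yield configs where every baseline value is scaled by each factor."""
    names = sorted(baseline)
    for combo in itertools.product(factors, repeat=len(names)):
        yield {name: baseline[name] * f for name, f in zip(names, combo)}


def run_sweep(baseline, evaluate, factors=(0.5, 1.0, 2.0)):
    """Score every config in the grid and return (best_score, best_config).

    `evaluate` is a user-supplied (hypothetical) callable that accepts the
    coefficients as keyword arguments and returns a number where higher
    is better.
    """
    results = [(evaluate(**cfg), cfg) for cfg in sweep_grid(baseline, factors)]
    return max(results, key=lambda r: r[0])


# Assumed baseline values, used only as a starting point for the grid.
baseline = {"value_c": 0.5, "entropy_c": 1e-4}
configs = list(sweep_grid(baseline))
```

With two coefficients and three factors, the grid has nine configurations, including the unchanged baseline. In practice each evaluation is a full training run, so it usually helps to average over several seeds before comparing configurations.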

RuralHunter commented 4 years ago

OK, thanks all the same!