Closed yangchaozhao closed 6 years ago
action_prob[0] returns a vector, and a3c.compute_entropy literally computes the entropy (https://en.wikipedia.org/wiki/Entropy#Information_theory). For our purposes, we use it to control the exploration. Please refer to section 4.2 (especially the term H() in equation 4) and section 4.4 (detailed implementation) of our paper, as well as reference [30], for more details.
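To make the entropy term concrete, here is a minimal sketch of what a compute_entropy-style helper does over a probability vector (the exact implementation in a3c.py may differ in details; the guard against log(0) is an assumption of this sketch):

```python
import numpy as np

def compute_entropy(p):
    # Shannon entropy H(p) = -sum_i p_i * log(p_i) of a probability vector.
    # Entries equal to 0 or 1 contribute nothing, so they are skipped,
    # which also avoids evaluating log(0).
    h = 0.0
    for i in range(len(p)):
        if 0 < p[i] < 1:
            h -= p[i] * np.log(p[i])
    return h

# A uniform policy over 6 bitrates maximizes the entropy (ln 6 ~ 1.79),
# while a deterministic policy has entropy 0 -- so a high entropy bonus
# pushes the actor toward exploration.
print(compute_entropy(np.ones(6) / 6.0))
print(compute_entropy(np.array([1.0, 0, 0, 0, 0, 0])))
```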
Also, notice that in sim/README.md, we stated:

> As reported by the A3C paper (http://proceedings.mlr.press/v48/mniha16.pdf) and a faithful implementation (https://openreview.net/pdf?id=Hk3mPK5gg), we also found the exploration factor in the actor network quite crucial for achieving good performance. A general strategy to train our system is to first set ENTROPY_WEIGHT in a3c.py to be a large value (in the scale of 1 to 5) in the beginning, then gradually reduce the value to 0.1 (after at least 100,000 iterations).
Thank you for your reply! I see, but the question is: why did you compute the entropy using "action_prob[0]" instead of "action_prob"? I think action_prob[0] is just a scalar standing for the probability of choosing the lowest bit rate, and we need all 6 probabilities to compute the entropy, don't we?
If I remember correctly, action_prob is a 2D tensor and action_prob[0] is a vector. You should double-check by printing it. It might also be a typo that we made during development. I admit this is not the clearest way to implement it, and I apologize for the confusion.
Yes, my mistake. action_prob[0] is indeed a vector. Thank you for your help!
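A quick way to verify the shapes yourself (the example values here are hypothetical; the point is that a predict call on a single state typically returns a batch of size 1):

```python
import numpy as np

# The actor predicts on a batch, so for one state the output
# has shape (1, 6): one row of probabilities over the 6 bitrates.
action_prob = np.array([[0.1, 0.2, 0.3, 0.2, 0.1, 0.1]])

print(action_prob.shape)     # 2D tensor: (1, 6)
print(action_prob[0].shape)  # indexing row 0 gives the vector: (6,)
```

So action_prob[0] is not the probability of the lowest bit rate; it is the full 6-element probability vector that the entropy is computed over.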
Hi, hongzi: I'm confused about the actual meaning of avg_entropy in TensorBoard. I found that you just computed the entropy of the lowest bit rate choice (action_prob[0]), right? So what does it mean? Hope to get your reply!