Closed yangchaozhao closed 6 years ago
action_prob[0] returns a vector, and a3c.compute_entropy literally computes the entropy (https://en.wikipedia.org/wiki/Entropy#Information_theory). For our purposes, we use it to control the exploration. Please refer to section 4.2 (especially the term H() in equation 4) and section 4.4 (detailed implementation) of our paper, as well as reference [30], for more details.
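To make the entropy term concrete, here is a minimal sketch of what a compute_entropy-style helper does over a probability vector (the exact implementation in a3c.py may differ in details; the guard against log(0) is an assumption of this sketch):

```python
import numpy as np

def compute_entropy(p):
    # Shannon entropy H(p) = -sum_i p_i * log(p_i) of a probability vector.
    # Entries equal to 0 or 1 contribute nothing, so they are skipped,
    # which also avoids evaluating log(0).
    h = 0.0
    for i in range(len(p)):
        if 0 < p[i] < 1:
            h -= p[i] * np.log(p[i])
    return h

# A uniform policy over 6 bitrates maximizes the entropy (ln 6 ~ 1.79),
# while a deterministic policy has entropy 0 -- so a high entropy bonus
# pushes the actor toward exploration.
print(compute_entropy(np.ones(6) / 6.0))
print(compute_entropy(np.array([1.0, 0, 0, 0, 0, 0])))
```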
Also, notice that in sim/README.md, we stated:

> As reported by the A3C paper (http://proceedings.mlr.press/v48/mniha16.pdf) and a faithful implementation (https://openreview.net/pdf?id=Hk3mPK5gg), we also found the exploration factor in the actor network quite crucial for achieving good performance. A general strategy to train our system is to first set ENTROPY_WEIGHT in a3c.py to be a large value (in the scale of 1 to 5) in the beginning, then gradually reduce the value to 0.1 (after at least 100,000 iterations).
Thank you for your reply! I see, but the question is: why did you compute the entropy using "action_prob[0]" instead of "action_prob"? I think action_prob[0] is just a scalar standing for the probability of choosing the lowest bit rate, and we need all 6 probabilities to compute the entropy, don't we?
If I remember correctly, action_prob is a 2D tensor and action_prob[0] is a vector. You should double-check by printing it. It might also be a typo that we made during development. I admit this is not the clearest way to implement it, and I apologize for the confusion.
Yes, my mistake. action_prob[0] is indeed a vector. Thank you for your help!
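A quick way to verify the shapes yourself (the example values here are hypothetical; the point is that a predict call on a single state typically returns a batch of size 1):

```python
import numpy as np

# The actor predicts on a batch, so for one state the output
# has shape (1, 6): one row of probabilities over the 6 bitrates.
action_prob = np.array([[0.1, 0.2, 0.3, 0.2, 0.1, 0.1]])

print(action_prob.shape)     # 2D tensor: (1, 6)
print(action_prob[0].shape)  # indexing row 0 gives the vector: (6,)
```

So action_prob[0] is not the probability of the lowest bit rate; it is the full 6-element probability vector that the entropy is computed over.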
Hi, hongzi: I'm confused about the actual meaning of avg_entropy in TensorBoard. I found that you just computed the entropy of the lowest bit rate choice (action_prob[0]), right? So what does it mean? Hope to get your reply!