Entropy coefficient problem

microsoft / FQF

FQF (Fully parameterized Quantile Function for distributional reinforcement learning) is a general reinforcement learning framework for Atari games. It learns to play Atari games automatically by predicting the return distribution in the form of a fully parameterized quantile function.

Entropy coefficient problem #4

Open ddlau opened 3 years ago

ddlau commented 3 years ago

If I'm not mistaken, there is a subtle problem in how gradients are applied to the FPN's trainable variables.

The entropy coefficient (0.001, which appears in the code as fqf_ent or self.ent) is applied twice.

First, in fqf_agent.py at line 399, via the magic number 0.001:

q_entropy = tf.reduce_sum(-quantile_tau * tf.log(quantile_tau), axis=1) * 0.001

Then, at line 419 of the same file, it is applied a second time via self.ent:

self.optimizer1.minimize(self.ent * tf.reduce_mean(-q_entropy), var_list=fqf_params), \
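To make the effect concrete, here is a minimal sketch in plain Python (not the repository's TensorFlow code) that mirrors the two scalings. It assumes self.ent is also 0.001, and the names MAGIC, ENT, row_entropy and fpn_entropy_loss are hypothetical, chosen only for this illustration. Under those assumptions the effective weight on the entropy term comes out near 1e-6 rather than 1e-3.

```python
import math

MAGIC = 0.001  # hard-coded factor from the q_entropy line (line 399)
ENT = 0.001    # assumed value of self.ent / fqf_ent; an assumption for this sketch


def row_entropy(probs):
    """Entropy of one row, mirroring reduce_sum(-p * log(p), axis=1)."""
    return sum(-p * math.log(p) for p in probs)


def fpn_entropy_loss(batch, magic=MAGIC, ent=ENT):
    """Mirror of the two quoted lines: scale by `magic`, then by `ent`."""
    q_entropy = [row_entropy(row) * magic for row in batch]
    mean_neg_entropy = sum(-q for q in q_entropy) / len(q_entropy)
    return ent * mean_neg_entropy


# Two made-up probability vectors standing in for quantile_tau.
batch = [[0.25, 0.25, 0.25, 0.25], [0.1, 0.2, 0.3, 0.4]]

mean_entropy = sum(row_entropy(row) for row in batch) / len(batch)
loss = fpn_entropy_loss(batch)

# Weight that the unscaled mean entropy actually receives in the loss.
effective_coefficient = -loss / mean_entropy
print(effective_coefficient)  # close to 1e-06, floating-point rounding aside
```

If the intended coefficient is 0.001, dropping either the hard-coded factor or self.ent (for example, calling the sketch with magic=1.0) would restore it. Which of the two the authors intended is not something this sketch can settle.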