FQF (Fully parameterized Quantile Function for distributional reinforcement learning) is a general reinforcement learning framework for Atari games. It learns to play Atari games automatically by predicting the return distribution in the form of a fully parameterized quantile function.
Unless I am misreading the code, there may be a subtle problem in how gradients are applied to the FPN's trainable variables.
The entropy coefficient (written as 0.001, fqf_ent, or self.ent in the code) appears to be applied twice.
First in fqf_agent.py, line 399, via the magic number 0.001:
Then at line 419 of the same file, where it is applied a second time via self.ent:
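To show why a double application would matter, here is a minimal, hypothetical sketch. It is not the code from fqf_agent.py: the function names, the example probabilities, and the assumption that both scalings multiply the same entropy term are mine. If that assumption holds and self.ent is also 0.001, the effective weight on the entropy term would be roughly 0.001 * 0.001 = 1e-6 rather than 0.001.

```python
import math


def entropy(probs):
    """Shannon entropy (natural log) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# Assumed value of the entropy coefficient (self.ent / fqf_ent in the issue).
ENT_COEF = 0.001


def entropy_term_once(probs, coef=ENT_COEF):
    # Intended behaviour (assumption): the coefficient scales the entropy once.
    return coef * entropy(probs)


def entropy_term_twice(probs, coef=ENT_COEF):
    # Suspected behaviour (assumption): a literal 0.001 scales the entropy first,
    # and the already scaled term is then multiplied by the coefficient again.
    scaled = 0.001 * entropy(probs)
    return coef * scaled


# Hypothetical probabilities over 4 bins, used only for illustration.
probs = [0.25, 0.25, 0.25, 0.25]
once = entropy_term_once(probs)
twice = entropy_term_twice(probs)

# The double-scaled term is smaller than the intended one by a factor of ENT_COEF.
print(once, twice, twice / once)
```

If this matches what happens at lines 399 and 419, the entropy term would have almost no influence on the FPN's gradients. Removing one of the two scalings, or setting one of them to 1.0, would restore the intended weight.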