microsoft / FQF

FQF (Fully parameterized Quantile Function for distributional reinforcement learning) is a general reinforcement learning framework for Atari games. It learns to play Atari games automatically by predicting the return distribution in the form of a fully parameterized quantile function.

stale gradients problem #3

Open ddlau opened 3 years ago

ddlau commented 3 years ago

If I understand the code correctly, there might be a subtle problem in how gradients are applied to FPN's trainable variables.

When optimizing FPN, the gradients w.r.t. FPN's trainable variables are applied in two separate stages: first dW1 (the gradient of the 1-Wasserstein loss), then the entropy gradient. After the first stage, the trainable variables have already changed. In other words, the entropy gradient is computed from the old trainable variables but applied to the new ones. I'm not sure, but is this what is called the stale gradients problem?
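A minimal sketch of the concern, assuming plain gradient descent on a single scalar parameter and two hypothetical stand-in losses for the 1-Wasserstein loss and the entropy term. The loss functions and helper names (`grad_w1`, `grad_entropy`, `two_stage_stale`, `two_stage_fresh`, `joint_step`) are illustrative only and are not taken from fqf_agent.py:

```python
# Toy illustration of the two-stage update described in the issue.
# Assumptions (hypothetical, not from fqf_agent.py): one scalar parameter w,
# plain gradient descent, and two stand-in losses:
#   L_w1(w)      = (w - 3)^2    (stands in for the 1-Wasserstein loss)
#   L_entropy(w) = 0.5 * w^2    (stands in for the entropy term)


def grad_w1(w: float) -> float:
    """Gradient of the stand-in W1 loss (w - 3)^2."""
    return 2.0 * (w - 3.0)


def grad_entropy(w: float) -> float:
    """Gradient of the stand-in entropy loss 0.5 * w^2."""
    return w


def two_stage_stale(w: float, lr: float) -> float:
    # Both gradients are computed from the same (old) parameters.
    g_w1 = grad_w1(w)
    g_ent = grad_entropy(w)
    w = w - lr * g_w1   # stage 1: apply dW1
    w = w - lr * g_ent  # stage 2: entropy gradient from the OLD w, applied to the NEW w
    return w


def two_stage_fresh(w: float, lr: float) -> float:
    w = w - lr * grad_w1(w)       # stage 1: apply dW1
    w = w - lr * grad_entropy(w)  # stage 2: entropy gradient recomputed at the NEW w
    return w


def joint_step(w: float, lr: float) -> float:
    # A single step on the summed loss L_w1 + L_entropy.
    return w - lr * (grad_w1(w) + grad_entropy(w))


if __name__ == "__main__":
    w0, lr = 1.0, 0.25
    print("stale:", two_stage_stale(w0, lr))  # stale: 1.75
    print("fresh:", two_stage_fresh(w0, lr))  # fresh: 1.5
    print("joint:", joint_step(w0, lr))       # joint: 1.75
```

Under these assumptions the entropy gradient in `two_stage_stale` is indeed stale, yet with plain gradient descent the result is identical to one step on the summed loss; only recomputing it after the first stage (`two_stage_fresh`) changes the outcome. With a stateful optimizer such as Adam or RMSProp, two sequential applications generally do not match a single joint step, so whether this matters in practice may depend on how the optimizer is set up in the actual code.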

Hoping for a response.

ddlau commented 3 years ago

Concrete location: file FQF\dopamine\agents\fqf\fqf_agent.py, lines 417-424.