Open gunnxx opened 11 months ago

Hi, thanks for the great paper and nice code!

I am wondering whether you have encountered the case where the distractor variable $s_t^-$ reconstructs the agent and the task variable $s_t^+$ reconstructs the background? That solution is suboptimal under the given objective, but apparently the network can't escape this local optimum.

Hi,

In principle, because there is a loss for predicting reward from the task state, minimizing the total loss should push the model toward the correct attribution. I suggest trying a higher loss coefficient on the reward term to further encourage the task model to capture reward-correlated information.
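To make the suggestion concrete, here is a minimal sketch of how the reward coefficient trades off against reconstruction in a two-branch objective like this one. The function, the module names (`decoders`, `reward_head`), and the coefficient names `recon_coef` / `reward_coef` are hypothetical and do not mirror this repository's actual code.

```python
import torch
import torch.nn.functional as F

def total_loss(obs, reward, s_plus, s_minus, decoders, reward_head,
               recon_coef=1.0, reward_coef=1.0):
    # Reconstruct the observation from both branches (task + distractor).
    recon = decoders["task"](s_plus) + decoders["distractor"](s_minus)
    recon_loss = F.mse_loss(recon, obs)

    # Predict reward from the task branch only, so reward-correlated
    # information is pushed into s_plus rather than s_minus.
    reward_pred = reward_head(s_plus)
    reward_loss = F.mse_loss(reward_pred, reward)

    # Raising reward_coef strengthens the gradient signal that penalizes
    # placing reward-relevant factors (e.g. the agent) in s_minus,
    # which can help escape the swapped-attribution local optimum.
    return recon_coef * recon_loss + reward_coef * reward_loss
```

In this sketch, if the agent ends up in `s_minus`, the reward head sees only background features and `reward_loss` stays high; a larger `reward_coef` makes that penalty dominate the reconstruction term, so the optimizer is pushed to move agent information back into `s_plus`.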