PopArt is a method that lets a network learn across widely varying reward magnitudes by literally rescaling the value head's output weights as the reward statistics observed in the environment change.
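Concretely, for a linear output layer the rescaling can be chosen so the unnormalized prediction is unchanged when the statistics move. A minimal NumPy sketch of that invariant (all variable names here are illustrative, not from the codebase):

```python
import numpy as np

# Hypothetical linear value head: normalized_value(h) = w @ h + b.
rng = np.random.default_rng(0)
h = rng.standard_normal(8)          # features from the network torso
w, b = rng.standard_normal(8), 0.5

mu, sigma = 0.0, 1.0                # old reward statistics
new_mu, new_sigma = 3.0, 2.0        # statistics after an update

before = sigma * (w @ h + b) + mu   # unnormalized prediction, old stats

# PopArt-style rescaling of the output layer: pick new weights so the
# unnormalized prediction is preserved under the new statistics.
w = w * (sigma / new_sigma)
b = (sigma * b + mu - new_mu) / new_sigma

after = new_sigma * (w @ h + b) + new_mu
assert np.allclose(before, after)   # prediction preserved
```

The algebra is direct: `new_sigma * ((sigma/new_sigma) * w @ h + (sigma*b + mu - new_mu)/new_sigma) + new_mu` collapses back to `sigma * (w @ h + b) + mu`.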
Adding PopArt is a bit nuanced because it requires a moderately involved implementation. I recommend the following steps:
Create a `RewardRenormalizable` interface class in `memory/` that declares an `adjust_scale` method. It's up to subclasses to implement it in a semantically meaningful way; for a value network acting as a `RewardRenormalizable` instance, see the paper's examples of how to rescale the final output layer of the network.
Add a method to the existing `memory.normalization.Normalizer` class, e.g. `subscribe_to_reward_rescale`, which adds the caller (who passes a reference to its `self`) to an internal subscriber list.
Whenever the `Normalizer`'s statistics are updated (the updates should probably be computed online), invoke the `adjust_scale` method of every subscriber in the internal list, passing the appropriate statistics.
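The steps above can be sketched end to end. Only `RewardRenormalizable`, `adjust_scale`, `Normalizer`, and `subscribe_to_reward_rescale` come from this note; everything else (`update`, the Welford-style online statistics, the toy `LinearValueHead`) is an assumption for illustration:

```python
import numpy as np

class RewardRenormalizable:
    """Interface: subscribers rescale themselves when reward stats change."""
    def adjust_scale(self, old_mu, old_sigma, new_mu, new_sigma):
        raise NotImplementedError

class Normalizer:
    """Tracks reward statistics online and notifies subscribers on update."""
    def __init__(self):
        self._count, self._mean, self._m2 = 0, 0.0, 0.0
        self._subscribers = []

    def subscribe_to_reward_rescale(self, subscriber):
        # Caller registers itself (passes its own self) for rescale callbacks.
        self._subscribers.append(subscriber)

    @property
    def sigma(self):
        return float(np.sqrt(self._m2 / self._count)) if self._count > 1 else 1.0

    def update(self, reward):
        old_mu, old_sigma = self._mean, self.sigma
        # Online (Welford) update of running mean and variance.
        self._count += 1
        delta = reward - self._mean
        self._mean += delta / self._count
        self._m2 += delta * (reward - self._mean)
        # Let every subscriber adjust its own scale to the new statistics.
        for sub in self._subscribers:
            sub.adjust_scale(old_mu, old_sigma, self._mean, self.sigma)

class LinearValueHead(RewardRenormalizable):
    """Toy value head whose output layer is rescaled PopArt-style."""
    def __init__(self, dim):
        self.w = np.zeros(dim)
        self.b = 0.0

    def adjust_scale(self, old_mu, old_sigma, new_mu, new_sigma):
        # Preserve the unnormalized prediction across the stats change.
        self.w *= old_sigma / new_sigma
        self.b = (old_sigma * self.b + old_mu - new_mu) / new_sigma
```

With this shape, the `Normalizer` stays the single owner of reward statistics, and any network that wants to stay consistent with them just subscribes once and reacts in `adjust_scale`.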
See the PopArt paper, "Learning values across many orders of magnitude" (van Hasselt et al., 2016), for details.