Create a Normalizer that lives alongside the Dataset (not in it; it should be an explicit, first-class citizen).
First round impl should have an interface similar to _DeltaNormalizer in dynamics.py, but covering only acs and obs normalization (delta normalization shouldn't be included, since it is specific to the dataset).
Second round impl should add PopArt, which not only requires normalization but also an adjust_scale(predictor) method that renormalizes the last layer of the predictor's NN.
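A sketch of the PopArt round, following van Hasselt et al.'s "preserving outputs precisely" update. Assumptions: the predictor exposes its final linear layer as a weight matrix `W` (out_dim x hidden) and bias `b`, and statistics are tracked with an exponential moving average; the attribute names are placeholders:

```python
import numpy as np


class PopArtNormalizer:
    """PopArt-style target normalizer with last-layer rescaling.

    When the running mean/std of the targets move, the predictor's last
    linear layer is rescaled so its unnormalized outputs are unchanged.
    """

    def __init__(self, dim, beta=1e-2):
        self.beta = beta
        self.mean = np.zeros(dim)
        self.std = np.ones(dim)

    def update(self, targets):
        """EMA update of mean/std from a batch of unnormalized targets.

        Returns the pre-update statistics, which adjust_scale needs.
        """
        old_mean, old_std = self.mean.copy(), self.std.copy()
        self.mean = (1 - self.beta) * old_mean + self.beta * targets.mean(axis=0)
        second = ((1 - self.beta) * (old_std ** 2 + old_mean ** 2)
                  + self.beta * (targets ** 2).mean(axis=0))
        self.std = np.sqrt(np.maximum(second - self.mean ** 2, 1e-8))
        return old_mean, old_std

    def adjust_scale(self, predictor, old_mean, old_std):
        """Rescale the predictor's last layer so unnormalized outputs
        are preserved across the statistics update:
            sigma' * (W' h + b') + mu' == sigma * (W h + b) + mu
        """
        predictor.W *= (old_std / self.std)[:, None]
        predictor.b = (old_std * predictor.b + old_mean - self.mean) / self.std

    def unnormalize(self, y):
        return y * self.std + self.mean
```

One design question to settle here is whether adjust_scale should take the whole predictor (as in the note above) or just the last layer; taking the predictor keeps the calling code oblivious to the network's internals.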
Third round impl can introduce some notion of quantile clipping.
One approach I discussed with Tuomas that might be considered "fair" and would encourage some robustness: soft-clip all dimensions, where the soft clip bound is determined by a "soft" max over all prior observations, but the max can only grow by 10% of the range seen so far for that coordinate. This is just my ad-libbed pseudo-doubling scheme; it's a minor point, but I wanted to ask whether there's an established way of accomplishing the same thing.