Currently, `nnetar` scales the input series by dividing by the largest absolute value:
```r
# Scale data
scale <- max(abs(xx),na.rm=TRUE)
xx <- xx/scale
```
`xreg`, on the other hand, is not currently scaled (as mentioned in #205), but I'm planning on adding that.
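For reference, applying the same max-absolute-value scaling to `xreg` column by column might look roughly like this (just a sketch, not the actual implementation; `xreg_scales` is an illustrative name):

```r
# Sketch: scale each xreg column by its own largest absolute value,
# mirroring what is currently done for the series itself
xreg <- as.matrix(xreg)
xreg_scales <- apply(xreg, 2, function(col) max(abs(col), na.rm = TRUE))
xreg <- sweep(xreg, 2, xreg_scales, "/")
```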
I was wondering if you had a preference for how the scaling should be done.
Looking into this, I've come across some references that suggest, for numerical conditioning, either standardizing the inputs or scaling them to [-1,1], and argue against [0,1]. From here and here:
> There is a common misconception that the inputs to a multilayer perceptron must be in the interval [0,1]. There is in fact no such requirement, although there often are benefits to standardizing the inputs as discussed below. But it is better to have the input values centered around zero, so scaling the inputs to the interval [0,1] is usually a bad choice.

and from here:
> In general, any shift of the average input away from zero will bias the updates in a particular direction and thus slow down learning. Therefore, it is good to shift the inputs so that the average over the training set is close to zero [...] Convergence is faster not only if the inputs are shifted as described above but also if they are scaled so that all have about the same covariances.
Although these are somewhat old references, I found them linked from more recent StackOverflow and StackExchange questions.
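To make the two suggestions concrete, on a generic numeric series `xx` they would amount to something like this (a sketch only):

```r
# Standardize: centre on the mean, divide by the standard deviation
xx_std <- (xx - mean(xx, na.rm = TRUE)) / sd(xx, na.rm = TRUE)

# Rescale to [-1, 1]: centre on the midpoint of the observed range,
# then divide by half the range
rng <- range(xx, na.rm = TRUE)
xx_pm1 <- (xx - mean(rng)) / (diff(rng) / 2)
```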
In contrast, Venables and Ripley (2002), the reference cited in the `nnet` package, seem to argue for [0,1] scaling on page 245 when describing the use of weight decay for regularization:
> Weight decay, specific to neural networks, uses as penalty the sum of squares of the weights w_ij. (This only makes sense if the inputs are rescaled to range about [0, 1] to be comparable with the outputs of internal units.)
and they also scale inputs to [0,1] in one of their examples.
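For comparison, the [0,1] rescaling used there is simply (sketch):

```r
# Rescale to [0, 1] over the observed range
rng <- range(xx, na.rm = TRUE)
xx_01 <- (xx - rng[1]) / diff(rng)
```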
I'm leaning towards standardizing the inputs, and also modifying the scaling of the original series for consistency (perhaps with an optional argument in the `nnetar` call for whether to center/scale?). In my (limited) experience, this performs well, including when weight decay is used.
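Roughly, the standardize-and-back-transform idea I have in mind would look like this (a sketch only; the names below are placeholders, not existing `nnetar` internals):

```r
# Sketch: standardize the series before fitting and keep the parameters
# so forecasts can be mapped back to the original units afterwards
xx_mean <- mean(xx, na.rm = TRUE)
xx_sd   <- sd(xx, na.rm = TRUE)
xx_scaled <- (xx - xx_mean) / xx_sd

# ... fit the network on lagged values of xx_scaled (and similarly scaled xreg) ...

# Back-transform point forecasts produced on the standardized scale
# fc <- fc_scaled * xx_sd + xx_mean
```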
Any thoughts on how it should be implemented? Just scale by the maximum as the current code does, or standardize, or use [0,1] or [-1,1] scaling?