Hi,
while using the additive noise option in the addNoise function, I came up with two questions:
1. The default value of the noise argument is 150, which seems very high. With the default, the data completely lose their structure. Would a smaller default value keep users from destroying their data or being puzzled by the results?
2. The literature on additive noise specifies the magnitude of the noise as alpha, a multiplier for the variance of the original data (see e.g. http://crises-deim.urv.cat/webCrises/publications/isijcr/lncs3050OntheSec.pdf). The addNoise algorithm instead seems to multiply the standard deviation of the original data by alpha to obtain the standard deviation of the noise (see the test code in this post). The disadvantage is that the variance of the perturbed data is then not (1 + alpha) * (variance of original data), but (1 + alpha^2) * (variance of original data). Is this the correct interpretation of the noise argument in addNoise? If so, would it be better to use alpha as a multiplier for the variance rather than for the standard deviation, to avoid confusion?
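To spell out the arithmetic behind the second question, here is a short derivation. It assumes the noise is drawn independently of the data; the symbols (sigma for the standard deviation of the original variable x, epsilon for the noise) are my own shorthand, not notation from the package.

```latex
% Assumption: \varepsilon is independent of x, with mean 0.
% Case 1: alpha scales the standard deviation (what addNoise appears to do)
\operatorname{sd}(\varepsilon) = \alpha\,\sigma
\;\Rightarrow\;
\operatorname{Var}(x + \varepsilon)
  = \sigma^2 + \alpha^2 \sigma^2
  = (1 + \alpha^2)\,\sigma^2

% Case 2: alpha scales the variance (the convention in the cited literature)
\operatorname{Var}(\varepsilon) = \alpha\,\sigma^2
\;\Rightarrow\;
\operatorname{Var}(x + \varepsilon)
  = (1 + \alpha)\,\sigma^2
```

Under the first convention, noise = 0.5 should give a variance ratio of about 1.25, while the second would give 1.5.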
Thanks!
# Test additive noise
library(sdcMicro)
set.seed(2352)
dataIllus <- cbind(rnorm(1000, 1, 0.5), rnorm(1000, 1, 0.5))
dataNoise <- addNoise(dataIllus) # default value 150
var(dataNoise$x) # original data: variances close to 0.25 (= 0.5^2), as expected
var(dataNoise$xm) # perturbed data: variance is not (1 + alpha) * variance; 150 seems very large
dataNoise2 <- addNoise(dataIllus, noise = 0.5)
var(dataNoise2$x) # original data
var(dataNoise2$xm) # perturbed data
# alpha seems to be applied to the sd, not to the variance as in the literature
var(dataNoise2$xm)[1,1] / var(dataNoise2$x)[1,1] # variance ratio, approx. 1 + alpha^2
sqrt(var(dataNoise2$xm)[1,1]) / sqrt(var(dataNoise2$x)[1,1]) # sd ratio, approx. sqrt(1 + alpha^2)
var(dataNoise2$xm)[2,2] / var(dataNoise2$x)[2,2] # variance ratio, approx. 1 + alpha^2
sqrt(var(dataNoise2$xm)[2,2]) / sqrt(var(dataNoise2$x)[2,2]) # sd ratio, approx. sqrt(1 + alpha^2)
For the additive-noise method, we changed
x + rnorm(N, 0, noise * sd(x, na.rm=TRUE))
to
x + rnorm(N, 0, noise/100 * sd(x, na.rm=TRUE))
where x is the numeric data vector. The noise argument is thus now read as a percentage of the standard deviation of x.
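For anyone who wants to see the effect of this change without installing the package, the two formulas can be compared in base R. This is only a sketch: add_noise_old and add_noise_new are hypothetical helper names that copy the two expressions quoted above; they are not sdcMicro functions, and the simulated data simply mirror the test case from the question.

```r
# Sketch only: base-R copies of the two formulas quoted above.
# add_noise_old / add_noise_new are hypothetical names, not part of sdcMicro.
set.seed(2352)
x <- rnorm(1000, mean = 1, sd = 0.5)

# Old behaviour: noise multiplies the standard deviation directly
add_noise_old <- function(x, noise) {
  x + rnorm(length(x), 0, noise * sd(x, na.rm = TRUE))
}

# New behaviour: noise is divided by 100 first, i.e. a percentage of the sd
add_noise_new <- function(x, noise) {
  x + rnorm(length(x), 0, noise / 100 * sd(x, na.rm = TRUE))
}

# Ratio of perturbed to original variance with noise = 150
# Theory (independent noise): old 1 + 150^2 = 22501, new 1 + 1.5^2 = 3.25
ratio_old <- var(add_noise_old(x, 150)) / var(x)
ratio_new <- var(add_noise_new(x, 150)) / var(x)
print(c(old = ratio_old, new = ratio_new))
```

The simulated ratios will only be close to the theoretical values, since both the variance of the noise and its sample covariance with x fluctuate from draw to draw.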