Closed matthias-da closed 4 years ago
P.S. this also happens when addNoise() or microaggregation are directly applied on the sdcMicroObj object. Also here the risk does not update.
P.S. it works for other data sets. But so far, nothing seems to be wrong with eusilcP. The numeric key variable is of class numeric, thus it should also work for this data set.
Another example: (also does not update the risks)
sdc2 <- createSdcObj(eusilc13puf,
keyVars = c("age","rb090","pl031","pb220a"),
numVars = "pgrossIncome",
pramVars = "db040", # this is region
hhId = "db030",
weightVar = "weight"
)
sdc2 <- microaggregation(sdc2)
print(sdc2, "numrisk")
sdc2 <- addNoise(sdc2, noise = 50000)
print(sdc2, "numrisk")
# also no update of the risk
# try
sdc2 <- calcRisks(sdc2)
print(sdc2, "numrisk")
# neither
P.S.: eusilc13puf was used from package simPop
@matthias-da
in dRisk.R
(starting from line 75) we actually want to compute a the risk measure as the ratio of observations for which at least one original value of numerical key variables x
lies within intervals mi
and ma
that are computed using perturbed values xm
. if an observation is not perturbed "enough", we consider it risky. the relevant code is
w <- which(rowSums(x < mi | x > ma, na.rm = TRUE) %in% 0:1)
as.numeric(length(w)/nrow(x))
which in my opinion is wrong and would be required to be changed to:
w <- rowSums(x < mi | x > ma, na.rm = TRUE) == 0
as.numeric(sum(w) / nrow(x))
so basically it is not that the risks are not updating but that they are (wrongly) computed. what do you think?
Definitely. Thank you. I wonder why we got reasonable results in the past with it.
It should be just
w <- rowSums(x < mi | x > ma, na.rm = TRUE) == 0
mean(w)
Already uploaded it. Thanks again.
Fixed thanks to Bernhard.
By preparing a course, I just noted that all methods for numeric key variables gives wrong estimates of the risk (i.e. risk is not changing, but should). This is new, previously this always worked correctly.
--> Nothing changed in reported risk, also in
sdc@risk$numeric
nothing changed, even we added a HUGE noise.11128.45 11128.45 19694.85 19694.85 5066.24 31480.01
-51243.79 -14486.73 326680.17 33581.38 11616.27 571382.56
Thus numeric risk should be low now, but still it reports 100% (1).
In case anybody would find time for this, this would be great.