sdcTools / sdcMicro

sdcMicro
http://sdctools.github.io/sdcMicro/

localsuppression idvarextraforsls with testdata2 #64

Closed thijsbenschop closed 8 years ago

thijsbenschop commented 8 years ago

Hi,

the function localSuppression creates, under some circumstances, a variable called idvarextraforsls and adds it to the list of key variables. This variable enumerates all records and thereby turns every record into a sample unique (100% of records violate 2-anonymity). This also happens when running the example from the help file of the localSuppression function:

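library(sdcMicro)

# Load the example data and define key, numeric, and weight variables
data(testdata2)
sdc <- createSdcObj(testdata2,
  keyVars = c('urbrur','roof','walls','water','electcon','relat','sex'),
  numVars = c('expend','income','savings'), w = 'sampling_weight')

# Apply local suppression with default settings, then inspect the key variables
sdc <- localSuppression(sdc)
head(sdc@manipKeyVars)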

  urbrur roof walls water electcon relat sex idvarextraforsls
1      1    2     3    NA        4     1   1                1
2      1    2     3     3        4    NA   1                2
3      1    2     3     3        4     1  NA                3
4      1   NA     3     4        1     3   1                4
5      1    2     3    NA        4    NA   2                5
6      1    4     2    NA        1     3   1                6

print(sdc, 'ls')

urbrur ............ 3 [ 3.226 %]
roof .............. 10 [ 10.753 %]
walls ............. 7 [ 7.527 %]
water ............. 26 [ 27.957 %]
electcon .......... 0 [ 0 %]
relat ............. 14 [ 15.054 %]
sex ............... 4 [ 4.301 %]
idvarextraforsls .. NA [ NA %]
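
To illustrate why an enumerating variable among the key variables makes every record a sample unique, here is a minimal base R sketch. It does not use sdcMicro's internals: the toy data frame and the helper `fk` are hypothetical and only mimic the idea of counting how often each key combination occurs in the sample.

```r
# Toy key variables (hypothetical values, not taken from testdata2)
keys <- data.frame(
  urbrur = c(1, 1, 1, 2, 2, 2),
  sex    = c(1, 1, 2, 1, 1, 2)
)

# Hypothetical helper: sample frequency of each record's key combination
fk <- function(df) {
  combo <- do.call(paste, c(df, sep = "|"))
  as.vector(table(combo)[combo])
}

# Without an ID column, only some records are sample uniques
fk(keys)           # 2 2 1 2 2 1
sum(fk(keys) < 2)  # 2 records violate 2-anonymity

# Adding a column that enumerates the records makes every combination unique
keys_with_id <- cbind(keys, idvarextraforsls = seq_len(nrow(keys)))
sum(fk(keys_with_id) < 2)  # 6, i.e. all records violate 2-anonymity
```

Because each value of the enumerating column occurs exactly once, no two records can share a key combination, whatever the other key variables contain.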

bernhard-da commented 8 years ago

Hi, thanks for pointing out this possible problem. However, I can't reproduce it with the current code from GitHub.

Could you please install the latest sdcMicro version from GitHub

library(devtools)
install_github("alexkowa/sdcMicro")

and try your example again?

thx!

thijsbenschop commented 8 years ago

Hi,

thank you for the quick response. Indeed, after installing the newer version, the problem no longer occurs. We ran this code on several computers with different operating systems, using the latest releases of sdcMicro and R, and each time the variable idvarextraforsls was created. This also happened with other datasets. Should this happen again with the new version, I will report it here. In general, when is this variable created?

Thanks

bernhard-da commented 8 years ago

Yes, this was probably a bug that slipped into the last CRAN version. This variable is created internally when calculating sample frequencies for the keys and/or performing local suppression. However, it should never be exposed to the user.

This will be corrected in the next CRAN release.
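
As a rough way to check whether an installed version is affected, one could test whether the helper column shows up among the manipulated key variables. The function `has_leaked_helper` below is a hypothetical name introduced for this sketch and is not part of sdcMicro. It only inspects column names of a data frame, such as the `manipKeyVars` slot shown in the report above.

```r
# Hypothetical helper (not part of sdcMicro): TRUE if the internal
# column name appears among the columns of a key-variable data frame
has_leaked_helper <- function(key_vars, helper = "idvarextraforsls") {
  helper %in% colnames(key_vars)
}

# Toy data frames standing in for sdc@manipKeyVars (hypothetical values)
clean  <- data.frame(urbrur = c(1, 1), sex = c(1, 2))
leaked <- cbind(clean, idvarextraforsls = 1:2)

has_leaked_helper(clean)   # FALSE
has_leaked_helper(leaked)  # TRUE

# With an sdcMicro object as in the example above, the call would be:
# has_leaked_helper(sdc@manipKeyVars)
```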

thijsbenschop commented 8 years ago

Thank you for looking into that and fixing this.