sdcTools / sdcMicro

sdcMicro
http://sdctools.github.io/sdcMicro/

all keyVars converted to factor #261

Closed bebru closed 6 years ago

bebru commented 6 years ago

If data is extracted from an sdc object with extractManipData, all keyVars are factors:

library(sdcMicro)
orig <- data.frame(a = LETTERS[1:5], 
                   b = letters[1:5],
                   c = 1:5,
                   d = runif(5),
                   stringsAsFactors = FALSE)
str(orig)
sdc <- createSdcObj(orig, keyVars = c("a","d"))
anon <- extractManipData(sdc)
str(anon)
# 'data.frame': 5 obs. of  4 variables:
# $ a: Factor w/ 5 levels "A","B","C","D",..: 1 2 3 4 5
# $ b: chr  "a" "b" "c" "d" ...
# $ c: int  1 2 3 4 5
# $ d: Factor w/ 5 levels "0.133784607984126",..: 4 2 5 1 3
# even numerics

This can be prevented with ignoreKeyVars = TRUE in extractManipData(), but it would be handier (and probably more expected) to get each key variable back with the same type as the input variable by default.
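
For anyone stuck on a release that still returns factors, one possible workaround is to coerce the extracted columns back using the original data frame as a template. The sketch below uses base R only; `restore_types` is a hypothetical helper and not part of sdcMicro, and `anon` is a hand-built stand-in for the output of extractManipData() so the example runs without the package. It assumes the factor levels are the printed values, as in the output above, so numerics are recovered only to the precision of their character representation.

```r
# Hypothetical helper (not part of sdcMicro): coerce factor columns of `anon`
# back to the basic type of the same-named column in `orig`.
restore_types <- function(anon, orig) {
  for (col in intersect(names(anon), names(orig))) {
    x <- anon[[col]]
    if (is.factor(x) && !is.factor(orig[[col]])) {
      # Go through character first: as.numeric() on a factor
      # would return the internal level codes, not the values.
      chr <- as.character(x)
      anon[[col]] <- switch(class(orig[[col]])[1],
        character = chr,
        integer   = as.integer(chr),
        numeric   = as.numeric(chr),
        logical   = as.logical(chr),
        x)  # any other class is left unchanged
    }
  }
  anon
}

orig <- data.frame(a = LETTERS[1:5],
                   c = 1:5,
                   d = seq(0.1, 0.8, length.out = 5),
                   stringsAsFactors = FALSE)

# Stand-in for what extractManipData() returned before the fix
anon <- orig
anon$a <- factor(anon$a)
anon$d <- factor(anon$d)

fixed <- restore_types(anon, orig)
str(fixed)
```

Unlike ignoreKeyVars = TRUE, this keeps whatever is in the extracted data: an NA in a factor column stays NA after the conversion.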

bernhard-da commented 6 years ago

Hi @bebru, thanks for spotting this. Could you please test the fix I just pushed by installing sdcMicro directly from GitHub (devtools::install_github("sdcTools/sdcMicro"))?

Btw: the ignoreKeyVars option in extractManipData() returns the input key variables "as is" (i.e. with their original types), but you lose all modifications such as suppressions.

devtools::load_all()
orig <- data.frame(
  a = LETTERS[1:5],
  b = factor(letters[1:5]),
  c = 1:5,
  d = seq(0.1, 0.8, length.out = 5),
  stringsAsFactors = FALSE)

sdc <- createSdcObj(orig, keyVars = c("a","b","c","d"))

## just for testing, introduce some NAs (these could come, for example, from kAnon())
sdc@manipKeyVars$a[2] <- NA
sdc@manipKeyVars$b[2] <- NA
sdc@manipKeyVars$c[2] <- NA
sdc@manipKeyVars$d[2] <- NA

## with anonymisation
str(extractManipData(sdc))
'data.frame':   5 obs. of  4 variables:
 $ a: chr  "A" NA "C" "D" ...
 $ b: Factor w/ 5 levels "a","b","c","d",..: 1 NA 3 4 5
 $ c: int  1 NA 3 4 5
 $ d: num  0.1 NA 0.45 0.625 0.8

## without anonymisation
str(extractManipData(sdc, ignoreKeyVars = TRUE))
'data.frame':   5 obs. of  4 variables:
 $ a: chr  "A" "B" "C" "D" ...
 $ b: Factor w/ 5 levels "a","b","c","d",..: 1 2 3 4 5
 $ c: int  1 2 3 4 5
 $ d: num  0.1 0.275 0.45 0.625 0.8
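
The same check can be done programmatically instead of by reading the str() output. The following is a minimal sketch in base R; `column_classes` is a hypothetical helper, and `before`/`after` are stand-in data frames rather than real sdcMicro output.

```r
# Hypothetical helper: first class of every column, as a named character vector
column_classes <- function(df) {
  vapply(df, function(x) class(x)[1], character(1))
}

before <- data.frame(a = LETTERS[1:5],
                     b = factor(letters[1:5]),
                     c = 1:5,
                     stringsAsFactors = FALSE)

# Mimic suppressions: setting values to NA must not change the column types
after <- before
after$a[2] <- NA
after$b[2] <- NA
after$c[2] <- NA

stopifnot(identical(column_classes(before), column_classes(after)))
```

With sdcMicro loaded, the comparison would be between column_classes(orig) and column_classes(extractManipData(sdc)).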

bebru commented 6 years ago

Hi @bernhard-da, thanks for the (very) quick fix. Works perfectly. Thanks also for the hint regarding ignoreKeyVars. That was a rash suggestion of mine :-)

bernhard-da commented 6 years ago

Hi @bebru, thanks for testing. The fix will be included in the next CRAN version.