sdcTools / sdcMicro

sdcMicro
http://sdctools.github.io/sdcMicro/

Microaggregation /MDAV / strata_variables #267

Closed superjeje closed 6 years ago

superjeje commented 6 years ago

Hello

When trying to microaggregate using the MDAV method with strata_variables, I see no effect on the results, whether or not strata_variables is provided.

Is it a bug, or where am I going wrong?

The version is 5.3.0, run on Windows 8 / RStudio version 1.1.442.

Here's the exact command:

SDC_LS_MAGG <- microaggregation(SDC_LS,variables = SDC_nV,strata_variables = SDC_magg_stV , method='mdav',measure='median', aggr=7);

Here's the full script:

###---------------------------------INITIALIZE PARAMETERS

SDC_kV = c("kV_1","kV_2","kV_3","kV_4");
SDC_kV = gsub(" ","", SDC_kV);
SDC_sV = NULL;
SDC_nV = c("nV_1","nV_2");
SDC_nV = gsub(" ", "", SDC_nV);
SDC_gV = NULL;
SDC_stV = NULL;
SDC_pV = NULL;
SDC_wV = NULL;
SDC_eV = NULL;
SDC_magg_stV = c("kV_1","kV_2");
SDC_magg_stV = gsub(" ","", SDC_magg_stV);

###---------------------------------CREATE THE SDC OBJECT FROM A DATAFRAME

SDC <- createSdcObj(DF_SDC,keyVars=SDC_kV,numVars = SDC_nV,excludeVars =SDC_eV,sensibleVar = SDC_sV,ghostVars = SDC_gV ,strataVar = SDC_stV,pramVars =SDC_pV ,weightVar = SDC_wV);

###---------------------------------PRINT THE SDC OBJECT

SDC;

###---------------------------------LOCAL SUPPRESSION K=7 AND IMPORTANCE SET

SDC_importance = c(1,2,3,4);
SDC_LS <- localSuppression(SDC, k = 7,importance=SDC_importance);

###---------------------------------PRINT THE SDC_LS OBJECT

print(SDC_LS, "ls");

###---------------------------------MICROAGGREGATION AGG=7 MEASURE=MEDIAN METHOD=MDAV

SDC_LS_MAGG <- microaggregation(SDC_LS,variables = SDC_nV,strata_variables = SDC_magg_stV , method='mdav',measure='median', aggr=7);

###---------------------------------WRITE A CSV FILE - READABLE FOR TESTING PURPOSE

datafile_out = "SDC_LS_MAGG.csv";
path_datafile_out = paste(paste(getwd(),"SDC/microagg/work",sep='/'),datafile_out,sep='/');
writeSafeFile(obj=SDC_LS_MAGG, format="csv", randomizeRecords="no", sep="^", dec=".", col.names=TRUE, row.names=FALSE, quote = FALSE, fileOut=path_datafile_out);

Here is the description of the initial data file:

str(DF_SDC)
'data.frame': 330 obs. of 6 variables:
 $ kV_1: chr "C" "C" "C" "C" ...
 $ kV_2: chr "B" NA "F" NA ...
 $ kV_3: chr "E" "D" "B" "D" ...
 $ kV_4: chr "C" "H" "H" "G" ...
 $ nV_1: num 17 19 14 18 16 16 15 15 14 17 ...
 $ nV_2: num 0 34 0 33 0 35 0 0 39 0 ...

The input file (DF_SDC) is attached: DF_SDC.zip
The output file (SDC_LS_MAGG) is attached: SDC_LS_MAGG.zip

bernhard-da commented 6 years ago

@superjeje The variable(s) you are trying to use as strataVar (SDC_stV) are set to NULL, so you are not using any strata variables. You can easily check with get.sdcMicroObj(SDC, "strataVar"), which in your example also returns NULL.
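The check mentioned above can be tried on a small example. This is only a sketch: `toy` is made-up data standing in for DF_SDC (it is not the file attached to this issue), and the variable roles are chosen purely for illustration.

```r
library(sdcMicro)

# Hypothetical toy data; column names mimic the issue, values are invented
set.seed(1)
n <- 60
toy <- data.frame(
  kV_1 = factor(sample(c("A", "B", "C"), n, replace = TRUE)),
  kV_2 = factor(sample(c("D", "E"), n, replace = TRUE)),
  kV_3 = factor(sample(c("F", "G"), n, replace = TRUE)),
  nV_1 = round(rnorm(n, mean = 16, sd = 2)),
  nV_2 = round(runif(n, 0, 40))
)

# No strataVar passed at creation, as in the script above
sdc_plain <- createSdcObj(toy, keyVars = c("kV_1", "kV_2", "kV_3"),
                          numVars = c("nV_1", "nV_2"))
print(get.sdcMicroObj(sdc_plain, "strataVar"))  # slot is empty

# strataVar passed at creation: the slot is now populated
sdc_strat <- createSdcObj(toy, keyVars = c("kV_2", "kV_3"),
                          numVars = c("nV_1", "nV_2"), strataVar = "kV_1")
print(get.sdcMicroObj(sdc_strat, "strataVar"))
```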

superjeje commented 6 years ago

Thank you @bernhard-da

Yes, SDC_stV is set to NULL (SDC_stV = NULL;), but that is not the parameter used in the call. The parameter used is SDC_magg_stV, which is set to kV_1 and kV_2 (SDC_magg_stV = c("kV_1","kV_2"), on the line before "###---------------------------------CREATE THE SDC OBJECT FROM A DATAFRAME"):

SDC_LS_MAGG <- microaggregation(SDC_LS,variables = SDC_nV,strata_variables = SDC_magg_stV , method='mdav',measure='median', aggr=7);

Thank you for your reply.

bernhard-da commented 6 years ago

Hi @superjeje, thanks for spotting this. Indeed, the strata_variables argument is ignored when you pass an sdcMicroObj as input to microaggregation(). I just pushed a change that explicitly tells you that this argument is ignored. If you want to use stratification, just use strataVar(object) <- some_vars before the call and set it to NULL afterwards.
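A minimal sketch of that workaround, under some assumptions: `toy` and its `region` column are hypothetical (not the data from this issue), a single stratification variable is used, and aggr = 3 is picked only so the small strata are comfortably large enough.

```r
library(sdcMicro)

# Hypothetical toy data with a dedicated stratification column "region"
set.seed(1)
n <- 90
toy <- data.frame(
  region = factor(rep(c("north", "south", "west"), each = 30)),
  kV_1 = factor(sample(c("A", "B", "C"), n, replace = TRUE)),
  kV_2 = factor(sample(c("D", "E"), n, replace = TRUE)),
  nV_1 = rnorm(n, mean = 16, sd = 2),
  nV_2 = runif(n, min = 0, max = 40)
)

sdc <- createSdcObj(toy, keyVars = c("kV_1", "kV_2"),
                    numVars = c("nV_1", "nV_2"))

# Set the stratum on the object itself, since strata_variables
# is ignored for sdcMicroObj input
strataVar(sdc) <- "region"
sdc <- microaggregation(sdc, variables = c("nV_1", "nV_2"),
                        method = "mdav", aggr = 3)

# Reset afterwards so later steps are not stratified unintentionally
strataVar(sdc) <- NULL

head(get.sdcMicroObj(sdc, "manipNumVars"))
```

Whether strataVar<- accepts several variables at once (such as kV_1 and kV_2 from the original script) is not confirmed here, so the sketch sticks to one.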

superjeje commented 6 years ago

Thank you

superjeje commented 6 years ago

Here I come again. I'm not sure I understand clearly, since strata_variables is given as a parameter in a microaggregation example in the SDC book by Matthias Templ (p. 124; see attached image 2018-09-19_14-06-30).

Thank you for your help.

Best regards

bernhard-da commented 6 years ago

Yeah, this argument was just (silently) ignored if the input was an sdcMicroObj. If you feed a data.frame to microaggregation(), the argument is of course used. As for the example in the book: if the sdc object has the slot strataVar set, that variable (or variables) is used; if not, no stratification is applied, even if strata_variables is specified. In the next version of sdcMicro, we will explicitly give the user a message about this behaviour.
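For the data.frame case, a rough sketch of a call where strata_variables is honoured might look like this. The data are invented, `region` is a hypothetical stratification column, and the result is assumed to carry the microaggregated values in its `mx` element.

```r
library(sdcMicro)

# Hypothetical toy data.frame (not the data attached to this issue)
set.seed(1)
n <- 90
toy <- data.frame(
  region = factor(rep(c("north", "south", "west"), each = 30)),
  nV_1 = rnorm(n, mean = 16, sd = 2),
  nV_2 = runif(n, min = 0, max = 40)
)

# data.frame input: microaggregate within each region
res_strat <- microaggregation(toy, variables = c("nV_1", "nV_2"),
                              strata_variables = "region",
                              method = "mdav", aggr = 3)

# Same call without stratification, for comparison
res_plain <- microaggregation(toy, variables = c("nV_1", "nV_2"),
                              method = "mdav", aggr = 3)

# If stratification took effect, the two results should generally differ
print(isTRUE(all.equal(res_strat$mx, res_plain$mx)))
```

Comparing a stratified and an unstratified run like this is a quick way to see whether the argument has any effect for a given input type.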