sdcTools / sdcMicro

sdcMicro
http://sdctools.github.io/sdcMicro/

Issue with kAnon/local suppression in sdcApp #339

Closed: thijsbenschop closed this issue 1 year ago

thijsbenschop commented 1 year ago

Hi,

I'm getting an error in sdcApp when using local suppression.

When I load testdata, select urbrur and age as categorical key variables and run k-anonymity (all default parameters), sdcApp crashes. I'm seeing the following warning message in the R console. Any ideas what causes this? It runs fine from the command line.

Warning: Error in xtfrm.data.frame: cannot xtfrm data frames
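
For context, `xtfrm.data.frame` is a base R method, not an sdcMicro function. The message appears when a whole data frame, rather than one of its columns, reaches a sorting routine such as `order()`. The thread does not show which call in sdcApp triggers it, so the snippet below is only a generic illustration with a made-up data frame `df`. Depending on the R version, the first call may warn, stop, or pass silently.

```r
# Hypothetical toy data, not taken from sdcApp
df <- data.frame(a = c(3, 1, 2))

# Passing the whole data frame to order() dispatches to xtfrm() on the
# data frame; tryCatch() captures the message whether R raises a warning
# or an error (msg stays NA if the R version raises neither).
msg <- tryCatch(
  {
    order(df)
    NA_character_
  },
  warning = function(w) conditionMessage(w),
  error = function(e) conditionMessage(e)
)
print(msg)

# Ordering by a single column, or by all columns via do.call(), avoids
# handing a data frame to xtfrm()
idx_col <- order(df$a)
idx_all <- do.call(order, unname(as.list(df)))
print(df[idx_col, , drop = FALSE])
```

If a default ordering is computed from a data frame somewhere in the app, selecting the column first (as in `order(df$a)`) is the usual way around this class of error.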

Best

Thijs

R version 4.3.0 (2023-04-21 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.utf8
[2] LC_CTYPE=English_United States.utf8
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.utf8

time zone: America/New_York
tzcode source: internal

attached base packages:
[1] grid stats graphics grDevices utils datasets methods
[8] base

other attached packages:
[1] data.table_1.14.8 shinyBS_0.61.1 haven_2.5.2
[4] rhandsontable_0.3.8 shiny_1.7.4 sdcMicro_5.7.5

loaded via a namespace (and not attached):
[1] gtable_0.3.3 xfun_0.39 bslib_0.4.2 ggplot2_3.4.2
[5] htmlwidgets_1.6.2 tzdb_0.3.0 vctrs_0.6.2 tools_4.3.0
[9] crosstalk_1.2.0 generics_0.1.3 tibble_3.2.1 fansi_1.0.4
[13] DEoptimR_1.0-13 cluster_2.1.4 pkgconfig_2.0.3 readxl_1.4.2
[17] lifecycle_1.0.3 compiler_4.3.0 textshaping_0.3.6 prettydoc_0.4.1
[21] munsell_0.5.0 data.tree_1.0.0 fontawesome_0.5.1 carData_3.0-5
[25] httpuv_1.6.9 htmltools_0.5.5 sass_0.4.6 yaml_2.3.7
[29] crayon_1.5.2 later_1.3.1 pillar_1.9.0 jquerylib_0.1.4
[33] MASS_7.3-58.4 ellipsis_0.3.2 DT_0.27 cachem_1.0.8
[37] mime_0.12 robustbase_0.95-1 tidyselect_1.2.0 digest_0.6.31
[41] stringi_1.7.12 dplyr_1.1.2 forcats_1.0.0 fastmap_1.1.1
[45] colorspace_2.1-0 cli_3.6.1 magrittr_2.0.3 utf8_1.2.3
[49] readr_2.1.4 scales_1.2.1 promises_1.2.0.1 rmarkdown_2.21
[53] httr_1.4.6 cellranger_1.1.0 ragg_1.2.5 hms_1.1.3
[57] memoise_2.0.1 evaluate_0.21 knitr_1.42 rlang_1.1.1
[61] Rcpp_1.0.10 xtable_1.8-4 glue_1.6.2 rstudioapi_0.14
[65] jsonlite_1.8.4 R6_2.5.1 systemfonts_1.0.4

thijsbenschop commented 1 year ago

Setting the importance vector seems to solve the issue. Could it have something to do with that?

bernhard-da commented 1 year ago

@thijsbenschop unfortunately, I cannot reproduce this at all, neither on a Linux nor on a Windows machine, with either R 4.2.3 or the latest version 4.3.0 that you were using.

> devtools::session_info()
─ Session info ────────────────────────────────────────────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.3.0 (2023-04-21 ucrt)
 os       Windows 11 x64 (build 22621)
 system   x86_64, mingw32
 ui       RStudio
 language (EN)
 collate  German_Austria.utf8
 ctype    German_Austria.utf8
 tz       Europe/Vienna
 date     2023-06-14
 rstudio  2023.06.0+421 Mountain Hydrangea (desktop)
 pandoc   3.1.1 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)

─ Packages ────────────────────────────────────────────────────────────────────────────────────────────────────────
 package       * version date (UTC) lib source
 cachem          1.0.8   2023-05-01 [1] CRAN (R 4.3.0)
 callr           3.7.3   2022-11-02 [1] CRAN (R 4.3.0)
 carData         3.0-5   2022-01-06 [1] CRAN (R 4.3.0)
 cli             3.6.1   2023-03-23 [1] CRAN (R 4.3.0)
 cluster         2.1.4   2022-08-22 [2] CRAN (R 4.3.0)
 colorspace      2.1-0   2023-01-23 [1] CRAN (R 4.3.0)
 crayon          1.5.2   2022-09-29 [1] CRAN (R 4.3.0)
 data.table      1.14.8  2023-02-17 [1] CRAN (R 4.3.0)
 DEoptimR        1.0-14  2023-06-09 [1] CRAN (R 4.3.0)
 devtools        2.4.5   2022-10-11 [1] CRAN (R 4.3.0)
 digest          0.6.31  2022-12-11 [1] CRAN (R 4.3.0)
 dplyr           1.1.2   2023-04-20 [1] CRAN (R 4.3.0)
 DT              0.28    2023-05-18 [1] CRAN (R 4.3.0)
 ellipsis        0.3.2   2021-04-29 [1] CRAN (R 4.3.0)
 evaluate        0.21    2023-05-05 [1] CRAN (R 4.3.0)
 fansi           1.0.4   2023-01-22 [1] CRAN (R 4.3.0)
 fastmap         1.1.1   2023-02-24 [1] CRAN (R 4.3.0)
 fs              1.6.2   2023-04-25 [1] CRAN (R 4.3.0)
 generics        0.1.3   2022-07-05 [1] CRAN (R 4.3.0)
 ggplot2         3.4.2   2023-04-03 [1] CRAN (R 4.3.0)
 glue            1.6.2   2022-02-24 [1] CRAN (R 4.3.0)
 gtable          0.3.3   2023-03-21 [1] CRAN (R 4.3.0)
 htmltools       0.5.5   2023-03-23 [1] CRAN (R 4.3.0)
 htmlwidgets     1.6.2   2023-03-17 [1] CRAN (R 4.3.0)
 httpuv          1.6.11  2023-05-11 [1] CRAN (R 4.3.0)
 jsonlite        1.8.5   2023-06-05 [1] CRAN (R 4.3.0)
 knitr           1.43    2023-05-25 [1] CRAN (R 4.3.0)
 later           1.3.1   2023-05-02 [1] CRAN (R 4.3.0)
 lifecycle       1.0.3   2022-10-07 [1] CRAN (R 4.3.0)
 magrittr        2.0.3   2022-03-30 [1] CRAN (R 4.3.0)
 MASS            7.3-60  2023-05-04 [1] CRAN (R 4.3.0)
 memoise         2.0.1   2021-11-26 [1] CRAN (R 4.3.0)
 mime            0.12    2021-09-28 [1] CRAN (R 4.3.0)
 miniUI          0.1.1.1 2018-05-18 [1] CRAN (R 4.3.0)
 munsell         0.5.0   2018-06-12 [1] CRAN (R 4.3.0)
 pillar          1.9.0   2023-03-22 [1] CRAN (R 4.3.0)
 pkgbuild        1.4.0   2022-11-27 [1] CRAN (R 4.3.0)
 pkgconfig       2.0.3   2019-09-22 [1] CRAN (R 4.3.0)
 pkgload         1.3.2   2022-11-16 [1] CRAN (R 4.3.0)
 prettydoc       0.4.1   2021-01-10 [1] CRAN (R 4.3.0)
 prettyunits     1.1.1   2020-01-24 [1] CRAN (R 4.3.0)
 processx        3.8.1   2023-04-18 [1] CRAN (R 4.3.0)
 profvis         0.3.8   2023-05-02 [1] CRAN (R 4.3.0)
 promises        1.2.0.1 2021-02-11 [1] CRAN (R 4.3.0)
 ps              1.7.5   2023-04-18 [1] CRAN (R 4.3.0)
 purrr           1.0.1   2023-01-10 [1] CRAN (R 4.3.0)
 R6              2.5.1   2021-08-19 [1] CRAN (R 4.3.0)
 Rcpp            1.0.10  2023-01-22 [1] CRAN (R 4.3.0)
 remotes         2.4.2   2021-11-30 [1] CRAN (R 4.3.0)
 rhandsontable   0.3.8   2021-05-27 [1] CRAN (R 4.3.0)
 rlang           1.1.1   2023-04-28 [1] CRAN (R 4.3.0)
 rmarkdown       2.22    2023-06-01 [1] CRAN (R 4.3.0)
 robustbase      0.95-1  2023-03-29 [1] CRAN (R 4.3.0)
 rstudioapi      0.14    2022-08-22 [1] CRAN (R 4.3.0)
 scales          1.2.1   2022-08-20 [1] CRAN (R 4.3.0)
 sdcMicro      * 5.7.5   2023-01-09 [1] CRAN (R 4.3.0)
 sessioninfo     1.2.2   2021-12-06 [1] CRAN (R 4.3.0)
 shiny           1.7.4   2022-12-15 [1] CRAN (R 4.3.0)
 shinyBS         0.61.1  2022-04-17 [1] CRAN (R 4.3.0)
 stringi         1.7.12  2023-01-11 [1] CRAN (R 4.3.0)
 stringr         1.5.0   2022-12-02 [1] CRAN (R 4.3.0)
 tibble          3.2.1   2023-03-20 [1] CRAN (R 4.3.0)
 tidyselect      1.2.0   2022-10-10 [1] CRAN (R 4.3.0)
 urlchecker      1.0.1   2021-11-30 [1] CRAN (R 4.3.0)
 usethis         2.2.0   2023-06-06 [1] CRAN (R 4.3.0)
 utf8            1.2.3   2023-01-31 [1] CRAN (R 4.3.0)
 vctrs           0.6.2   2023-04-19 [1] CRAN (R 4.3.0)
 xfun            0.39    2023-04-20 [1] CRAN (R 4.3.0)
 xtable          1.8-4   2019-04-21 [1] CRAN (R 4.3.0)

 [1] C:/Users/xy/AppData/Local/R/win-library/4.3
 [2] C:/Program Files/R/R-4.3.0/library

This is the code that runs in sdcApp() if you just load the testdata, select urbrur and age as categorical key variables, and run kAnon() without any further settings:

rm(list = ls())
library(sdcMicro)

# read data
data(testdata)
inputdata <- readMicrodata(
  path = "testdata",
  type = "rdf",
  convertCharToFac = FALSE,
  drop_all_missings = FALSE
)

# create factors
inputdata <- varToFactor(obj = inputdata, var = "urbrur")
inputdata <- varToFactor(obj = inputdata, var = "age")

# create object
sdcObj <- createSdcObj(
  dat = inputdata,
  keyVars = c("urbrur", "age"),
  seed = 0,
  randomizeRecords = FALSE,
  alpha = c(1)
)

# run k-anon
sdcObj <- kAnon(
  obj = sdcObj,
  importance = c(1, 2),
  combs = NULL,
  k = c(3)
)

# print the anonymized object
sdcObj

Can you reproduce any error or warning when running this code on its own?

thijsbenschop commented 1 year ago

Hi @bernhard-da, thanks for checking.

I can run the code above in R without any error/warning. In the console, I cannot reproduce the error, only in sdcApp. Also, if I specify the importance vector manually in sdcApp, I don't get an error.

Last week, 20 participants in a training session installed R, RStudio and sdcMicro for the first time on their computers (different versions of Windows, latest R version), and all of them got the same error message. I'll try to dig a little deeper to understand where it goes wrong.

It seems linked to sdcApp and the way the order of the variables for local suppression is determined by default when not specifying the order of importance manually.
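
One way to probe this from the command line is to run `kAnon()` twice on the same object: once without `importance`, so that the package picks its own order, and once with an explicit vector. This is only a sketch. It assumes sdcMicro is installed and reuses the `testdata` setup from the script earlier in the thread; the object names `sdc_default` and `sdc_explicit` are made up here.

```r
library(sdcMicro)

# Same setup as the exported sdcApp script: two categorical key variables
data(testdata)
dat <- testdata
dat <- varToFactor(obj = dat, var = "urbrur")
dat <- varToFactor(obj = dat, var = "age")

sdc <- createSdcObj(dat = dat, keyVars = c("urbrur", "age"))

# Default: no importance vector, the order is determined internally
sdc_default <- kAnon(sdc, k = 3)

# Explicit: the importance vector that avoided the error in sdcApp
sdc_explicit <- kAnon(sdc, importance = c(1, 2), k = 3)

# Compare the k-anonymity summaries of both results
print(sdc_default, type = "kAnon")
print(sdc_explicit, type = "kAnon")
```

If both calls succeed in the console, the problem is more likely in how sdcApp builds the default than in `kAnon()` itself.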

bernhard-da commented 1 year ago

@thijsbenschop please do two things:

1. Send me the output of devtools::session_info() on a PC where you are able to reproduce the problem.
2. Export the problem instance (in sdcApp) before hitting the button to apply k-anon.

thijsbenschop commented 1 year ago

@bernhard-da

1)

─ Session info ────────────────────────────────────────────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.3.0 (2023-04-21 ucrt)
 os       Windows 10 x64 (build 19044)
 system   x86_64, mingw32
 ui       RStudio
 language (EN)
 collate  English_United States.utf8
 ctype    English_United States.utf8
 tz       America/New_York
 date     2023-06-27
 rstudio  2023.03.0+386 Cherry Blossom (desktop)
 pandoc   2.19.2 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)

─ Packages ────────────────────────────────────────────────────────────────────────────────────────────────────────
 package       * version  date (UTC) lib source
 bslib           0.4.2    2022-12-16 [1] CRAN (R 4.3.0)
 cachem          1.0.8    2023-05-01 [1] CRAN (R 4.3.0)
 callr           3.7.3    2022-11-02 [1] CRAN (R 4.3.0)
 carData         3.0-5    2022-01-06 [1] CRAN (R 4.3.0)
 cli             3.6.1    2023-03-23 [1] CRAN (R 4.3.0)
 cluster         2.1.4    2022-08-22 [2] CRAN (R 4.3.0)
 colorspace      2.1-0    2023-01-23 [1] CRAN (R 4.3.0)
 crayon          1.5.2    2022-09-29 [1] CRAN (R 4.3.0)
 crosstalk       1.2.0    2021-11-04 [1] CRAN (R 4.3.0)
 data.table      1.14.8   2023-02-17 [1] CRAN (R 4.3.0)
 DEoptimR        1.0-13   2023-05-02 [1] CRAN (R 4.3.0)
 devtools        2.4.5    2022-10-11 [1] CRAN (R 4.3.0)
 digest          0.6.31   2022-12-11 [1] CRAN (R 4.3.0)
 dplyr           1.1.2    2023-04-20 [1] CRAN (R 4.3.0)
 DT              0.27     2023-01-17 [1] CRAN (R 4.3.0)
 ellipsis        0.3.2    2021-04-29 [1] CRAN (R 4.3.0)
 evaluate        0.21     2023-05-05 [1] CRAN (R 4.3.0)
 fansi           1.0.4    2023-01-22 [1] CRAN (R 4.3.0)
 fastmap         1.1.1    2023-02-24 [1] CRAN (R 4.3.0)
 fontawesome     0.5.1    2023-04-18 [1] CRAN (R 4.3.0)
 forcats         1.0.0    2023-01-29 [1] CRAN (R 4.3.0)
 fs              1.6.2    2023-04-25 [1] CRAN (R 4.3.0)
 generics        0.1.3    2022-07-05 [1] CRAN (R 4.3.0)
 ggplot2         3.4.2    2023-04-03 [1] CRAN (R 4.3.0)
 glue            1.6.2    2022-02-24 [1] CRAN (R 4.3.0)
 gtable          0.3.3    2023-03-21 [1] CRAN (R 4.3.0)
 haven           2.5.2    2023-02-28 [1] CRAN (R 4.3.0)
 highr           0.10     2022-12-22 [1] CRAN (R 4.3.0)
 hms             1.1.3    2023-03-21 [1] CRAN (R 4.3.0)
 htmltools       0.5.5    2023-03-23 [1] CRAN (R 4.3.0)
 htmlwidgets     1.6.2    2023-03-17 [1] CRAN (R 4.3.0)
 httpuv          1.6.9    2023-02-14 [1] CRAN (R 4.3.0)
 jquerylib       0.1.4    2021-04-26 [1] CRAN (R 4.3.0)
 jsonlite        1.8.4    2022-12-06 [1] CRAN (R 4.3.0)
 knitr           1.42     2023-01-25 [1] CRAN (R 4.3.0)
 later           1.3.1    2023-05-02 [1] CRAN (R 4.3.0)
 lifecycle       1.0.3    2022-10-07 [1] CRAN (R 4.3.0)
 magrittr        2.0.3    2022-03-30 [1] CRAN (R 4.3.0)
 MASS            7.3-58.4 2023-03-07 [2] CRAN (R 4.3.0)
 memoise         2.0.1    2021-11-26 [1] CRAN (R 4.3.0)
 mime            0.12     2021-09-28 [1] CRAN (R 4.3.0)
 miniUI          0.1.1.1  2018-05-18 [1] CRAN (R 4.3.0)
 munsell         0.5.0    2018-06-12 [1] CRAN (R 4.3.0)
 pillar          1.9.0    2023-03-22 [1] CRAN (R 4.3.0)
 pkgbuild        1.4.0    2022-11-27 [1] CRAN (R 4.3.0)
 pkgconfig       2.0.3    2019-09-22 [1] CRAN (R 4.3.0)
 pkgload         1.3.2    2022-11-16 [1] CRAN (R 4.3.0)
 prettydoc       0.4.1    2021-01-10 [1] CRAN (R 4.3.0)
 prettyunits     1.1.1    2020-01-24 [1] CRAN (R 4.3.0)
 processx        3.8.1    2023-04-18 [1] CRAN (R 4.3.0)
 profvis         0.3.8    2023-05-02 [1] CRAN (R 4.3.0)
 promises        1.2.0.1  2021-02-11 [1] CRAN (R 4.3.0)
 ps              1.7.5    2023-04-18 [1] CRAN (R 4.3.0)
 purrr           1.0.1    2023-01-10 [1] CRAN (R 4.3.0)
 R6              2.5.1    2021-08-19 [1] CRAN (R 4.3.0)
 ragg            1.2.5    2023-01-12 [1] CRAN (R 4.3.0)
 Rcpp            1.0.10   2023-01-22 [1] CRAN (R 4.3.0)
 remotes         2.4.2    2021-11-30 [1] CRAN (R 4.3.0)
 rhandsontable   0.3.8    2021-05-27 [1] CRAN (R 4.3.0)
 rlang           1.1.1    2023-04-28 [1] CRAN (R 4.3.0)
 rmarkdown       2.21     2023-03-26 [1] CRAN (R 4.3.0)
 robustbase      0.95-1   2023-03-29 [1] CRAN (R 4.3.0)
 rstudioapi      0.14     2022-08-22 [1] CRAN (R 4.3.0)
 sass            0.4.6    2023-05-03 [1] CRAN (R 4.3.0)
 scales          1.2.1    2022-08-20 [1] CRAN (R 4.3.0)
 sdcMicro        5.7.5    2023-01-09 [1] CRAN (R 4.3.0)
 sessioninfo     1.2.2    2021-12-06 [1] CRAN (R 4.3.0)
 shiny           1.7.4    2022-12-15 [1] CRAN (R 4.3.0)
 shinyBS       * 0.61.1   2022-04-17 [1] CRAN (R 4.3.0)
 stringi         1.7.12   2023-01-11 [1] CRAN (R 4.3.0)
 stringr         1.5.0    2022-12-02 [1] CRAN (R 4.3.0)
 systemfonts     1.0.4    2022-02-11 [1] CRAN (R 4.3.0)
 textshaping     0.3.6    2021-10-13 [1] CRAN (R 4.3.0)
 tibble          3.2.1    2023-03-20 [1] CRAN (R 4.3.0)
 tidyselect      1.2.0    2022-10-10 [1] CRAN (R 4.3.0)
 urlchecker      1.0.1    2021-11-30 [1] CRAN (R 4.3.0)
 usethis         2.1.6    2022-05-25 [1] CRAN (R 4.3.0)
 utf8            1.2.3    2023-01-31 [1] CRAN (R 4.3.0)
 vctrs           0.6.2    2023-04-19 [1] CRAN (R 4.3.0)
 xfun            0.39     2023-04-20 [1] CRAN (R 4.3.0)
 xtable          1.8-4    2019-04-21 [1] CRAN (R 4.3.0)
 yaml            2.3.7    2023-01-23 [1] CRAN (R 4.3.0)

 [1] C:/Users/wb460271/AppData/Local/R/win-library/4.3
 [2] C:/Program Files/R/R-4.3.0/library

2) see here

The error in the console: image

bernhard-da commented 1 year ago

@thijsbenschop thx, I was able to reproduce and fixed it in https://github.com/sdcTools/sdcMicro/commit/d595f90a915bcce1c6a7eb3dc36e7207e28911cc

Fix will then be in the next CRAN release.
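
Until that release is out, one option is to check the installed version and, if needed, install the development version from GitHub. This is a minimal sketch. It assumes the `remotes` package is available and that any version newer than the 5.7.5 reported in this thread contains the fix, which the thread does not confirm. The helper `is_newer_than_reported()` is made up for illustration.

```r
# Version the bug was reported against (from the session info in this thread)
reported <- package_version("5.7.5")

# TRUE when the given version is newer than the reported one.
# Assumption: any later version includes the fix; the thread does not say so.
is_newer_than_reported <- function(v) {
  as.package_version(v) > reported
}

# Only check when sdcMicro is actually installed
if (requireNamespace("sdcMicro", quietly = TRUE) &&
    !is_newer_than_reported(utils::packageVersion("sdcMicro"))) {
  message(
    "Installed sdcMicro is not newer than 5.7.5; consider running: ",
    'remotes::install_github("sdcTools/sdcMicro")'
  )
}
```

Restart R after installing so that sdcApp picks up the new package version.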

thijsbenschop commented 1 year ago

@bernhard-da great, thanks for the quick fix for this issue