Closed bachfab closed 6 years ago
Also, the importance=... argument does not correctly reflect the priorities assigned in the GUI under Anonymity -> k-anonymity. This is under the assumption that the order of prio values in the importance vector should reflect the order of key variables in the keyVars vector.
hi @bachfab
thx for reporting. i fixed the typo but can't reproduce the remaining issue. for me it states as expected:
sdcObj <- kAnon(sdcObj, importance=c(1,4,5,3,6,2), combs=c(4), k=c(10))
you mentioned you were using some dummy test-data. could you export the problem instance just before trying to establish k-anonymity in Reproducibility -> Export/Save the current sdcProblem and link to this file somewhere?
as to your second comment: if you're not changing the importance directly, we prefer to suppress values in the key variable with the most characteristics (highest importance) to the lowest and ignore the "order" of their occurrence in the data set.
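A minimal sketch of the default described above, assuming "most characteristics" means the number of distinct categories of each key variable. The data frame `dat` and its column names are made up for illustration, and how sdcMicro maps this ordering onto the numeric `importance` values internally is not confirmed here.

```r
# Hypothetical toy data: three categorical key variables with
# 2, 4 and 3 distinct categories respectively.
dat <- data.frame(
  sex    = c("m", "f", "m", "f", "m", "f"),
  region = c("N", "S", "E", "W", "N", "S"),
  age    = c("young", "mid", "old", "young", "mid", "old"),
  stringsAsFactors = FALSE
)

# Number of distinct categories per key variable.
n_categories <- vapply(dat, function(col) length(unique(col)), integer(1))

# Key variables ordered from most to fewest categories,
# regardless of their column order in the data set.
suppression_order <- names(sort(n_categories, decreasing = TRUE))

print(n_categories)
print(suppression_order)
```

The point of the sketch is only that the ordering is derived from the data, not from the position of the columns, which is why the exported `importance` vector need not follow the `keyVars` order.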
Hi Bernhard,
Yes I said "dummy" – but actually it was real microdata, just taken some 100 records instead of a full dataset. I'm checking with Aleksandra now how to produce an example case… Meanwhile 2 more things:
Thank you for sharing the snapshot – I confirm it works well now!
I realized that writeSafeFile used with format="sas" produces .sas7bdat files that give an error when trying to open them with SAS EPG (and when trying to use them as input to another SAS program). I also confirm the problem's already there with the write_sas function from the haven lib apparently used by sdcMicro (and Google indicates it's already known for haven), hence no idea if you want to make this one of your own issues…
All the best, Fabian
From: bernhard-da [mailto:notifications@github.com] Sent: Monday, December 18, 2017 9:16 PM To: sdcTools/UserSupport Cc: BACH Fabian (ESTAT); Mention Subject: Re: [sdcTools/UserSupport] sdcMicroGUI settings not carried over correctly to R script (#81)
@bachfab ok, you could try to reproduce the problem with some kind of dataset you create (eg using random numbers) and which you can easily share
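One way to build such a shareable dataset is sketched below in base R. The sizes (100 records, 13 categorical key variables) mirror the original report; the column names, the seed and the number of categories per variable are arbitrary assumptions.

```r
set.seed(81)  # arbitrary seed so the example is reproducible

n_records <- 100
n_keyvars <- 13

# Each key variable gets between 2 and 6 possible categories,
# drawn at random; values are sampled with replacement.
cols <- lapply(seq_len(n_keyvars), function(i) {
  n_levels <- sample(2:6, 1)
  factor(sample(LETTERS[seq_len(n_levels)], n_records, replace = TRUE))
})
names(cols) <- paste0("key", seq_len(n_keyvars))  # hypothetical names
dummy <- as.data.frame(cols)

str(dummy)

# Uncomment to write a file that can be loaded into the GUI:
# write.csv(dummy, "dummy_testdata.csv", row.names = FALSE)
```

Because the values are random, the file contains no confidential microdata and can be attached to the issue directly.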
as for the second issue: no, this needs to be fixed in haven::write_sas
As announced a minute ago:
DummyTestProblemForBernhard
Best, Fabian
@bachfab thx for providing the inputs. i've identified and fixed the issue in the next-branch. however, the problem only occurred if >= 10 categorical key variables were specified.
Please specify
SDC tool used: sdcMicroGUI
Version used: 5.0.5
Operating system used: Windows
Inside the GUI, when I do
Anonymize -> Apply k-anonymity to subsets of key variables? -> Yes -> Apply k-anonimity to all subsets of 4 key variables? -> Yes -> threshold: 10
the following line is exported to the R script under "Reproducibility":
sdcObj <- kAnon(sdcObj, importance=c(7,12,1,3,13,2,11,5,9,10,6,4,8), combs=c(8), k=c(10))
However, if I understood the method and kAnon arguments correctly, it should read combs=c(4). (I loaded 100 test records with 13 key variables.)
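To illustrate why the value matters, here is a small base R sketch. It assumes, following the GUI wording, that combs gives the size of the key-variable subsets for which k-anonymity is established; the key-variable names are hypothetical.

```r
n_keyvars <- 13  # number of key variables in the report

# Number of key-variable subsets of a given size.
subsets_of_4 <- choose(n_keyvars, 4)
subsets_of_8 <- choose(n_keyvars, 8)

# Enumerate the size-4 subsets explicitly (one subset per column).
key_names <- paste0("key", seq_len(n_keyvars))  # hypothetical names
all_subsets <- combn(key_names, 4)

print(c(size4 = subsets_of_4, size8 = subsets_of_8))
print(all_subsets[, 1])
```

Under that assumption, combs=c(4) and combs=c(8) cover different sets of variable combinations (715 versus 1287 subsets here), so the exported script would not reproduce what was selected in the GUI.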
Btw., there is a typo in the GUI tab "Anonymize", namely in the particular setting used above: when selecting "Apply k-anonymity to subsets of key variables?" -> Yes, in the subsequent expanded lines it should read k-anonymity instead of k-anonimity.