Closed: ElMuto closed this issue 7 years ago
Hi,
thanks for this message.
Can you please play around with the argument alpha (in the app, this parameter can be set between 0 and 1 when setting up the SDC problem)? On the command line, it then looks like this:
localSuppression(sdcObj, k=3, importance = NULL, combs = NULL, alpha = 0)
It's all about how you count frequencies with missing values. With the default alpha = 1, rows 5 and 6 even fulfil 5-anonymity, while with alpha = 0 this is not the case. You can find more details here: http://www.springer.com/de/book/9783319502700
It's fully implemented, so the issue is closed.
Thank you very much for your response. I understand now how k-anonymity is calculated in sdcMicro.
Is there a way to instruct sdcMicro to treat missing values like an own category (as described in section 3.2.2 of the reference you sent me)?
My apologies for asking questions like this here - I'll be happy to switch to another channel, if you prefer.
If I'm not wrong, this is what you get with alpha=0.
Also, you can recode the missing values (NA) into something different.
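As a minimal sketch of the recoding suggestion, the following base R snippet replaces NA in a key variable with an explicit label before any frequency counting. The toy vector and the label "missing" are assumptions made for illustration only; they do not come from sdcMicro or from the thread.

```r
# Hypothetical toy column with two missing values (not data from the thread).
status <- c("Single", "Married", NA, "Single", NA)

# Replace NA with an explicit label so that it is counted like any other value.
# The label "missing" is an arbitrary choice.
status_recoded <- status
status_recoded[is.na(status_recoded)] <- "missing"
status_recoded <- factor(status_recoded)

# Frequency table of the recoded column, now including the "missing" category.
table(status_recoded)
```

Note that a recoded label is no longer NA, so any later step that relies on NA to mark suppressed values would not recognise it as missing.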
Thanks again for your response.
I am not sure if setting alpha=0 will produce the desired result.
To make my point more clear, I created another example which is based on section 4.2.2.1 of the textbook you have recommended. What I'm trying to achieve is to produce the result that is presented for "Method 5 (own category)" in Table 4.2. For your convenience, I restructured the code so that the example is self-contained:
require(sdcMicro)
Region <- c("A","A","A","A","A")
Status <- c("Single","Married","Married","Single","Widow")
Age_group <- c("30-49","30-49","30-49","30-49","30-49")
dataset <- data.frame(Region,Status,Age_group)
# Works
sdc <- createSdcObj(dataset, keyVars=c('Region', 'Status', 'Age_group'), alpha=1)
sdc = localSuppression(sdc, k=3, importance = NULL, combs = NULL)
print(sdc, "kAnon")
# Loops (forever?)
sdc <- createSdcObj(dataset, keyVars=c('Region', 'Status', 'Age_group'), alpha=0)
sdc = localSuppression(sdc, k=3, importance = NULL, combs = NULL)
print(sdc, "kAnon")
Unfortunately, I'm not able to verify my assumption, since localSuppression() seems not to terminate when using alpha<>1.
This cannot work when alpha = 0.
Even when all other Status values are set to NA, you still do not fulfil k-anonymity as soon as you interpret Widow as its own category.
So even here you don't have k-anonymity for alpha = 0. This is easy to see:
Region Status Age_group
1 A NA 30-49
2 A NA 30-49
3 A NA 30-49
4 A NA 30-49
5 A Widow 30-49
but we'll probably provide a "fix" or at least a better solution
oh, I see, but this would be a solution
Region Status Age_group
1 A NA 30-49
2 A NA 30-49
3 A NA 30-49
4 A NA 30-49
5 A NA 30-49
This will go on the to-do list. I expect that this rarely happens with real-world data, but it should be solved in any case.
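To make the difference between the two suppressed tables above concrete, here is a rough base R sketch that counts, for each row, how many rows it matches under two conventions: NA matches any value (wildcard), or NA only matches another NA (own category). The helpers rows_match() and match_counts() are hypothetical and written only for this illustration; they are not sdcMicro functions and are not claimed to reproduce how alpha is implemented.

```r
# Hypothetical helper: does row a match row b on all key variables?
rows_match <- function(a, b, na_is_wildcard) {
  either_missing <- is.na(a) | is.na(b)
  both_missing <- is.na(a) & is.na(b)
  equal <- (!either_missing) & (a == b)
  if (na_is_wildcard) {
    all(equal | either_missing)  # NA matches any value
  } else {
    all(equal | both_missing)    # NA only matches another NA
  }
}

# Hypothetical helper: for each row, count the rows it matches (itself included).
match_counts <- function(df, na_is_wildcard) {
  rows <- lapply(seq_len(nrow(df)), function(i) unlist(df[i, ], use.names = FALSE))
  sapply(rows, function(a) {
    sum(sapply(rows, function(b) rows_match(a, b, na_is_wildcard)))
  })
}

# First table from the discussion: Status suppressed in rows 1 to 4 only.
partly_suppressed <- data.frame(
  Region = rep("A", 5),
  Status = c(NA, NA, NA, NA, "Widow"),
  Age_group = rep("30-49", 5),
  stringsAsFactors = FALSE
)

# Second table from the discussion: Status suppressed in every row.
fully_suppressed <- partly_suppressed
fully_suppressed$Status <- NA_character_

wildcard_counts <- match_counts(partly_suppressed, na_is_wildcard = TRUE)
own_category_counts <- match_counts(partly_suppressed, na_is_wildcard = FALSE)
full_counts <- match_counts(fully_suppressed, na_is_wildcard = FALSE)

wildcard_counts      # 5 5 5 5 5
own_category_counts  # 4 4 4 4 1
full_counts          # 5 5 5 5 5
```

Under the own-category convention, the Widow row stays unique in the first table, while suppressing Status in all five rows yields a single class of size 5.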
Hello matthias-da and bernhard-da,
I really appreciate your support with this issue! I have just been able to identify a working example that supports my assumption (that alpha=0 would not result in sdcMicro applying Method 5 (own category)). It is taken from Table 4.1 in the same book. AFAICS, it is basically identical to the last example, except for the value of k, which is 2 here (and, of course, alpha, which is set to 0, as suggested).
require(sdcMicro)
Region <- c("A","A","A","A","A")
Status <- c("Single","Married","Married","Single","Widow")
Age_group <- c("30-49","30-49","30-49","30-49","30-49")
dataset <- data.frame(Region,Status,Age_group)
sdc <- createSdcObj(dataset, keyVars=c('Region', 'Status', 'Age_group'),
alpha=0)
sdc = localSuppression(sdc, k=2, importance = NULL, combs = NULL)
print(sdc, "kAnon")
extractManipData(sdc)
In this example, the anonymized data
Region Status Age_group
1 A Single 30-49
2 A Married 30-49
3 A Married 30-49
4 A Single 30-49
5 A <NA> 30-49
is not 2-anonymous, if <NA> is considered as an own category. The result rather suggests that using alpha=0 leads to sdcMicro using Method 2 (conservative) in Table 4.1.
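The claim can be checked by hand with a small base R sketch that treats <NA> as its own category. It rebuilds the anonymized table shown above and computes the size of each row's equivalence class; the variable names are chosen for illustration, and the check is independent of sdcMicro's own k-anonymity report.

```r
# The anonymized data from the example above, typed in by hand.
result <- data.frame(
  Region = rep("A", 5),
  Status = c("Single", "Married", "Married", "Single", NA),
  Age_group = rep("30-49", 5),
  stringsAsFactors = FALSE
)

# Build one key per row. paste() turns NA into the string "NA",
# so a missing value behaves like a category of its own.
key <- do.call(paste, c(result, sep = "|"))

# Size of the equivalence class that each row belongs to.
class_size <- as.vector(table(key)[key])
class_size           # 2 2 2 2 1

# The table is 2-anonymous only if every class has at least 2 rows.
min(class_size) >= 2 # FALSE
```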
If my assumption is correct, it would be great to know whether there is a way to achieve k-anonymity using localSuppression() while missing values are treated as an own category.
Kind regards
Hi, I'm sorry, but I can't look at the tables again right now, or at how we implemented it and what our philosophy behind it was. I only know that the table is very safe if the attacker does not know what NA can be (which I think is the case in practice). I'm on holiday now until August. Best,
hi @ElMuto, thx for the example. i just pushed an update to the next-branch. could you please verify that everything works as expected?
hi @bernhard-da, I just tested it with a couple of datasets. The execution time issues are definitely resolved. k-anonymity in some small datasets seems to work as expected. In bigger datasets there seems to be a small fraction of datasets violating k-anonymity (in my case: k=5). However, afaics at the moment, that's fine for me. But if you're interested in the test data, please just let me know.
Anyways: thanks a l o t for your help !
hi @ElMuto, thx for confirming.
it would be nice if you could give us a problem instance in which k/5-anonymity was not achieved.
Using this input and this R code, I get this result.
Although k is set to 3 in the R code above, rows 5 and 6 in the resulting dataset form an equivalence class of size 2. Therefore the resulting dataset is only 2-anonymous (similar behaviour with k=4, k=5, etc).
What am I doing wrong?