Closed by geyerf.
In general, the p%-rule is based on the assumption that all contributions are non-negative. See, e.g., Statistical Disclosure Control, Hundepool et al. (2012), ISBN 978-1-119-97815-2, chapter 4. Note that with negative values several questions arise: what counts as sensitive (large positive or large negative values)? What are the lower and upper bounds of a cell if both positive and negative contributions are allowed? And if there are only negative contributions, what is sensitive (values near zero or very negative values)? The book mentioned above also discusses how to deal with cells that have both positive and negative contributions (page 148).
In tau-argus, the p%-rule is calculated under the non-negativity assumption. This means that when there are only negative values, the calculation does not make sense, hence all cells being suppressed.
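To illustrate why the non-negativity assumption matters, here is a minimal Python sketch of the p%-rule in its textbook form: a cell is unsafe if the contributions other than the two largest sum to less than p% of the largest one. This is not tau-argus's actual implementation; the function name `p_percent_unsafe` and the example cells are hypothetical. Applied naively to strictly negative data, the "largest" contributions become the ones closest to zero and the remainder is strongly negative, so (for p < 100) every cell with at least three contributions is flagged, which resembles the symptom reported here, although tau-argus's internal computation may differ.

```python
def p_percent_unsafe(contributions, p):
    """Naive p%-rule check (hypothetical helper, not tau-argus code).

    Sorts contributions in descending order and flags the cell if the sum of
    all contributions except the two largest is below p% of the largest.
    Only meaningful for non-negative contributions.
    """
    xs = sorted(contributions, reverse=True)
    x1 = xs[0] if xs else 0
    remainder = sum(xs[2:])  # everything except the two largest contributions
    return remainder < (p / 100) * x1


# Made-up example cells; the second list is the first one negated.
positive_cells = [[100, 5, 3, 2], [40, 30, 20, 10]]
negative_cells = [[-c for c in cell] for cell in positive_cells]

print([p_percent_unsafe(cell, 10) for cell in positive_cells])  # [True, False]
print([p_percent_unsafe(cell, 10) for cell in negative_cells])  # [True, True]
```

With the negated cells, the first "largest" value is -2 or -10 and the remainder is -105 or -70, so the inequality holds for both cells even though only the first one is actually dominated by a single contributor.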
So, this is not a bug, but a result of an underlying generally accepted assumption when using the p%-rule.
If you only have negative values and consider "very negative" values the most sensitive, you can simply negate all contributions and apply the p%-rule to the resulting positive contributions. In tau-argus you can do this by using a negated variable as the "shadow variable" (you have to add that variable to the microdata set outside tau-argus).
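The negation workaround can be sketched in plain Python as follows. This only mimics the idea of the shadow variable; in tau-argus itself the negated variable has to be added to the microdata beforehand. The helper names and the example cells (loosely modelled on VAR_POS and VAR_NEG from the report, but invented rather than taken from the attached test data) are hypothetical.

```python
def p_percent_unsafe(contributions, p):
    """Textbook p%-rule (hypothetical helper): the cell is unsafe if the
    contributions other than the two largest sum to less than p% of the largest.
    Refuses negative input, reflecting the non-negativity assumption."""
    if any(c < 0 for c in contributions):
        raise ValueError("the p%-rule assumes non-negative contributions")
    xs = sorted(contributions, reverse=True)
    x1 = xs[0] if xs else 0
    return sum(xs[2:]) < (p / 100) * x1


def p_percent_unsafe_negated(contributions, p):
    """Workaround for non-positive data where 'very negative' is most sensitive:
    negate every contribution (the role of the shadow variable) and apply the
    ordinary rule to the now non-negative values."""
    return p_percent_unsafe([-c for c in contributions], p)


# Invented cells: VAR_NEG-like cells are the negated VAR_POS-like cells.
var_pos_cells = [[100, 5, 3, 2], [40, 30, 20, 10]]
var_neg_cells = [[-c for c in cell] for cell in var_pos_cells]

pattern_pos = [p_percent_unsafe(cell, 10) for cell in var_pos_cells]
pattern_neg = [p_percent_unsafe_negated(cell, 10) for cell in var_neg_cells]
print(pattern_pos, pattern_neg)  # [True, False] [True, False]
```

Both variables now produce the same suppression pattern, which is the behaviour the reporter expected.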
Thank you very much for your thorough answer. You have convinced me. However, from a user's perspective it would be nice to get an error or warning message from tau-argus in such cases.
I understand that you would like a warning or error message. But with only a few slightly negative contributions you would not have a problem, so we would have to be very careful with such messages. Perhaps something like: "WARNING: The p% rule assumes nonnegative contributions. Your data contains negative contributions. If there are too many, this may lead to unexpected behaviour." Clear and useful warning/error messages are difficult 😉 (Re)considering warning/error messages in general is on the wish list.
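A check behind such a warning could look roughly like the Python sketch below. It is purely hypothetical (the function `warn_if_negative` is not part of tau-argus) and deliberately only reports rather than refusing, since a few slightly negative contributions are harmless. It returns the count of negative contributions so that the caller, not the check, decides what "too many" means.

```python
import warnings


def warn_if_negative(contributions):
    """Hypothetical check: emit the proposed warning if any contribution is
    negative and return how many negative contributions were found.
    No threshold for 'too many' is built in."""
    n_negative = sum(1 for c in contributions if c < 0)
    if n_negative:
        warnings.warn(
            "The p% rule assumes nonnegative contributions. Your data contains "
            f"{n_negative} negative contribution(s). If there are too many, "
            "this may lead to unexpected behaviour."
        )
    return n_negative


warn_if_negative([12, 7, -1, 30])  # warns once and returns 1
```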
SDC tool used: tau-argus
Version used: 4.2.0 Build 5
Operating system used: Windows
Hello all, this is my first potential bug report, so I will try my best to stick to the requested structure:
I tried to apply the p%-rule to a variable that has only strictly negative values (e.g. negative income). This resulted in tau-argus suppressing all cells, not only the ones that fail the p%-rule. I expected the p%-rule to be applied in the same way as for positive variables. To me this seems to be a bug.
In the example provided there are two variables, VAR_POS and VAR_NEG. VAR_NEG consists of the negated values of VAR_POS. Therefore, applying the p%-rule should yield the same suppression pattern for both variables. However, for VAR_NEG all cells are suppressed.
Cheers
Attachments: testarb.txt, testdata.txt, testrda.txt
EDIT: I have no idea how to add labels etc. :)