sdcTools / UserSupport

The place to be for User Support on SDC tools and to download the latest releases
https://sdctools.github.io/UserSupport/

Decimal weights #140

Open MaximeBeaute opened 4 years ago

MaximeBeaute commented 4 years ago

SDC tool used: tau-argus and sdcTable
Version used: 4.1.7 for tau-argus, 0.31 for sdcTable
Operating system used: Windows


Hi! I am from the French NSI. When we protect tabular data, our microdata often include decimal (non-integer) weights. The handbook on SDC and the Tau-Argus User's Manual both explain clearly how to deal with integer weights, but neither says anything about decimal weights. Hence my questions:

Is it safe to use Tau-Argus and/or sdcTable with decimal weights? Has this case been considered, or could it cause side effects? For instance, in Tau-Argus, using decimal weights seems to result in incorrect safety ranges. Should I use decimal weights anyway and ignore the safety ranges, or could there also be problems in the identification of primary unsafe cells or in the secondary suppressions?
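For readers less familiar with weighted tables, here is a minimal sketch of how sampling weights enter a cell total: each record contributes its value multiplied by its weight. The function name `weighted_table` and the record layout are hypothetical, chosen only for illustration; this is not how Tau-Argus or sdcTable implement the aggregation internally.

```python
from collections import defaultdict


def weighted_table(records):
    """Aggregate (cell, value, weight) records into weighted cell totals.

    Hypothetical helper for illustration only; the weights may be
    integers or decimals, and the arithmetic is the same in both cases.
    """
    totals = defaultdict(float)       # weighted sum of the response variable
    frequencies = defaultdict(float)  # weighted number of contributors
    for cell, value, weight in records:
        totals[cell] += weight * value
        frequencies[cell] += weight
    return dict(totals), dict(frequencies)


# Made-up records with non-integer weights.
records = [
    ("A", 10.0, 1.5),
    ("A", 20.0, 2.25),
    ("B", 8.0, 3.0),
]
totals, frequencies = weighted_table(records)
```

With decimal weights the weighted frequency of a cell is generally not an integer, which is one place where a tool written with integer weights in mind could behave differently.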

ppdewolf commented 4 years ago

@MaximeBeaute could you provide us with an example? That is, the microdata, the metadata, and the safety ranges according to your calculations? How do they differ from the ones provided by tau-argus, and why are the latter incorrect? I presume that you mean sampling weights?

MaximeBeaute commented 4 years ago

Yes, I mean sampling weights. About the incorrect values, though, I made a mistake: I am only talking about the audit intervals, not the safety ranges. To illustrate, let us look at the ZIP file "Example non-integer weights issue.zip", which includes:

If I:

1. run the batch file on the data with integer weights,
2. do secondary suppressions using Hypercube,
3. run the audit,

then I have 3 partially disclosed cells. For these 3 cells, the protection interval is not included in the audit interval (which is a feasibility interval, right?). So there seems to be no problem.

However, if I do the same using the data with non-integer weights, then I have 14 partially disclosed cells (out of 14 unsafe cells). For all of them, the lower protection bound equals the upper one (and is sometimes 0). In this case, the calculation of the feasibility interval is obviously wrong, isn't it?
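The comparison described above can be sketched as follows. This is only an illustration under the assumption that a cell counts as protected when its audit (feasibility) interval covers the required protection interval, as partially disclosed when it does not, and as exactly disclosed when the audit interval collapses to a single point. The function name `classify_cell`, the tolerance `tol`, and all numbers are hypothetical; the code is not taken from Tau-Argus.

```python
def classify_cell(audit, protection, tol=1e-9):
    """Classify a suppressed cell from two (lower, upper) intervals.

    audit      -- feasibility interval an attacker could derive
    protection -- interval that must remain feasible for the cell
    tol        -- hypothetical tolerance for floating-point comparisons
    """
    audit_lo, audit_hi = audit
    prot_lo, prot_hi = protection
    # A single-point audit interval means the value can be recomputed.
    if abs(audit_hi - audit_lo) <= tol:
        return "exactly disclosed"
    # The audit interval must cover the whole protection interval.
    if audit_lo <= prot_lo + tol and audit_hi >= prot_hi - tol:
        return "protected"
    return "partially disclosed"


# Made-up intervals, one per outcome.
covered = classify_cell(audit=(80.0, 120.0), protection=(90.0, 110.0))
too_narrow = classify_cell(audit=(95.0, 120.0), protection=(90.0, 110.0))
single_point = classify_cell(audit=(100.0, 100.0), protection=(90.0, 110.0))
```

A check like this also makes degenerate intervals easy to spot: an interval whose lower bound equals its upper bound, as reported for the 14 cells, is worth inspecting before trusting the resulting classification.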

On a side note, if I run secondary suppressions on the first dataset, Modular works and Optimal does not. With the second dataset it is the opposite: Modular makes Tau-Argus crash, and Optimal works.