sdcTools / sdcMicro

http://sdctools.github.io/sdcMicro/

variable group size #311

Open pabuta88 opened 3 years ago

pabuta88 commented 3 years ago

First of all, thanks a lot for your amazing and extensive package. I was wondering whether there is already a way to allow a variable group size when doing microaggregation. If not, are you planning to implement it?

matthias-da commented 3 years ago

Thank you, this is an interesting comment and suggestion. May I ask what the rationale behind such an approach is? Typically, we have a disclosure scenario and, depending on it, we set a fixed value for the group size; that is, a fixed group size is motivated by a "k-anonymity" view. Are there matching scenarios where different group sizes are essential? Would you argue on the grounds of outliers and the homogeneity of groups? If I saw a need arising from an application or use case, we would certainly want to implement it; otherwise, we would stay with the fixed group size only.

pabuta88 commented 3 years ago

Thank you very much for your quick response. Of course! My rationale is that in the company I work for (a large state-owned enterprise), the data protection rules define a minimum group size rather than a fixed size. In my humble experience, and in my application, I do not see a reason to use a fixed size (except for computational reasons), and it seems somewhat arbitrary to me. I might end up assigning an observation to a group simply because the most similar group has already reached its fixed size. Does this make sense?
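
To make the idea concrete, here is a minimal sketch of what microaggregation with a minimum (rather than fixed) group size could look like for a single numeric variable. It is written in Python purely for illustration and is not sdcMicro's API; the function name `microaggregate_min_size` and its arguments are hypothetical. The sketch assumes the univariate case, sorts the values, and uses dynamic programming to pick contiguous groups of at least `k` records that minimize the within-group sum of squared errors. Group sizes are capped at `2k - 1`, since a larger group can always be split into two valid groups without increasing the error.

```python
from typing import List, Tuple


def microaggregate_min_size(values: List[float], k: int) -> Tuple[List[float], List[int]]:
    """Hypothetical helper: replace each value by the mean of its group.

    Groups are contiguous runs of the sorted values, each with at least k
    and at most 2k - 1 members, chosen to minimize the total within-group
    sum of squared errors (SSE).
    Returns (aggregated values in the original order, group sizes in sorted order).
    """
    n = len(values)
    if k < 1 or n < k:
        raise ValueError("need k >= 1 and at least k observations")

    # Work on sorted values but remember where each one came from.
    order = sorted(range(n), key=lambda i: values[i])
    xs = [values[i] for i in order]

    # Prefix sums allow the SSE of any run xs[i:j] to be computed in O(1).
    prefix = [0.0]
    prefix_sq = [0.0]
    for x in xs:
        prefix.append(prefix[-1] + x)
        prefix_sq.append(prefix_sq[-1] + x * x)

    def sse(i: int, j: int) -> float:
        s = prefix[j] - prefix[i]
        return (prefix_sq[j] - prefix_sq[i]) - s * s / (j - i)

    # best[j]: minimal SSE for partitioning xs[:j]; cut[j]: start of the last group.
    inf = float("inf")
    best = [inf] * (n + 1)
    cut = [0] * (n + 1)
    best[0] = 0.0
    for j in range(k, n + 1):
        for size in range(k, min(2 * k - 1, j) + 1):
            i = j - size
            if best[i] == inf:
                continue  # xs[:i] cannot be split into valid groups
            cost = best[i] + sse(i, j)
            if cost < best[j]:
                best[j] = cost
                cut[j] = i

    # Walk the cuts backwards and write each group's mean to its members.
    out = [0.0] * n
    sizes: List[int] = []
    j = n
    while j > 0:
        i = cut[j]
        mean = (prefix[j] - prefix[i]) / (j - i)
        for pos in range(i, j):
            out[order[pos]] = mean
        sizes.append(j - i)
        j = i
    sizes.reverse()
    return out, sizes


if __name__ == "__main__":
    aggregated, sizes = microaggregate_min_size([1, 2, 3, 4, 10, 11, 12, 13], k=3)
    print(sizes, aggregated)
```

With `k = 3` and the values 1, 2, 3, 4, 10, 11, 12, 13, this sketch forms two groups of four. Fixed groups of exactly three on the sorted values would instead place 4 together with 10 and 11, which is the kind of forced assignment described above.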