matthiasgomolka / sdcLog

Tools for statistical disclosure control in research data centres
GNU Affero General Public License v3.0

Incorporate result calculation and output control #71

Open matthiasgomolka opened 3 years ago

matthiasgomolka commented 3 years ago

Would it be useful to extend sdc_descriptives() so that it calculates and checks a result in a single step?

Right now, users have to calculate their results first and then show separately that these results comply with the disclosure rules.

This might be comparatively difficult to program as it needs to be really flexible.

tbecker2511 commented 3 years ago

@matthiasgomolka I remember we thought about this at the outset, but I don't know exactly why we decided against it. It could work similarly to sdc_extreme(): if the data and the resulting descriptive statistics comply with the rules, the (grouped) descriptive statistics could be output automatically. It could work like summary() and output, for example, mean, median, sd, and quartiles.

I think the problem we saw was that descriptive statistics are often calculated in highly customized ways, so a single function could not cover all the variants users want.
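
A minimal sketch of what "calculate and check in a single step" could look like, using base R only. Everything here is an assumption for illustration: `checked_descriptives()` and its arguments are hypothetical names, not part of sdcLog, and the thresholds (at least 5 distinct entities per group, the 2 largest entities contributing at most 85% of the total) are placeholder defaults rather than confirmed package settings. Statistics are released only for groups that pass both checks.

```r
# Hypothetical helper, NOT part of sdcLog: compute grouped descriptives and
# release them only for groups that pass two simple disclosure checks.
# All thresholds are illustrative assumptions.
checked_descriptives <- function(data, id_var, val_var, by,
                                 n_ids_min = 5L, top_n = 2L, max_share = 0.85) {
  groups <- split(data, data[[by]], drop = TRUE)
  rows <- lapply(names(groups), function(g) {
    d <- groups[[g]]
    d <- d[!is.na(d[[val_var]]), , drop = FALSE]
    x <- d[[val_var]]

    # Check 1: enough distinct entities contribute to this group.
    n_ids <- length(unique(d[[id_var]]))

    # Check 2: the `top_n` largest entities must not dominate the group total.
    per_id <- sort(
      vapply(split(abs(x), d[[id_var]]), sum, numeric(1)),
      decreasing = TRUE
    )
    share <- sum(head(per_id, top_n)) / sum(per_id)

    ok <- n_ids >= n_ids_min && !is.na(share) && share <= max_share

    # Statistics are suppressed (NA) whenever a check fails.
    data.frame(
      group = g, n_ids = n_ids, top_share = share, compliant = ok,
      mean = if (ok) mean(x) else NA_real_,
      median = if (ok) median(x) else NA_real_,
      sd = if (ok) sd(x) else NA_real_
    )
  })
  do.call(rbind, rows)
}

# Made-up example data: sector "A" has 6 entities, sector "B" only 3.
example <- data.frame(
  id = c(1:6, 1:3),
  sector = c(rep("A", 6), rep("B", 3)),
  value = c(10, 12, 9, 11, 13, 10, 5, 6, 200)
)
res <- checked_descriptives(example, "id", "value", "sector")
print(res)
```

The sketch also illustrates the flexibility problem: it hard-codes mean, median, and sd, so any other statistic (weighted means, custom quantiles, ratios of two variables) would need further arguments or a different design.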

matthiasgomolka commented 3 years ago

@tbecker2511 Yes, I think your last paragraph captures our reasoning quite well. We opted against it because it's hard to provide a function which is flexible enough but still covers our needs in terms of safety.

Maybe this will become a little easier with the next release of data.table: https://github.com/Rdatatable/data.table/issues/4247