opendp / smartnoise-core

Differential privacy validator and runtime
MIT License

Define Metadata needed for OpenDP #277

Closed. raprasad closed this issue 4 years ago

raprasad commented 4 years ago

For example, we can start with: https://pypi.org/project/tworavens-preprocess/

mikephelan commented 4 years ago

Proposed schema from PSI tool

mikephelan commented 4 years ago

For each variable, I propose we have attributes in metadata for:

- name (for summary statistics and for variable types)
- median
- mean
- mode
- max
- min
- invalid (values)
- valid (values)
- stdev
- unique (values)
- Herfindahl index (concentration)
- frequency mode
- frequency midpoint
- frequency fewest
- mid-point
- number of characters
- interval (discrete or continuous)
- numchar (numeric or character)
- binary

This is based on the PSI data release.
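
To make the proposal concrete, here is a rough Python sketch of what one per-variable metadata record could look like for a subset of the attributes listed above. Everything in it is an assumption for illustration: the `VariableMetadata` class, the `summarize_variable` helper, the field names, and the choice to treat `None` as an invalid value are all hypothetical and are not taken from PSI, tworavens-preprocess, or smartnoise-core. The statistics are computed exactly, with no privacy mechanism applied.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import mean, median, stdev
from typing import Optional, Sequence


@dataclass
class VariableMetadata:
    """Hypothetical per-variable record; field names are illustrative only."""
    name: str
    valid: int            # count of non-missing values
    invalid: int          # count of missing values (None, by assumption)
    unique: int           # number of distinct valid values
    mode: object          # most frequent valid value
    freq_mode: int        # number of times the mode occurs
    herfindahl: float     # sum of squared value shares (concentration)
    numchar: str          # "numeric" or "character"
    binary: bool          # True when exactly two distinct values occur
    mean: Optional[float] = None
    median: Optional[float] = None
    stdev: Optional[float] = None
    min: Optional[float] = None
    max: Optional[float] = None


def summarize_variable(name: str, values: Sequence[object]) -> VariableMetadata:
    """Build a metadata record for one variable (non-private, exact values)."""
    valid = [v for v in values if v is not None]
    if not valid:
        raise ValueError(f"variable {name!r} has no valid values")

    counts = Counter(valid)
    mode_value, mode_count = counts.most_common(1)[0]
    total = len(valid)

    # Treat the variable as numeric only if every valid value is int or float.
    is_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in valid
    )

    meta = VariableMetadata(
        name=name,
        valid=total,
        invalid=len(values) - total,
        unique=len(counts),
        mode=mode_value,
        freq_mode=mode_count,
        herfindahl=sum((c / total) ** 2 for c in counts.values()),
        numchar="numeric" if is_numeric else "character",
        binary=len(counts) == 2,
    )

    # Numeric summaries only make sense for numeric variables.
    if is_numeric:
        meta.mean = mean(valid)
        meta.median = median(valid)
        meta.stdev = stdev(valid) if total > 1 else None
        meta.min = min(valid)
        meta.max = max(valid)
    return meta


if __name__ == "__main__":
    print(summarize_variable("age", [1, 2, 2, 3, None]))
    print(summarize_variable("group", ["a", "b", "a"]))
```

The numeric fields are left as `None` for character variables so that one record shape can cover both variable types. The remaining attributes from the list (frequency midpoint, frequency fewest, mid-point, number of characters, interval) are omitted here and would need agreed definitions before they could be added.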