raprasad closed 4 years ago
Proposed schema from PSI tool
For each variable, I propose we have attributes in metadata for:

- name (for summary statistics and for variable types)
- median
- mean
- mode
- max
- min
- invalid (values)
- valid (values)
- stdev
- unique (values)
- Herfindahl index (concentration)
- frequency mode
- frequency midpoint
- frequency fewest
- mid-point
- number of characters
- interval (discrete or continuous)
- numchar (numeric or character)
- binary
This is based on the PSI data release.
For example, we can start with: https://pypi.org/project/tworavens-preprocess/
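
To make the proposed attributes concrete, here is a rough sketch of how several of them might be computed for one variable. It uses only the Python standard library and is not the tworavens-preprocess API. The function name `summarize_variable`, the dictionary keys, and the rule that `None` and empty strings count as invalid are all assumptions made for illustration. The midpoint and interval attributes are left out because their exact definitions are not stated here.

```python
from collections import Counter
import statistics


def summarize_variable(name, values):
    """Return a dict of per-variable metadata (hypothetical key names)."""
    # Assumption: None and empty strings are the only "invalid" values.
    valid = [v for v in values if v is not None and v != ""]
    counts = Counter(valid)
    total = len(valid)

    # A variable is treated as numeric only if every valid value is an int or float.
    numeric = total > 0 and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in valid
    )

    summary = {
        "name": name,
        "numchar": "numeric" if numeric else "character",
        "binary": len(counts) == 2,
        "valid": total,
        "invalid": len(values) - total,
        "unique": len(counts),
        # Herfindahl index: sum of squared shares of each distinct value.
        "herfindahl": sum((c / total) ** 2 for c in counts.values()) if total else None,
    }

    if counts:
        mode_value, mode_freq = counts.most_common(1)[0]
        fewest_value, fewest_freq = min(counts.items(), key=lambda kv: kv[1])
        summary.update(
            mode=mode_value,
            freq_mode=mode_freq,
            fewest=fewest_value,
            freq_fewest=fewest_freq,
        )

    if numeric:
        summary.update(
            median=statistics.median(valid),
            mean=statistics.mean(valid),
            max=max(valid),
            min=min(valid),
            # Sample standard deviation needs at least two values.
            stdev=statistics.stdev(valid) if total > 1 else None,
        )

    return summary


if __name__ == "__main__":
    print(summarize_variable("age", [1, 2, 2, 3, None]))
```

Returning a flat dict per variable keeps the output easy to serialize as JSON metadata; the real schema may nest or name these fields differently.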