hallnath1 / CASTLEGUARD

CASTLEGUARD: Continuously Anonymizing STreaming data via adaptive cLustEring with GUARanteed Differential privacy
Apache License 2.0

RFC: Pre-Processing of Categorical Data #52

Open FreddieBrown opened 4 years ago

FreddieBrown commented 4 years ago

Summary

This RFC lays out how to process categorical data for CASTLE.

Motivation

Provide an easy way to convert categorical data into a form that CASTLE can use, so that the data can be used for machine learning afterwards.

Design

process takes 2 arguments and returns a Pandas DataFrame. It is included in the ml_utilities module. The arguments are:

Output Format

The return value is a mutated version of data with the same column headers, but with categorical values converted to numbers based on their position in the hierarchy.
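
To make the output format concrete, here is a minimal sketch of position-based encoding. It is not the actual ml_utilities implementation: the function name process_sketch, the hierarchies argument (a mapping from column name to an ordered list of category values), and the sample columns are all assumptions made for illustration. The sketch also works on a copy rather than modifying data in place.

```python
import pandas as pd


def process_sketch(data, hierarchies):
    """Return a copy of `data` with categorical values replaced by integers.

    `hierarchies` is a hypothetical argument: a dict mapping a column name to
    an ordered list of category values. Each value is encoded as its index
    in that list. Columns not named in `hierarchies` are left unchanged.
    """
    encoded = data.copy()
    for column, ordered_values in hierarchies.items():
        # Build a value -> position lookup for this column.
        positions = {value: index for index, value in enumerate(ordered_values)}

        # Fail loudly instead of silently producing NaN for unknown values.
        unknown = set(encoded[column].dropna()) - set(positions)
        if unknown:
            names = sorted(str(value) for value in unknown)
            raise ValueError(f"{column}: values missing from hierarchy: {names}")

        encoded[column] = encoded[column].map(positions)
    return encoded


# Illustrative data only; the column names and categories are made up.
frame = pd.DataFrame({
    "age": [25, 40, 31],
    "education": ["Bachelors", "Masters", "Bachelors"],
})
hierarchies = {"education": ["HS-grad", "Bachelors", "Masters"]}
result = process_sketch(frame, hierarchies)
```

Here the column headers are preserved and the education column becomes integer positions, while the numeric age column passes through untouched. Raising on unknown values is a design choice of this sketch; the RFC does not say how such values should be handled.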

Drawbacks

None