Once #354 is complete we should migrate the code for the computation of property metrics to their respective classes. This issue is for the ColumnShapes property.
Expected behavior
Add a ColumnShapes class to the _properties module. The class should inherit from the BaseSingleTableProperty class.
Attributes
metrics: The metrics for this property are: [KSComplement, TVComplement]
_details: A dataframe containing the following columns:
    'Column': The column names
    'Metric': The name of the metric used on that column
    'Score': The score computed for that column
Abstract methods
get_score(real_data, synthetic_data, metadata, progress_bar) - Returns a float that is the average score of all the individual metric scores computed.
Pseudo code
for column in tqdm(columns, disable=(not verbose)):
    try:
        if sdtype in ('numerical', 'datetime'):
            column_score = KSComplement.compute(real_column, synthetic_column)
        elif sdtype in ('categorical', 'boolean'):
            column_score = TVComplement.compute(real_column, synthetic_column)
        else:
            # Any other sdtype is PII, so column shapes do not apply
            continue
    except Exception as e:
        column_score = NaN
        warnings.warn(f"Unable to compute Column Shape for column '{column}'. "
                      f"Encountered Error: {type(e).__name__} {e}")

try:
    overall_score = average(all column scores)
except Exception:
    overall_score = NaN
get_visualization() - Returns a plotly.graph_objects._figure.Figure object for the specified table. Use code similar to what's in get_column_shapes_plot.
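The get_score pseudo code above can be sketched in plain Python as follows. This is only an illustration under several assumptions, not the proposed implementation: ks_complement and tv_complement are simplified, hypothetical stand-ins for KSComplement.compute and TVComplement.compute, the tables are plain dicts of lists rather than dataframes, the metadata is assumed to look like {'columns': {name: {'sdtype': ...}}}, and NaN scores are assumed to be skipped when averaging (the pseudo code does not say either way). The function names compute_column_shape_details and average_score are also made up for this sketch.

```python
import math
import warnings
from collections import Counter


def tv_complement(real, synthetic):
    """Hypothetical stand-in for TVComplement.compute: 1 - total variation distance."""
    real_freq, synth_freq = Counter(real), Counter(synthetic)
    categories = set(real_freq) | set(synth_freq)
    distance = 0.5 * sum(
        abs(real_freq[c] / len(real) - synth_freq[c] / len(synthetic))
        for c in categories
    )
    return 1 - distance


def ks_complement(real, synthetic):
    """Hypothetical stand-in for KSComplement.compute: 1 - two-sample KS statistic."""
    def cdf(values, x):
        # Empirical CDF: fraction of values less than or equal to x
        return sum(v <= x for v in values) / len(values)

    points = sorted(set(real) | set(synthetic))
    return 1 - max(abs(cdf(real, p) - cdf(synthetic, p)) for p in points)


# Which metric applies to which sdtype, as described in the pseudo code
METRIC_BY_SDTYPE = {
    'numerical': ('KSComplement', ks_complement),
    'datetime': ('KSComplement', ks_complement),
    'categorical': ('TVComplement', tv_complement),
    'boolean': ('TVComplement', tv_complement),
}


def compute_column_shape_details(real_data, synthetic_data, metadata):
    """Return one {'Column', 'Metric', 'Score'} record per supported column."""
    rows = []
    for name, column_meta in metadata['columns'].items():
        entry = METRIC_BY_SDTYPE.get(column_meta.get('sdtype'))
        if entry is None:
            # Unsupported sdtype (e.g. PII): column shapes do not apply
            continue
        metric_name, metric = entry
        try:
            score = metric(real_data[name], synthetic_data[name])
        except Exception as e:
            score = math.nan
            warnings.warn(
                f"Unable to compute Column Shape for column '{name}'. "
                f"Encountered Error: {type(e).__name__} {e}"
            )
        rows.append({'Column': name, 'Metric': metric_name, 'Score': score})
    return rows


def average_score(rows):
    """Average the per-column scores; NaN scores are skipped (an assumption)."""
    scores = [row['Score'] for row in rows if not math.isnan(row['Score'])]
    try:
        return sum(scores) / len(scores)
    except ZeroDivisionError:
        return math.nan


# Toy data: 'ssn' has an unsupported sdtype, so it is skipped
real = {'age': [1, 2, 3, 4], 'color': ['a', 'a', 'b', 'b'], 'ssn': ['x', 'y', 'z', 'w']}
synthetic = {'age': [1, 2, 3, 4], 'color': ['a', 'a', 'a', 'b'], 'ssn': ['p', 'q', 'r', 's']}
metadata = {'columns': {
    'age': {'sdtype': 'numerical'},
    'color': {'sdtype': 'categorical'},
    'ssn': {'sdtype': 'ssn'},
}}

details = compute_column_shape_details(real, synthetic, metadata)
print(details)
print(average_score(details))  # 0.875
```

In the actual class, the records would presumably populate the _details dataframe and get_score would return the average, but how that is wired into BaseSingleTableProperty depends on #354.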