Once #354 is complete, we should migrate the code that computes each property's metrics into that property's class. This issue covers the ColumnPairTrends property.
Expected behavior
Add a ColumnPairTrends class to the _properties module. The class should inherit from the BaseSingleTableProperty class.
Attributes
metrics: The metrics for this property are: [CorrelationSimilarity, ContingencySimilarity]
_details: A dataframe containing the following columns:
'Table': The table name
'Column 1': The name of the first column in the pair
'Column 2': The name of the second column in the pair
'Metric': The name of the metric used for the pair
'Quality Score': The score computed for the pair
'Real Correlation': The correlation between the two columns in the real data
'Synthetic Correlation': The correlation between the two columns in the synthetic data
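A minimal skeleton of the class and its attributes, as a sketch only. BaseSingleTableProperty is a hypothetical stub standing in for the base class from #354, whose real interface is not described in this issue. `metrics` holds names rather than the metric classes. `_make_row` and `DETAILS_COLUMNS` are hypothetical helpers showing the shape of one `_details` record. A real implementation would collect these records into a pandas DataFrame, for example with `pd.DataFrame(rows, columns=DETAILS_COLUMNS)`.

```python
# Hypothetical stand-in for the real base class introduced in #354.
class BaseSingleTableProperty:
    metrics = []

    def __init__(self):
        self._details = None  # filled in by get_score


# Column names of the _details table, in the order listed in this issue.
DETAILS_COLUMNS = [
    'Table', 'Column 1', 'Column 2', 'Metric',
    'Quality Score', 'Real Correlation', 'Synthetic Correlation',
]


class ColumnPairTrends(BaseSingleTableProperty):
    # Metric names only; the real class would reference the metric classes.
    metrics = ['CorrelationSimilarity', 'ContingencySimilarity']

    def _make_row(self, table, column1, column2, metric, score,
                  real_corr=None, synthetic_corr=None):
        """Build one _details record for a column pair (hypothetical helper).

        real_corr / synthetic_corr stay None for ContingencySimilarity pairs,
        which do not produce a correlation coefficient.
        """
        values = [table, column1, column2, metric, score, real_corr, synthetic_corr]
        return dict(zip(DETAILS_COLUMNS, values))
```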
Methods
get_score(real_data, synthetic_data, metadata, progress_bar) - Returns a float that is the average score of all the individual metric scores computed.
IMPORTANT: The correlation between column1 and column2 is the same as the correlation between column2 and column1, so each pair only needs to be computed once. Counting each column paired with itself, n columns give n(n+1)/2 computations in total.
for column1, column2 in tqdm(all column pairs, disable=(not verbose)):
    try:
        if column1 == column2:
            pair_score = 1  # correlation between a column and itself is 1
        elif column1 or column2 has < 2 unique values:
            pair_score = NaN  # a correlation needs 2 or more distinct values
        elif both columns are continuous:
            pair_score = CorrelationSimilarity
        elif both columns are discrete:
            pair_score = ContingencySimilarity
        elif one is discrete and one is continuous:
            pair_score = ContingencySimilarity  # after discretizing the continuous column
        else:
            continue  # don't compute correlation for PII or id columns
    except Exception as e:
        pair_score = NaN
        warnings.warn(
            f"Unable to compute Column Pair Trends for {column1} and {column2}. "
            f"Encountered Error: {type(e).__name__}: {e}"
        )

try:
    overall_property_score = average(all pair scores)
except Exception:
    overall_property_score = NaN
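The decision rules above can be sketched as dependency-free Python. This is an illustration under stated assumptions, not the SDMetrics implementation. Data is a dict of column name to list of values. `kinds` is a hypothetical dict marking each column as 'continuous', 'discrete', or anything else (such as 'id'). The two metrics are reimplemented from their formulas as we understand them: CorrelationSimilarity as 1 - |S - R| / 2 on Pearson coefficients, and ContingencySimilarity as 1 - total variation distance between normalized contingency tables. In mixed pairs, the continuous column is cut into 10 equal-width bins spanning the real data's range (an assumed binning). The overall average skips NaN scores. Pairs come from `itertools.combinations_with_replacement`, which yields each unordered pair once, diagonal included, matching the n(n+1)/2 count. The tqdm progress bar and metadata handling are left out.

```python
import math
import warnings
from collections import Counter
from itertools import combinations_with_replacement

# Sketch only: the metrics below are reimplemented, not imported from SDMetrics.
N_BINS = 10  # hypothetical bin count for discretizing continuous columns


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_similarity(real, synth):
    """1 - |S - R| / 2, with R, S the real and synthetic Pearson correlations."""
    return 1 - abs(pearson(*synth) - pearson(*real)) / 2


def contingency_similarity(real, synth):
    """1 - total variation distance between the normalized contingency tables."""
    rc, sc = Counter(zip(*real)), Counter(zip(*synth))
    rn, sn = sum(rc.values()), sum(sc.values())
    return 1 - 0.5 * sum(abs(rc[k] / rn - sc[k] / sn) for k in set(rc) | set(sc))


def discretize(values, lo, hi):
    """Equal-width bin index per value; out-of-range values go to the edge bins."""
    width = (hi - lo) / N_BINS
    return [max(0, min(N_BINS - 1, int((v - lo) / width))) for v in values]


def pair_score(c1, c2, real, synth, kinds):
    """Score one column pair; returns None for skipped pairs (PII / id)."""
    try:
        if c1 == c2:
            return 1.0  # a column is perfectly correlated with itself
        if any(len(set(d[c])) < 2 for d in (real, synth) for c in (c1, c2)):
            return math.nan  # a correlation needs 2 or more distinct values
        pair_kinds = {kinds[c1], kinds[c2]}
        if pair_kinds == {'continuous'}:
            return correlation_similarity((real[c1], real[c2]), (synth[c1], synth[c2]))
        if pair_kinds == {'discrete'}:
            return contingency_similarity((real[c1], real[c2]), (synth[c1], synth[c2]))
        if pair_kinds == {'continuous', 'discrete'}:
            cont, disc = (c1, c2) if kinds[c1] == 'continuous' else (c2, c1)
            lo, hi = min(real[cont]), max(real[cont])  # bins from the real data
            return contingency_similarity(
                (discretize(real[cont], lo, hi), real[disc]),
                (discretize(synth[cont], lo, hi), synth[disc]),
            )
        return None  # e.g. PII or id columns are not scored
    except Exception as e:
        warnings.warn(f"Unable to compute Column Pair Trends for {c1} and {c2}. "
                      f"Encountered Error: {type(e).__name__}: {e}")
        return math.nan


def get_score(real, synth, kinds):
    """Mean of all pair scores, ignoring NaN; NaN if no pair could be scored."""
    scores = []
    # Each unordered pair once, diagonal included: n(n+1)/2 pairs for n columns.
    for c1, c2 in combinations_with_replacement(list(real), 2):
        score = pair_score(c1, c2, real, synth, kinds)
        if score is not None and not math.isnan(score):
            scores.append(score)
    return sum(scores) / len(scores) if scores else math.nan
```

Because the diagonal is included, every self-pair adds a perfect 1.0 to the average, which pulls the property score upward. Switching to `itertools.combinations` would drop those pairs and leave n(n-1)/2 of them.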
get_visualization(table_name) - Returns a plotly.graph_objects._figure.Figure object for the specified table. Use code similar to what's in get_column_pairs_plot.
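For get_visualization, the pair scores need to be laid out as a column-by-column grid, assuming the figure is a heatmap. Each pair is scored only once, so the sketch below mirrors every score across the diagonal and fills unscored pairs (such as skipped PII or id columns) with NaN. It stays plotly-free so it runs anywhere. Inside get_visualization, the `_details` rows for `table_name` would be turned into `pair_scores`, and the result could be drawn with `plotly.graph_objects.Heatmap(z=z, x=columns, y=columns)`. The `score_matrix` helper is hypothetical.

```python
import math


def score_matrix(columns, pair_scores):
    """Build a symmetric score grid for a heatmap (hypothetical helper).

    pair_scores maps (column1, column2) -> score, with each unordered pair
    stored once; pairs that were never scored are left as NaN.
    """
    index = {c: i for i, c in enumerate(columns)}
    z = [[math.nan] * len(columns) for _ in columns]
    for (c1, c2), score in pair_scores.items():
        i, j = index[c1], index[c2]
        z[i][j] = z[j][i] = score  # score(c1, c2) == score(c2, c1)
    return z
```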