I'm filing this issue on behalf of a user request on our Slack.
Problem Description
Currently, users are able to plot the data in statistical columns, but utils.get_relationship_plot only supports columns that are numerical, categorical, boolean or datetime.
It would be nice to also support a visualization for the foreign key/primary key relationship, specifically its cardinality.
Expected behavior
Create a new visualization utils.get_cardinality_plot. This should plot the cardinality (# of children) that each parent row has, colored by real vs. synthetic data.
Parameters:
(required) real_data: A dictionary mapping each table name to a pandas.DataFrame containing the data. This dictionary corresponds to real data.
(required) synthetic_data: A dictionary mapping each table name to a pandas.DataFrame containing the data. This dictionary corresponds to synthetic data.
(required) child_table_name: The string name of the child table
(required) parent_table_name: The string name of the parent table
(required) child_foreign_key: The string name of the child's foreign key column (that links to the parent's primary key)
(required) metadata: A dictionary of Multi Table Metadata
Output: A plotly.Figure object with a bar graph. The graph shows the # of children that each parent row has. The color represents real vs. synthetic data.