sdv-dev / SDMetrics

Metrics to evaluate quality and efficacy of synthetic datasets.
https://docs.sdv.dev/sdmetrics
MIT License
210 stars, 45 forks

Multi table quality report should handle multi-foreign keys (to same parent) #406

Closed npatki closed 1 year ago

npatki commented 1 year ago

Problem Description

Currently, the Cardinality property in the multi-table quality report assumes that each parent and child table pair is connected by exactly one relationship. This is not always true.

It's possible for a child table to have multiple foreign keys that point to the same primary key column in the parent. For example, I can have a parent table banks and a child table transactions. For bank-to-bank transactions, there should then be 2 foreign keys in transactions that point to banks (they represent the payor and payee).
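To make the problem concrete, here is a minimal sketch using only pandas. The table contents, the column names (bank_id, payor, payee) and the helper children_per_parent are made up for illustration, and the relationship dictionaries are an assumed layout, not necessarily what SDMetrics uses internally. It shows that identifying a relationship by (parent, child) alone collapses the two foreign keys into one, while including the foreign key keeps them distinct.

```python
import pandas as pd

# Hypothetical toy data: one parent table and one child table with two
# foreign keys that both reference banks.bank_id.
banks = pd.DataFrame({"bank_id": [1, 2, 3]})
transactions = pd.DataFrame({
    "transaction_id": [10, 11, 12, 13],
    "payor": [1, 1, 2, 3],
    "payee": [2, 3, 3, 1],
})

# Assumed relationship layout (illustrative only).
relationships = [
    {"parent_table_name": "banks", "parent_primary_key": "bank_id",
     "child_table_name": "transactions", "child_foreign_key": "payor"},
    {"parent_table_name": "banks", "parent_primary_key": "bank_id",
     "child_table_name": "transactions", "child_foreign_key": "payee"},
]

# Keying by (parent, child) silently overwrites one of the relationships.
by_tables = {
    (r["parent_table_name"], r["child_table_name"]): r for r in relationships
}

# Adding the foreign key to the key keeps both relationships.
by_foreign_key = {
    (r["parent_table_name"], r["child_table_name"], r["child_foreign_key"]): r
    for r in relationships
}


def children_per_parent(parent, child, primary_key, foreign_key):
    """Count child rows per parent row for a single foreign key."""
    counts = child[foreign_key].value_counts()
    # Parents with no children get a count of 0.
    return counts.reindex(parent[primary_key].tolist(), fill_value=0)


payor_counts = children_per_parent(banks, transactions, "bank_id", "payor")
payee_counts = children_per_parent(banks, transactions, "bank_id", "payee")
```

The two count series differ, so each foreign key has its own cardinality distribution and needs its own score.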

Expected behavior

The Quality Report should be updated to account for this case.

In get_details, we expect to show a DataFrame for each breakdown. This DataFrame should include a Foreign Key column to distinguish relationships that have the same parent and child tables. (Note that we can still use table_name to select the rows of the DataFrame that match either the parent or the child table.)
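A rough sketch of what such a details table could look like, built directly with pandas. The column names other than Foreign Key, the metric name, the scores (placeholder values, not real results) and the helper filter_by_table are assumptions for illustration, not the report's actual output.

```python
import pandas as pd

# Illustrative details table: one row per relationship, where the
# Foreign Key column separates rows sharing the same parent and child.
# Scores are placeholders, not results from a real report.
details = pd.DataFrame({
    "Child Table": ["transactions", "transactions"],
    "Parent Table": ["banks", "banks"],
    "Foreign Key": ["payor", "payee"],
    "Metric": ["CardinalityShapeSimilarity", "CardinalityShapeSimilarity"],
    "Score": [0.9, 0.8],
})


def filter_by_table(details, table_name):
    """Keep rows where table_name is either the parent or the child."""
    mask = (
        (details["Parent Table"] == table_name)
        | (details["Child Table"] == table_name)
    )
    return details[mask]


banks_rows = filter_by_table(details, "banks")
other_rows = filter_by_table(details, "accounts")
```

Filtering on banks returns both rows, and the Foreign Key column is what tells them apart.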


In get_visualization, each bar is currently labeled with the child and parent table names. We should update the label to also include the name of the foreign key, e.g. transactions (payor) -> banks
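The label format could be produced by a small helper like the one below. The function name relationship_label is hypothetical; only the output format comes from the example above.

```python
def relationship_label(child_table, foreign_key, parent_table):
    """Build a bar label of the form 'child (foreign_key) -> parent'."""
    return f"{child_table} ({foreign_key}) -> {parent_table}"


# Two relationships between the same pair of tables get distinct labels.
labels = [
    relationship_label("transactions", fk, "banks")
    for fk in ("payor", "payee")
]
```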
