sdv-dev / SDMetrics

Metrics to evaluate quality and efficacy of synthetic datasets.
https://docs.sdv.dev/sdmetrics
MIT License
213 stars, 45 forks

Evaluation metrics for synthetically generated PII. #335

Open yash-rathore opened 1 year ago

yash-rathore commented 1 year ago

Problem Description

What metrics can I use to check the quality of the PII that is produced? report.get_diagnostics() checks the coverage and range of numerical/categorical data. But is there a single metric I can use to check things like duplication or the overall quality of the generated PII?

npatki commented 1 year ago

Thanks for filing this issue @yash-rathore. This requires some more thought. We can keep it open to communicate updates and have discussions.

At a high level, it would be interesting to identify the useful properties of PII columns.
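
As a rough illustration of what such properties might look like, here is a minimal sketch in plain Python. It is not part of SDMetrics: the helper name `pii_column_summary` and the three properties it reports (uniqueness, duplicate rate, and overlap with real values) are assumptions chosen for this example, and the sample values are made up.

```python
from collections import Counter


def pii_column_summary(real_values, synthetic_values):
    """Hypothetical helper (not an SDMetrics API).

    Summarizes a synthetic PII column with three candidate properties:
    - uniqueness: fraction of distinct values among synthetic rows
    - duplicate_rate: 1 - uniqueness
    - real_overlap_rate: fraction of synthetic rows whose value also
      appears in the real column (a possible sign of leaked PII)
    """
    # Ignore missing values so they do not count as duplicates.
    synthetic = [v for v in synthetic_values if v is not None]
    if not synthetic:
        return {"uniqueness": 0.0, "duplicate_rate": 0.0, "real_overlap_rate": 0.0}

    counts = Counter(synthetic)
    real_set = {v for v in real_values if v is not None}

    uniqueness = len(counts) / len(synthetic)
    leaked = sum(1 for v in synthetic if v in real_set)

    return {
        "uniqueness": uniqueness,
        "duplicate_rate": 1.0 - uniqueness,
        "real_overlap_rate": leaked / len(synthetic),
    }


# Made-up example data, for illustration only.
real = ["a@x.com", "b@x.com", "c@x.com"]
synthetic = ["d@y.com", "d@y.com", "a@x.com", "e@y.com"]
summary = pii_column_summary(real, synthetic)
print(summary)
```

Whether a high duplicate rate or any overlap with the real data is acceptable depends on the column (for example, a primary-key-like ID versus a city name), so these numbers are best read per column rather than as one overall score.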