sdv-dev / SDMetrics

Metrics to evaluate quality and efficacy of synthetic datasets.
https://docs.sdv.dev/sdmetrics
MIT License

Add a metric to evaluate anonymization #8

Open zuberek opened 5 years ago

zuberek commented 5 years ago

Description

An easy way to measure the privacy of synthesized data would be very useful for users of the tool. It could be added on top of the existing evaluation metrics (sdv-dev/SDV#52).

A simple way to measure it would be to calculate the average Euclidean distance from each record to its nearest neighbour across the real and synthetic data, as was done in the TableGAN paper. However, that applies only to numerical data, which in the case of SDV is sometimes not enough.
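As a rough illustration of the idea, here is a minimal sketch using only the Python standard library. The function name `mean_nearest_neighbour_distance` is hypothetical and not part of SDV or SDMetrics. It assumes every row is a tuple of numbers, that the columns are already on comparable scales, and that the distance is measured from each synthetic row to its closest real row; the opposite direction would be an equally valid reading of the proposal.

```python
import math


def mean_nearest_neighbour_distance(synthetic_rows, real_rows):
    """Average, over synthetic rows, of the Euclidean distance to the closest real row.

    Hypothetical helper for illustration only; brute force, O(n * m) comparisons.
    """
    if not synthetic_rows or not real_rows:
        raise ValueError("both datasets must contain at least one row")
    nearest = [
        # Distance from this synthetic row to its closest real row.
        min(math.dist(s, r) for r in real_rows)
        for s in synthetic_rows
    ]
    return sum(nearest) / len(nearest)


# Toy data: the second synthetic row is an exact copy of a real row.
real = [(0.0, 0.0), (10.0, 10.0)]
synthetic = [(3.0, 4.0), (10.0, 10.0)]

score = mean_nearest_neighbour_distance(synthetic, real)
print(score)  # 2.5: nearest distances are 5.0 and 0.0
```

A low score means the synthetic rows sit close to real rows, which suggests weaker privacy. The brute-force search is fine for a sketch but would need a spatial index for large tables.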

It could also be implemented so that the user can specify which fields to use in the evaluation; for example, more sensitive fields could be taken into account while others are ignored.
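Field selection could look something like the sketch below. Again, this is only an illustration under assumptions: the function name `nearest_distance_on_fields` and the `fields` parameter are hypothetical, rows are plain dictionaries, and the selected fields must be numerical. Fields that are not listed are never read, so they may hold non-numerical values.

```python
import math


def nearest_distance_on_fields(synthetic_rows, real_rows, fields):
    """Mean nearest-neighbour Euclidean distance using only the named fields.

    Hypothetical helper for illustration only; `fields` lists the numerical
    columns the user considers relevant (for example, the sensitive ones).
    """
    if not synthetic_rows or not real_rows or not fields:
        raise ValueError("rows and fields must be non-empty")

    def project(row):
        # Keep only the selected fields, in a fixed order.
        return [float(row[name]) for name in fields]

    real_points = [project(r) for r in real_rows]
    distances = [
        min(math.dist(project(s), p) for p in real_points)
        for s in synthetic_rows
    ]
    return sum(distances) / len(distances)


real = [{"age": 30, "income": 50000, "zip": "02139"}]
synthetic = [{"age": 33, "income": 50004, "zip": "94105"}]

# Only "age" and "income" take part; "zip" is ignored entirely.
sensitive_score = nearest_distance_on_fields(synthetic, real, ["age", "income"])
print(sensitive_score)  # 5.0
```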

csala commented 4 years ago

Transferring this to SDMetrics

poornima-sivanand commented 2 years ago

This feature would be extremely useful. Can anyone please share whether this has been added, or point me to resources that show a privacy assessment of data generated by CTGAN, DeepEcho and Copulas?