sdv-dev / SDV

Synthetic data generation for tabular data
https://docs.sdv.dev/sdv

Explanation capabilities to understand the generative model obtained #330

Closed. leferrad closed this issue 2 years ago

leferrad commented 3 years ago

This is an amazing project. It could be useful not only for data augmentation but also for understanding the "physics" of the generative process behind the observed data. Is there a feature on the roadmap for diagnosing the fitted model in order to understand or explain properties of the data? Examples include conditional relationships between variables, variables with low relevance in the generative process, and problems with the distribution fitted to a specific variable (which the evaluation metrics may miss, since they only give a general assessment).

This question is intended to start a thread about how useful these capabilities would be (and how they could be developed), so feel free to handle it however you prefer.

npatki commented 2 years ago

Hi @leferrad, thanks for asking this question. This is currently not on our roadmap.

Many of the properties you've described (conditional relationships, distribution issues, etc.) can be derived from comparing the real vs. synthetic data. The SDMetrics library is designed to do just that. You're welcome to go through this library and suggest new metrics that might be useful for this purpose.
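To illustrate the real vs. synthetic comparison described above, here is a minimal sketch in plain Python. It does not use SDMetrics or reproduce any of its metrics; the function names (ks_statistic, pearson) and the toy tables are assumptions made up for this example. It computes a two-sample Kolmogorov-Smirnov gap per column (a rough signal of a poorly fitted distribution) and compares one pairwise correlation (a rough signal of a lost relationship between variables).

```python
from bisect import bisect_right
from math import sqrt


def ks_statistic(real, synthetic):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    real_sorted, synth_sorted = sorted(real), sorted(synthetic)
    gap = 0.0
    # Both empirical CDFs are step functions, so the largest gap occurs at a data point.
    for x in set(real_sorted) | set(synth_sorted):
        cdf_real = bisect_right(real_sorted, x) / len(real_sorted)
        cdf_synth = bisect_right(synth_sorted, x) / len(synth_sorted)
        gap = max(gap, abs(cdf_real - cdf_synth))
    return gap


def pearson(xs, ys):
    """Pearson correlation of two equal-length columns (assumes neither is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical toy tables, invented for illustration only.
real = {"age": [23, 35, 41, 52, 60], "income": [20, 34, 45, 58, 70]}
synthetic = {"age": [25, 33, 44, 50, 61], "income": [70, 22, 40, 31, 55]}

# Per-column check: a large gap suggests the column's distribution was fitted poorly.
for column in real:
    print(column, "KS gap:", round(ks_statistic(real[column], synthetic[column]), 3))

# Pairwise check: a large difference suggests a relationship was not preserved.
corr_real = pearson(real["age"], real["income"])
corr_synth = pearson(synthetic["age"], synthetic["income"])
print("age/income correlation (real, synthetic):", round(corr_real, 3), round(corr_synth, 3))
```

A per-column gap close to 0 means the two distributions look alike, and a gap close to 1 means they barely overlap. Comparisons like these only show where real and synthetic data differ; they do not explain why the model produced the difference.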

Non-parametric models (such as GANs) won't be able to expose any easily understandable processes. Parametric models (such as the GaussianCopula) have a get_parameters method that you can use to see what the model learned (see API). The output may not be easy to parse, but if you're interested, it would be great if you could file a new feature request for it.
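As a rough illustration of what "seeing what a parametric model learned" can look like, here is a toy stand-in written in plain Python. It is not SDV's GaussianCopula, and the dictionary it returns is not the format of get_parameters, which this thread does not show. The names fit_toy_parameters and weakly_related_pairs, the "loc"/"scale" keys, and the 0.1 threshold are all assumptions made for this sketch. The toy model fits a Gaussian marginal per column plus a correlation matrix, then reads the learned parameters to flag variable pairs with little linear relationship.

```python
from statistics import mean, stdev


def fit_toy_parameters(table):
    """Fit a toy parametric model: one Gaussian per column plus a correlation matrix."""
    columns = list(table)
    n = len(table[columns[0]])
    # Marginal parameters: sample mean and sample standard deviation per column.
    marginals = {c: {"loc": mean(table[c]), "scale": stdev(table[c])} for c in columns}
    # Standardize each column so correlations reduce to averaged products of z-scores.
    z = {
        c: [(v - marginals[c]["loc"]) / marginals[c]["scale"] for v in table[c]]
        for c in columns
    }
    correlation = {
        a: {b: sum(p * q for p, q in zip(z[a], z[b])) / (n - 1) for b in columns}
        for a in columns
    }
    return {"marginals": marginals, "correlation": correlation}


def weakly_related_pairs(params, threshold=0.1):
    """List column pairs whose learned correlation is below the (hypothetical) threshold."""
    cols = list(params["correlation"])
    return [
        (a, b)
        for i, a in enumerate(cols)
        for b in cols[i + 1:]
        if abs(params["correlation"][a][b]) < threshold
    ]


# Hypothetical toy table, invented for illustration only.
table = {"x": [1, 2, 3, 4], "y": [2, 4, 6, 8], "z": [1, -1, -1, 1]}
params = fit_toy_parameters(table)
print(params["marginals"])
print(weakly_related_pairs(params))
```

The learned parameters of a model like this can be read directly, which is the kind of inspection the original question asks about. A real copula model transforms each column before estimating correlations, so its parameters would need more careful interpretation than this sketch suggests.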

npatki commented 2 years ago

Actually it seems like there is an existing feature request #129 related to this. I'll close this issue off in favor of it. Please feel free to continue the conversation there, or respond to this one if there's more to discuss (and I can re-open).

leferrad commented 2 years ago

Thanks for the response, @npatki! That library sounds very interesting, and I'll check it out. In the meantime, it's OK to close this issue. Thanks again.