nhsx / SynthVAE

Synthetic data generation by a Variational AutoEncoder with Differential Privacy assessed using Synthetic Data Vault metrics
MIT License

Poor correlations when using `GMM` preprocessing #15

Open danjscho opened 2 years ago

danjscho commented 2 years ago

Describe the bug When GMM preprocessing is used, the data generated by the trained VAE reproduces the correlations of the original data poorly. This needs further investigation.

To Reproduce Steps to reproduce the behavior:

  1. Run any training setup involving GMM preprocessing
  2. Compare the correlation matrices of the original and generated data, e.g. via .corr()
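
A minimal sketch of the comparison in step 2, assuming both datasets are available as pandas DataFrames with matching numeric column names. The helper `correlation_gap` and the toy frames below are hypothetical stand-ins for illustration and are not part of SynthVAE; in practice the original training data and samples drawn from the trained VAE would take their place.

```python
import numpy as np
import pandas as pd


def correlation_gap(original: pd.DataFrame, generated: pd.DataFrame) -> pd.DataFrame:
    """Element-wise absolute difference between the two Pearson correlation matrices."""
    # Restrict to numeric columns present in both frames (hypothetical convention).
    num_orig = original.select_dtypes(include="number").columns
    num_gen = generated.select_dtypes(include="number").columns
    cols = num_orig.intersection(num_gen)
    return (original[cols].corr() - generated[cols].corr()).abs()


# Toy stand-ins: "original" has two strongly correlated columns,
# "generated" has two independent ones, mimicking the reported symptom.
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
original = pd.DataFrame({"a": x, "b": x + 0.1 * rng.normal(size=1000)})
generated = pd.DataFrame({"a": rng.normal(size=1000), "b": rng.normal(size=1000)})

gap = correlation_gap(original, generated)
max_gap = gap.to_numpy().max()  # largest single discrepancy between the matrices
print(gap.round(2))
print("largest correlation gap:", round(float(max_gap), 2))
```

A large entry in `gap` points at the column pair whose relationship the generated data fails to capture, which may help narrow down whether the issue affects all columns or only those transformed by the GMM step.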

Expected behavior Correlations in the generated data should match those of the original data more closely when using GMM preprocessing

Additional context A comparison with TVAE from SDV's CTGAN package would be worthwhile (https://github.com/sdv-dev/CTGAN)