sdv-dev / CTGAN

Conditional GAN for generating synthetic tabular data.

Tracking and Saving TVAE Loss Values #307

Closed: gjuresic closed this issue 8 months ago

gjuresic commented 1 year ago


Problem description

When fitting CTGAN, I can capture the loss values in the output variable. However, I cannot do the same when fitting TVAE. How can I track and store the loss values to check whether the model learns the distribution of the data over the selected number of epochs?
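A library-independent sketch of tracking and storing per-epoch losses, assuming you control a training loop that produces one loss value per epoch. `LossHistory` and its methods are hypothetical helpers for illustration, not part of CTGAN's API.

```python
import csv
import math
from dataclasses import dataclass, field


@dataclass
class LossHistory:
    """Hypothetical helper: collects one (epoch, loss) pair per epoch."""
    records: list = field(default_factory=list)

    def log(self, epoch: int, loss: float) -> None:
        # Reject NaN/inf early: a diverging run is easier to spot here than in a plot.
        if not math.isfinite(loss):
            raise ValueError(f"non-finite loss at epoch {epoch}: {loss}")
        self.records.append((epoch, loss))

    def save_csv(self, path: str) -> None:
        # Persist the history so it can be reloaded or plotted after training.
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["Epoch", "Loss"])
            writer.writerows(self.records)

    @classmethod
    def load_csv(cls, path: str) -> "LossHistory":
        history = cls()
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                history.log(int(row["Epoch"]), float(row["Loss"]))
        return history
```

Inside a custom loop you would call `history.log(epoch, loss)` once per epoch and `history.save_csv("loss.csv")` at the end; the CSV layout (`Epoch`, `Loss`) is an arbitrary choice for this sketch.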

npatki commented 1 year ago

Hi @gjuresic, currently there is no easy way to track the loss values, although we have an outstanding feature request for it at #300. You can review the proposed functionality there and let us know if it will meet your needs.

In the meantime, there are still other ways to check whether the model learned the distributions. One possible approach:

  1. Sample synthetic data from the fitted synthesizer.
  2. Use the SDMetrics library to compare the real vs. synthetic data. If the synthesizer learned the distributions well, it should report high scores.
  3. You can also run a Quality Report and create visualizations to manually inspect the data.
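To make step 2 concrete, here is a stdlib-only sketch of two column-level similarity scores. SDMetrics offers metrics built on similar ideas (e.g. `KSComplement` for numeric columns and `TVComplement` for categorical ones), but the functions below are an illustrative approximation written for this thread, not SDMetrics' implementation. Both return a score in [0, 1], where 1.0 means the real and synthetic columns match.

```python
from bisect import bisect_right
from collections import Counter


def ks_complement(real, synthetic):
    """1 - two-sample Kolmogorov-Smirnov statistic for a numeric column."""
    real_sorted = sorted(real)
    synth_sorted = sorted(synthetic)
    max_gap = 0.0
    # Empirical CDFs are step functions that only change at observed values,
    # so the largest gap between them occurs at one of those values.
    for x in real_sorted + synth_sorted:
        cdf_real = bisect_right(real_sorted, x) / len(real_sorted)
        cdf_synth = bisect_right(synth_sorted, x) / len(synth_sorted)
        max_gap = max(max_gap, abs(cdf_real - cdf_synth))
    return 1.0 - max_gap


def tv_complement(real, synthetic):
    """1 - total variation distance between category frequencies."""
    real_freq = Counter(real)
    synth_freq = Counter(synthetic)
    categories = set(real_freq) | set(synth_freq)
    tv = 0.5 * sum(
        abs(real_freq[c] / len(real) - synth_freq[c] / len(synthetic))
        for c in categories
    )
    return 1.0 - tv
```

Scores close to 1.0 on most columns suggest the synthesizer captured the marginal distributions; they say nothing about correlations between columns, which the Quality Report also covers.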

Hopefully that helps! Let me know if you have any follow-ups, but otherwise, I'd defer to issue #300 for the implementation that we want to add.

npatki commented 8 months ago

This feature has now been added. For the API, see #300. Thanks.
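Once per-epoch losses are available as a plain list of floats (how the library exposes them is described in #300 and is not reproduced here), a simple heuristic can flag whether training has stopped improving. `has_plateaued` and its `window`/`tolerance` defaults are hypothetical choices for this sketch. It fits a single-objective loss like TVAE's better than CTGAN's generator and discriminator losses, which are adversarial and do not necessarily decrease steadily.

```python
def has_plateaued(losses, window=5, tolerance=0.01):
    """Return True if the mean loss over the last `window` epochs improved by
    less than `tolerance` (relative) compared with the window before it."""
    if window <= 0:
        raise ValueError("window must be positive")
    if len(losses) < 2 * window:
        return False  # not enough history to compare two full windows
    previous = sum(losses[-2 * window:-window]) / window
    recent = sum(losses[-window:]) / window
    if previous == 0:
        return recent >= previous  # no room to measure relative improvement
    # abs() keeps the comparison meaningful if the loss is negative.
    relative_improvement = (previous - recent) / abs(previous)
    return relative_improvement < tolerance
```

A `True` result hints that more epochs are unlikely to help much; it is a rough signal to pair with the SDMetrics comparison, not a convergence guarantee.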