sdv-dev / SDV

Synthetic data generation for tabular data
https://docs.sdv.dev/sdv

Is loss_factor not used? How to compute reconstruction loss? #2196

Closed: turian closed this issue 1 month ago

turian commented 2 months ago

I am trying to compute the reconstruction loss on held-out data in order to do model selection.

However, I can't find anywhere in this repo or the CTGAN repo how to do this. In fact, I can't even find any place where loss_factor is used.

I searched both repos and couldn't find it:
https://github.com/search?q=repo%3Asdv-dev%2FCTGAN%20loss_factor&type=code
https://github.com/search?q=repo%3Asdv-dev%2FSDV%20loss_factor&type=code

How do I compute reconstruction loss from a trained TVAE?

Related: #2166

srinify commented 2 months ago

Hi there @turian 👋

Loss Factor

The loss_factor parameter is in fact used during model fitting, where it is passed to the loss function and scales the reconstruction term: https://github.com/sdv-dev/CTGAN/blob/main/ctgan/synthesizers/tvae.py#L189
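To make the role of loss_factor concrete, here is a simplified, pure-Python sketch of a TVAE-style objective. It is based on my reading of ctgan's tvae.py, not a copy of it: the per-column reconstruction terms (a sigma-scaled squared error plus log sigma for continuous outputs, cross-entropy for one-hot categorical outputs) are summed, multiplied by loss_factor and averaged over the batch, while the KL term is averaged but not weighted. The row layout and all function names are hypothetical, and the default of 2.0 is what I believe ctgan's TVAE uses; check your installed version.

```python
import math
from typing import Sequence


def gaussian_term(x: float, recon: float, sigma: float) -> float:
    """Reconstruction term for one continuous value: squared error scaled by
    sigma plus log(sigma). The decoder output is squashed with tanh first."""
    diff = x - math.tanh(recon)
    return diff ** 2 / (2 * sigma ** 2) + math.log(sigma)


def cross_entropy_term(logits: Sequence[float], target_index: int) -> float:
    """Cross-entropy for one one-hot-encoded categorical column of one row."""
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum_exp = max_logit + math.log(sum(math.exp(v - max_logit) for v in logits))
    return log_sum_exp - logits[target_index]


def kl_divergence(mu: Sequence[float], logvar: Sequence[float]) -> float:
    """KL(q(z|x) || N(0, I)) for a diagonal Gaussian posterior, one row."""
    return -0.5 * sum(1 + lv - m ** 2 - math.exp(lv) for m, lv in zip(mu, logvar))


def tvae_style_loss(rows, loss_factor: float = 2.0):
    """Return (weighted reconstruction loss, KL loss), both averaged over rows.

    Each row is a dict with hypothetical keys:
      'cont':   list of (x, recon, sigma) tuples for continuous outputs
      'cat':    list of (logits, target_index) tuples for categorical outputs
      'mu', 'logvar': parameters of the latent posterior
    """
    n = len(rows)
    recon_sum = 0.0
    kl_sum = 0.0
    for row in rows:
        recon_sum += sum(gaussian_term(x, r, s) for x, r, s in row['cont'])
        recon_sum += sum(cross_entropy_term(lg, t) for lg, t in row['cat'])
        kl_sum += kl_divergence(row['mu'], row['logvar'])
    # loss_factor weights only the reconstruction part of the objective
    return recon_sum * loss_factor / n, kl_sum / n
```

Doubling loss_factor doubles only the first returned value, which is how it shifts the balance between reconstruction fidelity and the KL regularizer. Applied to the model's outputs on held-out rows, that first value (divided by loss_factor, if you want it unweighted) is one possible held-out reconstruction score for model selection.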

Reconstruction Loss

You can retrieve the loss values recorded during training by accessing the loss_values attribute, a pandas DataFrame that holds the loss for each batch within each epoch.
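A common first step with that DataFrame is to collapse the batch-level rows into one value per epoch. The sketch below does this in plain Python and assumes the rows have 'Epoch', 'Batch' and 'Loss' keys; those column names are an assumption, so check them against your installed version. As far as I can tell from ctgan's source, these are training-batch losses (loss_factor-weighted reconstruction plus KL), not a held-out reconstruction loss.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def mean_loss_per_epoch(records: Iterable[Mapping]) -> Dict[int, float]:
    """Average batch-level losses into a single mean loss per epoch.

    `records` is assumed to look like the rows of the loss_values
    DataFrame: mappings with 'Epoch', 'Batch' and 'Loss' keys.
    """
    totals: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for rec in records:
        totals[rec['Epoch']] += rec['Loss']
        counts[rec['Epoch']] += 1
    # Return epochs in ascending order as a plain dict
    return {epoch: totals[epoch] / counts[epoch] for epoch in sorted(totals)}
```

With pandas available, `loss_values.groupby('Epoch')['Loss'].mean()` gives the same summary directly on the DataFrame, and `loss_values.to_dict('records')` produces the list of mappings this function expects.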

SDV vs CTGAN

In general, I'd recommend interacting with our models through the SDV classes CTGANSynthesizer and TVAESynthesizer. They use the ctgan library under the hood but offer more features and a better user experience, and as a team we're more focused on SDV.

For example, SDV has one function that returns the DataFrame of loss values and another that generates a helpful plot of the loss against the epochs.

srinify commented 2 months ago

Hi @turian, just following up :)

srinify commented 1 month ago

Hi @turian, I haven't heard from you in over 2 weeks, so I'll be closing this issue out. Feel free to open a new issue with new questions or blockers!