sdv-dev / SDV

Synthetic data generation for tabular data
https://docs.sdv.dev/sdv

TVAESynthesizer Model Details and Parameters #2094

Open pnimeesha opened 5 days ago

Environment details

Problem description

  1. My first question relates to the paper here. In Section 4.5 (TVAE model), you mention that the model outputs a joint distribution of $2N_c + N_d$ variables. The equation (attached below) also uses two variables, $\bar{\alpha}_{i,j}$ and $\hat{\alpha}_{i,j}$. Could you please explain what these two variables are, and also explain the combined distribution in the last line of the equation? (image: tvae_model_equation)

  2. I have observed that I cannot change the activation function in the TVAESynthesizer. Below is a snippet of the model parameters that I can change, obtained with synthesizer.get_parameters() as described in the SDV docs.

    • Is there a reason for not allowing the activation function to be changed, and for using the ones mentioned in the paper?
    • The default value of the l2scale regularization term is 1e-5. Can you please explain the exact role of l2scale and how it affects the model?
    • I see that loss_factor for the reconstruction error has a default value of 2. The total loss is reconstruction_loss + kl_loss. Does kl_loss also have a scaling factor, and how would that affect training and the total loss?
    • synthesizer.get_loss_values() returns only the total loss. Is there a way to track reconstruction_loss and kl_loss separately?
    • Why must batch_size always be a multiple of 10, rather than a value like 512 or 256 (which are commonly used for training)?
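To make the loss questions concrete, here is a minimal sketch of how I understand the two terms to combine. It assumes a standard Gaussian VAE with a unit-normal prior and assumes that loss_factor multiplies only the reconstruction term. The function names `kl_divergence` and `total_loss` are hypothetical and not part of SDV; whether the library combines the terms this way is part of what I am asking.

```python
import math


def kl_divergence(mu, logvar):
    """Closed-form KL(N(mu, exp(logvar)) || N(0, 1)), summed over latent dims."""
    return -0.5 * sum(1 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def total_loss(reconstruction_loss, kl_loss, loss_factor=2.0):
    # Assumption: loss_factor scales only the reconstruction term,
    # while the KL term enters the sum unscaled.
    return loss_factor * reconstruction_loss + kl_loss


# Illustrative numbers only, not taken from a real training run.
recon = 1.5
kl = kl_divergence(mu=[0.0, 1.0], logvar=[0.0, 0.0])
print("reconstruction:", recon, "kl:", kl, "total:", total_loss(recon, kl))
```

If the two terms were logged separately like this at each batch, it would be easy to see which one dominates the total for a given loss_factor.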

image

I hope my questions are clear. Thanks in advance!