Closed: JanJacekJaniszewski closed this issue 2 months ago.
Hi @JanJacekJaniszewski, thanks for filing this issue with the detailed code and stack trace.
The 'gaussian_kde' distribution is not supported by the HMASynthesizer due to an algorithmic incompatibility. (The HMA algorithm is designed to work only with parametric distributions that have a fixed, predetermined number of parameters.)
We've just updated the HMA docs with this clarification.
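As a rough illustration of why this matters (a simplified sketch, not SDV's actual code): HMA summarizes the child rows belonging to each parent row by fitting distribution parameters and, roughly speaking, stores those parameters as extra columns of the parent table. That only works if every parent row produces a parameter vector of the same length. The hypothetical helpers fit_norm and fit_gaussian_kde below show that a normal distribution always yields 2 numbers, while a Gaussian KDE keeps every data point plus a bandwidth, so its length depends on how many children a parent has.

```python
import statistics


def fit_norm(values):
    """Parametric fit: a normal distribution is fully described by 2 numbers."""
    return [statistics.mean(values), statistics.pstdev(values)]


def fit_gaussian_kde(values):
    """Non-parametric fit: a Gaussian KDE keeps every data point plus a bandwidth.

    The bandwidth follows Scott's rule for 1-D data (sample std * n ** (-1/5)).
    """
    n = len(values)
    bandwidth = statistics.stdev(values) * n ** (-1 / 5)
    return list(values) + [bandwidth]


# Two hypothetical parent rows: one has 3 child rows, the other has 5.
children_of_parent_a = [1.0, 2.0, 4.0]
children_of_parent_b = [1.0, 2.0, 3.0, 5.0, 8.0]

for name, values in [("a", children_of_parent_a), ("b", children_of_parent_b)]:
    # The norm vector length is constant; the KDE vector length varies per parent.
    print(name, len(fit_norm(values)), len(fit_gaussian_kde(values)))
```

The parametric vectors can be laid out as fixed parent-table columns; the KDE vectors cannot, because parent a would need 4 columns and parent b would need 6.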
I would suggest using any of the other distributions, such as 'beta', 'norm', etc.
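If your table parameters already use 'gaussian_kde', one workaround is to swap it for a parametric distribution before passing them to HMASynthesizer.set_table_parameters. The helper replace_kde below is hypothetical (not part of SDV), the table and column names are made up, and the set of parametric options is an assumption based on the GaussianCopula distribution names; check the SDV docs for your version.

```python
import copy

# Assumed parametric options for HMASynthesizer; 'gaussian_kde' is deliberately absent.
PARAMETRIC_DISTRIBUTIONS = {"norm", "beta", "truncnorm", "uniform", "gamma"}


def replace_kde(table_parameters, fallback="beta"):
    """Hypothetical helper: return a copy with every 'gaussian_kde' swapped for `fallback`."""
    if fallback not in PARAMETRIC_DISTRIBUTIONS:
        raise ValueError(f"fallback must be one of {sorted(PARAMETRIC_DISTRIBUTIONS)}")
    params = copy.deepcopy(table_parameters)  # leave the caller's dict untouched
    if params.get("default_distribution") == "gaussian_kde":
        params["default_distribution"] = fallback
    columns = params.get("numerical_distributions", {})
    for column, distribution in columns.items():
        if distribution == "gaussian_kde":
            columns[column] = fallback  # reassigning existing keys is safe mid-iteration
    return params


# Hypothetical parameters for a table "users" with columns "age" and "income".
table_parameters = {
    "default_distribution": "gaussian_kde",
    "numerical_distributions": {"age": "gaussian_kde", "income": "norm"},
}
fixed = replace_kde(table_parameters)
print(fixed)

# With SDV installed, the result would then be passed on, e.g.:
# synthesizer.set_table_parameters(table_name="users", table_parameters=fixed)
```

'beta' is used as the default fallback only as an example; 'truncnorm' or 'norm' may fit a given column better.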
HMASynthesizer.set_table_parameters with a Gaussian KDE
Update: I am repurposing this issue as a feature request.
The other related issue (linked) has now been closed. In the upcoming SDV release, we will provide a clearer error message when 'gaussian_kde' is used with the HMASynthesizer.
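A clearer error message of this kind could come from a check that runs before fitting. The sketch below is hypothetical: validate_hma_table_parameters is not an SDV function, SDV's actual message may differ, and it assumes the same table_parameters keys ('default_distribution', 'numerical_distributions') used with set_table_parameters.

```python
def validate_hma_table_parameters(table_name, table_parameters):
    """Hypothetical pre-fit check that rejects 'gaussian_kde' with an explicit message."""
    offending = []
    if table_parameters.get("default_distribution") == "gaussian_kde":
        offending.append("default_distribution")
    numerical = table_parameters.get("numerical_distributions", {})
    offending.extend(col for col, dist in numerical.items() if dist == "gaussian_kde")
    if offending:
        # Name the table and the offending settings instead of failing later during fit.
        raise ValueError(
            f"Table '{table_name}': 'gaussian_kde' is not supported by HMASynthesizer "
            f"(found in: {', '.join(offending)}). Use a parametric distribution "
            "such as 'beta' or 'norm' instead."
        )


# Usage with hypothetical table and column names.
try:
    validate_hma_table_parameters(
        "users", {"numerical_distributions": {"age": "gaussian_kde"}}
    )
except ValueError as error:
    print(error)
```

Failing at configuration time points the user at the offending column directly, rather than surfacing an unrelated-looking error once fitting starts.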
An update on this issue: due to the nature of the HMA algorithm, we will be unable to support using 'gaussian_kde' for columns in the HMASynthesizer. Doing so would also increase the compute time, which can already be high for certain schemas with HMA.
Instead, we recommend using the HSASynthesizer. The HSA algorithm can handle complex schemas as well as non-parametric KDE distributions for individual columns.
Do note that this synthesizer is only available for paid SDV plans. To learn more, you can visit our support page. Thanks.
Environment Details
Please indicate the following details about the environment in which you found the bug:
Error Description
Setting table parameters in HMASynthesizer causes a num_rows error when fitting the model.
Steps to reproduce
Input
Output