sdv-dev / SDV

Synthetic data generation for tabular data
https://docs.sdv.dev/sdv

CTGAN function difference in sdv version 0.14 and 1.0.0 #1458

Closed 2jjsjjs closed 1 year ago

2jjsjjs commented 1 year ago


Problem description

I'm using the CTGAN function to generate synthetic data with SDV 0.14 and SDV 1.0.0, but there's a significant speed difference: version 0.14 is three to six times faster than version 1.0.0. What is the difference in the CTGAN function between SDV 0.14 and 1.0.0?

I'm using single-table metadata.

npatki commented 1 year ago

Hi @2jjsjjs, nice to meet you.

Unfortunately, I'm not able to replicate these findings. On my machine, the difference in runtime between SDV 0.14.0 and SDV 1.0+ is marginal (a few seconds at most).

We have not changed the nature of the underlying CTGAN algorithm, but the 1.0 release does have significant changes to the internal workflow and abstractions. Two things may affect performance: (a) in 1.0, we verify that your metadata is correct, and (b) in 1.0, we verify that your data matches the metadata.

But this verification only takes a few seconds and should not cause any significant slowdown.
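One way to check this on your own data is to time each stage separately, so validation overhead can be told apart from training time. The sketch below is stdlib-only; `time_stages`, `report`, and the stage names are hypothetical helpers for illustration, not part of SDV, and the demo workloads are stand-ins.

```python
import time
from typing import Callable, Dict


def time_stages(stages: Dict[str, Callable[[], object]]) -> Dict[str, float]:
    """Run each stage in insertion order and return its wall-clock time in seconds."""
    timings: Dict[str, float] = {}
    for name, stage in stages.items():
        start = time.perf_counter()
        stage()
        timings[name] = time.perf_counter() - start
    return timings


def report(timings: Dict[str, float]) -> None:
    """Print each stage's time and its share of the total."""
    total = sum(timings.values()) or 1.0  # avoid dividing by zero for empty or instant runs
    for name, seconds in timings.items():
        print(f"{name:<20} {seconds:8.2f}s  ({seconds / total:6.1%})")


if __name__ == "__main__":
    # Stand-in workloads (hypothetical); replace them with the real validation and fit calls.
    demo = {
        "validate_metadata": lambda: sum(range(10_000)),
        "validate_data": lambda: sum(range(10_000)),
        "fit": lambda: sum(range(1_000_000)),
    }
    report(time_stages(demo))
```

Assuming the SDV 1.x API, the real stages could be `lambda: metadata.validate()`, `lambda: synthesizer.validate(data)` and `lambda: synthesizer.fit(data)`, where `metadata` is a `SingleTableMetadata` and `synthesizer` a `CTGANSynthesizer`. Since `fit` may repeat some of these checks internally, the numbers are a rough breakdown; if `fit` dominates, the new verification steps are unlikely to explain a 3-6x slowdown.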

Next Steps

To replicate what you're seeing, it would be helpful if you could provide the following info:

  1. The metadata that corresponds to the dataset you are using
  2. The size of the dataset you are using (# of rows is sufficient)
  3. The performance times that you are observing for old vs. new version of SDV
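For item 3, timings are easier to compare when each version is run several times on the same data with the same number of epochs, and summarized with the median, which is less sensitive to a single slow run. A minimal sketch; `slowdown` is a hypothetical helper and the numbers in the demo are placeholders, not measurements.

```python
from statistics import median
from typing import Sequence


def slowdown(old_seconds: Sequence[float], new_seconds: Sequence[float]) -> float:
    """Return median(new) / median(old); a value above 1.0 means the new version is slower."""
    if not old_seconds or not new_seconds:
        raise ValueError("need at least one timing per version")
    old_median = median(old_seconds)
    if old_median <= 0:
        raise ValueError("old timings must be positive")
    return median(new_seconds) / old_median


if __name__ == "__main__":
    # Placeholder values for illustration only; substitute your own measurements.
    old = [10.0, 11.0, 12.0]
    new = [20.0, 22.0, 24.0]
    print(f"new/old runtime ratio: {slowdown(old, new):.2f}x")
```

To keep the comparison fair, use the same `epochs` value in both runs: the 0.14 `sdv.tabular.CTGAN` model and the 1.x `sdv.single_table.CTGANSynthesizer` both accept an `epochs` argument. For item 1, a 1.x `SingleTableMetadata` object can be written out with `metadata.save_to_json('metadata.json')` and attached to the issue.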

P.S. To get the most out of SDV, I would definitely recommend using the latest version of Python available. Older versions of SDV (such as 0.14.0) support Python only up to 3.8, while newer versions (such as 1.1.0) support Python 3.10.
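To see whether the running interpreter is within the limits mentioned above, the two quoted upper bounds can be encoded in a small lookup. This is a sketch only: the table holds just the two data points from this comment (0.14.0 up to Python 3.8, 1.1.0 up to 3.10), it checks upper bounds only, and `MAX_PYTHON` / `python_supported` are hypothetical names. The package metadata on PyPI remains the authoritative source.

```python
import sys
from typing import Dict, Optional, Tuple

# Upper bounds taken from the comment above; not an exhaustive support matrix.
MAX_PYTHON: Dict[str, Tuple[int, int]] = {
    "0.14.0": (3, 8),
    "1.1.0": (3, 10),
}


def python_supported(sdv_version: str, python: Optional[Tuple[int, int]] = None) -> bool:
    """Return True if `python` (default: the running interpreter) is at or below the listed maximum."""
    if sdv_version not in MAX_PYTHON:
        raise KeyError(f"no entry for SDV {sdv_version}")
    current = python if python is not None else (sys.version_info[0], sys.version_info[1])
    return current <= MAX_PYTHON[sdv_version]


if __name__ == "__main__":
    running = f"{sys.version_info[0]}.{sys.version_info[1]}"
    for version in MAX_PYTHON:
        status = "within limit" if python_supported(version) else "above limit"
        print(f"SDV {version}: Python {running} is {status}")
```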

npatki commented 1 year ago

Hi @2jjsjjs, are you still encountering this problem? This issue has been inactive for a few weeks, so I'm closing it.

If you'd like to continue the discussion, please feel free to respond and I can always reopen the issue.