Closed koseoyoung closed 8 months ago
Hi @koseoyoung, nice to meet you.
As long as your code isn't crashing, it should still be running as intended. Even though you have specified only 1 epoch, TVAE still uses the batch_size
parameter to iterate through different portions of your data, so a single epoch over a large dataset can take some time.
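To give a rough sense of why one epoch can still be slow: the number of gradient steps in a single epoch grows with the row count divided by the batch size. This is a minimal sketch of that arithmetic only (the batch size of 500 used in the example is an assumption, not a confirmed default):

```python
import math

def batches_per_epoch(n_rows: int, batch_size: int) -> int:
    """Approximate number of gradient steps performed in one epoch:
    the dataset is consumed in chunks of batch_size rows."""
    return math.ceil(n_rows / batch_size)

# e.g. a hypothetical 1,000,000-row dataset with an assumed batch_size of 500
print(batches_per_epoch(1_000_000, 500))  # -> 2000 steps for a single epoch
```

Each of those steps involves a forward and backward pass through the network, which is why a CPU run on a large dataset can sit for over an hour with epochs=1.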
I understand the frustration of not having a verbose option, so I've added #300 as a proposed feature request.
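Until a built-in verbose option lands, one workaround is to wrap whatever iterable drives your own loop with a small progress printer. This is a generic stand-in sketch, not part of the TVAE API; the names `with_progress`, `loader`, and `train_step` are all hypothetical:

```python
import sys
import time

def with_progress(iterable, total, every=100):
    """Yield items unchanged while printing coarse progress to stderr,
    as a stand-in for a missing verbose flag."""
    start = time.time()
    for i, item in enumerate(iterable, 1):
        if i % every == 0 or i == total:
            elapsed = time.time() - start
            print(f"batch {i}/{total} ({elapsed:.1f}s elapsed)", file=sys.stderr)
        yield item

# hypothetical usage around a manual training loop:
# for batch in with_progress(loader, total=len(loader)):
#     train_step(batch)
```

Seeing the per-batch timing early also lets you extrapolate the total runtime before committing to the full fit.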
I'm wondering if there is any max length of the training dataset for TVAE (for dataset fitting).
While there is no theoretical max length, you may find certain dataset sizes infeasible for the computational power that you have. For GAN-based synthesizers, many users report needing a few hours.
The dataset size was around 80 MB, and I was running the code with CPU.
If possible, running on a GPU might be a good option. Alternatively, you can subsample your data for training purposes. The important thing is to make sure your subsample contains the patterns you are trying to learn. For example, all the possible categories, a large range of numerical values, etc.
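The subsampling advice above can be sketched as a stratified sample that guarantees every category survives. This is a plain-Python illustration under the assumption that rows are dicts and one column is categorical; the function name and parameters are invented for the example:

```python
import random
from collections import defaultdict

def stratified_subsample(rows, category_of, per_category, seed=0):
    """Subsample rows while keeping every category represented.

    rows: list of records; category_of: function extracting the
    categorical value; per_category: cap on rows kept per category.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[category_of(row)].append(row)
    sample = []
    for members in groups.values():
        k = min(per_category, len(members))
        sample.extend(rng.sample(members, k))
    return sample

rows = [{"cat": c, "x": i} for i, c in enumerate("AABBBC")]
subset = stratified_subsample(rows, lambda r: r["cat"], per_category=2)
print(sorted({r["cat"] for r in subset}))  # every category survives: ['A', 'B', 'C']
```

The same idea extends to numerical columns: bin the values first, then sample per bin, so the subsample still covers the full value range.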
Marking this issue as resolved since it has been inactive for some time. The good news is that the feature in #300 has been added, so you can now view the progress bar to track estimated time.
If you have additional questions, please feel free to file a new issue. Thanks.
Environment details
If you are already running CTGAN, please indicate the following details about the environment in which you are running it:
Problem description
I'm wondering if there is any max length of the training dataset for TVAE (for dataset fitting). I've tried a large dataset, but it seems to take too long, even though the epoch is specified as 1. The dataset size was around 80 MB, and I was running the code on CPU. (It kept running for more than 1 hour without producing any logs.) Is this expected behavior? Since there is no verbose option, it's hard to tell whether training is progressing or whether an error has occurred.
Thank you! : )
What I already tried