yandex-research / tab-ddpm

[ICML 2023] The official implementation of the paper "TabDDPM: Modelling Tabular Data with Diffusion Models"
https://arxiv.org/abs/2209.15421
MIT License

Expected time to train to recreate results in Paper #7

Closed SvenGroen closed 1 year ago

SvenGroen commented 2 years ago

Hi, thanks for providing the code for your paper. You wrote:

It takes approximately 7min to run the script above (NVIDIA GeForce RTX 2080 Ti).

to run your TabDDPM pipeline.

I was wondering whether this was also the training time you used to produce the results in your paper (on an RTX 2080 Ti)? If not, approximately how long did you train, and on what kind of GPU?

Cheers, Sven

rotot0 commented 2 years ago

Hi, it depends on the dataset. All experiments were conducted on an RTX 2080 Ti. The most time-consuming script is scripts/tune_ddpm.py, and it takes approximately 7–15 hours for 50 optuna trials. It also highly depends on the dataset and GPU load. I usually run experiments on 4 datasets at the same time per GPU.

Also, when the dataset has many samples, CatBoost evaluation during tuning becomes pretty expensive (since it is done on CPU).
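To make the shape of that 50-trial search concrete, here is a rough, dependency-free sketch of such a hyperparameter loop. It uses a plain random search as a stand-in for Optuna, and the parameter names, ranges, and objective are purely illustrative, not the actual search space or scoring used in scripts/tune_ddpm.py:

```python
import random

# Placeholder objective: in the real pipeline each trial trains TabDDPM and
# scores the synthetic data with CatBoost; here a cheap synthetic score keeps
# the sketch runnable on its own.
def objective(params):
    return (params["lr"] - 1e-3) ** 2 + abs(params["num_timesteps"] - 500) / 1e6

def random_search(n_trials=50, seed=0):
    """Illustrative 50-trial search, loosely mirroring an Optuna study."""
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(n_trials):
        params = {
            "lr": 10 ** rng.uniform(-5, -2),           # log-uniform learning rate
            "num_timesteps": rng.randint(100, 1000),   # diffusion steps (hypothetical range)
        }
        score = objective(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

best, score = random_search()
```

Since each trial involves a full train-plus-evaluate cycle, the 7–15 hour figure above is dominated by per-trial cost rather than the search loop itself.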

Thanks for your question :)