omaralvarez / gentab

Tabular Synthetic Data Augmentation Library
GNU General Public License v3.0

Description of tuners #1

Open glocamhe opened 2 days ago

glocamhe commented 2 days ago

Hi, first of all, great job! I was just wondering what the role of the tuners is. As far as I know, generators are used to create the new dataset with synthetic data, so what is the point of the tuners?

Thanks.

omaralvarez commented 2 days ago

Hi! Thank you for your interest. You are right on the generator front: they do create the new data. The catch is that generators have quite a lot of hyperparameters (anywhere from 5 to 20+, depending on the model), and these affect how well downstream models perform when trained on the generated data.

The idea of the tuning module is to find the generator hyperparameters that yield the best complementary synthetic data for a given task. For example, for a given generation model (ForestDiffusion), it searches for the synthetic dataset that maximizes the classification performance of a given classifier (XGBoost).
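
To make the idea concrete, here is a rough, standard-library-only sketch of what such a tuning loop does conceptually. It is not gentab's actual API: `make_data`, `generate`, `fit_centroids`, `accuracy`, and `tune` are all hypothetical names, the "generator" is a toy that jitters training rows with a single `noise` hyperparameter (standing in for something like ForestDiffusion), and the downstream model is a nearest-centroid classifier (standing in for something like XGBoost). A plain grid search is assumed here purely for illustration.

```python
import random

def make_data(n, rng):
    """Toy balanced 1-D binary dataset: class 0 around 0.0, class 1 around 2.0."""
    return [(rng.gauss(2.0 * (i % 2), 1.0), i % 2) for i in range(n)]

def generate(train, n_samples, noise, rng):
    """Hypothetical stand-in generator: jitter random training rows.
    `noise` plays the role of a generator hyperparameter."""
    rows = []
    for _ in range(n_samples):
        x, y = rng.choice(train)
        rows.append((x + rng.gauss(0.0, noise), y))
    return rows

def fit_centroids(rows):
    """Hypothetical stand-in downstream model: one centroid per class."""
    centroids = {}
    for label in (0, 1):
        xs = [x for x, y in rows if y == label]
        centroids[label] = sum(xs) / len(xs)
    return centroids

def accuracy(centroids, rows):
    """Fraction of rows whose nearest centroid matches the true label."""
    hits = sum(
        1 for x, y in rows
        if min(centroids, key=lambda c: abs(x - centroids[c])) == y
    )
    return hits / len(rows)

def tune(train, valid, noise_grid, n_samples, seed=0):
    """Try each hyperparameter value, train the downstream model on
    real + synthetic data, and keep the value with the best validation score."""
    best = None
    for noise in noise_grid:
        rng = random.Random(seed)  # same seed per trial so trials are comparable
        synthetic = generate(train, n_samples, noise, rng)
        score = accuracy(fit_centroids(train + synthetic), valid)
        if best is None or score > best[1]:
            best = (noise, score)
    return best

data_rng = random.Random(42)
train = make_data(20, data_rng)
valid = make_data(200, data_rng)
best_noise, best_score = tune(train, valid, [0.05, 0.5, 1.0, 2.0], n_samples=100)
print(f"best noise={best_noise}, validation accuracy={best_score:.3f}")
```

The key point is that the tuner scores each hyperparameter setting by downstream task performance on held-out real data, not by how realistic the synthetic rows look.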

I hope this clears up any doubts. 👍