kathrinse / be_great

A novel approach for synthesizing tabular data using pretrained large language models
MIT License

How many samples are used for the classification and regression? #46

Closed ts-kim closed 3 months ago

ts-kim commented 6 months ago

Hi, thanks for the great work. I'm interested in re-implementing the experiments from your paper. Could you please provide information on the number of samples used for both regression and classification? I've looked through the paper but couldn't find this detail.

unnir commented 3 months ago

Hi,

Please see Table 6 in our paper: https://arxiv.org/pdf/2210.06280.pdf

ts-kim commented 3 months ago

In your manuscript, Table 6 is described as providing "A run time comparison of all generative models of our study. Selected models were trained/fine-tuned for 100 epochs and 1000 samples were generated." However, I seek clarification regarding the number of samples used for the model presented in Table 1.

In Section C, Reproducibility details, it is noted that "The GReaT baseline is fine-tuned for 110, 310, 400, 255, 150, 85, epochs for California Housing, Adult Income, Travel, Home Equity Line of Credit (HELOC), Sick (Dua & Graff, 2017), and Diabetes data sets, respectively." Since these epoch counts differ from the 100 epochs used for Table 6, the experimental conditions appear to be different, so I would like to know how many samples were generated when evaluating classification and regression performance for Table 1.

Did you consistently use 1000 samples across all experiments?
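For concreteness, here is how I understand the setup described in Section C, sketched against the public be_great API (`GReaT.fit` / `GReaT.sample`). The `n_samples` default is only a placeholder taken from the Table 6 caption; whether 1000 was actually used for the Table 1 experiments is exactly the open question:

```python
# Sketch of the Section C fine-tuning setup as I understand it.
# Assumes the public be_great API; n_samples=1000 is an assumption
# (the value from the Table 6 caption), not a confirmed setting.

# Fine-tuning epochs per dataset, as quoted from Section C.
EPOCHS = {
    "California Housing": 110,
    "Adult Income": 310,
    "Travel": 400,
    "HELOC": 255,
    "Sick": 150,
    "Diabetes": 85,
}

def fine_tune_and_sample(df, dataset_name, n_samples=1000):
    """Fine-tune GReaT on one dataset and draw synthetic samples."""
    from be_great import GReaT  # pip install be-great
    model = GReaT(llm="distilgpt2", epochs=EPOCHS[dataset_name])
    model.fit(df)
    return model.sample(n_samples=n_samples)
```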

Thank you for your clarification.