ts-kim closed this issue 5 months ago.
Please see Section 4: "We split all data sets into 80% train and 20% test sets to avoid any data leakage."
We used 80% of the samples from each data set for training, and the remaining 20% were used for evaluation.
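For concreteness, a minimal sketch of such an 80/20 split, assuming pandas and scikit-learn (the file name here is hypothetical):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Load one of the tabular data sets (file name is hypothetical).
data = pd.read_csv("adult_income.csv")

# 80% train / 20% test split, with a fixed seed for reproducibility.
train_df, test_df = train_test_split(data, test_size=0.2, random_state=42)
```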
Did you consistently use 1000 samples across all experiments?
We generated the same number of samples as in the test sets.
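Putting the two answers together, a minimal sketch of the resulting protocol (fine-tune on the 80% split, then sample exactly as many synthetic rows as the test set contains), using the public be_great API shown in the repository README; exact parameter names may differ across versions, and the epoch count is data-set specific:

```python
from be_great import GReaT

# Fine-tune on the 80% training split only; the epoch count varies
# per data set (e.g., 110 for California Housing per Section C).
model = GReaT(llm="distilgpt2", epochs=110, batch_size=32)
model.fit(train_df)

# Generate exactly as many synthetic rows as the held-out test set,
# matching "the same number of samples as in the test sets".
synthetic_df = model.sample(n_samples=len(test_df))
```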
Thank you for your kind response.
I am writing to seek further clarification on a matter previously discussed in GitHub issue https://github.com/kathrinse/be_great/issues/46 regarding your manuscript.
In your manuscript, Table 6 is described as providing "A run time comparison of all generative models of our study. Selected models were trained/fine-tuned for 100 epochs and 1000 samples were generated." However, I seek clarification regarding the number of samples used for the model presented in Table 1.
In Section C, Reproducibility details, it is noted that "The GReaT baseline is fine-tuned for 110, 310, 400, 255, 150, and 85 epochs for California Housing, Adult Income, Travel, Home Equity Line of Credit (HELOC), Sick (Dua & Graff, 2017), and Diabetes data sets, respectively." Given the difference in the number of epochs, which suggests experimental conditions different from those described in Table 6, I would like to ask how many samples were generated for the classification and regression evaluations.
To confirm: did you consistently generate 1000 samples across all experiments?
Thank you in advance for your clarification.