Closed sdimi closed 4 years ago
I'm also getting better results with the baselines (MissForest and sklearn's MICE) than with GAIN. I'm using the default configuration for the baselines and the code/parameters provided for GAIN in this implementation, on both the Letter and Spam datasets.
For the paper, we explicitly divide the data into train/test splits and train all the models (including GAIN, MICE, and MissForest) on the train data only. Then, we use the trained models to impute the test data. However, in this repository, I think that people usually do imputation before dividing the data for further model development. Therefore, I do not split the data into train/test in this repository.
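For reference, the split-then-impute protocol described above might be sketched as follows, using sklearn's `IterativeImputer` as a MICE-style stand-in (the dataset, missingness rate, and seeds here are illustrative assumptions, not the paper's setup):

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))  # stand-in for a dataset like Letter or Spam

# Introduce ~20% missingness completely at random (illustrative rate).
mask = rng.random(X.shape) < 0.2
X_missing = X.copy()
X_missing[mask] = np.nan

# Split BEFORE imputing, as in the paper's protocol.
X_train, X_test = train_test_split(X_missing, test_size=0.2, random_state=0)

# Fit the imputer on the training split only...
imputer = IterativeImputer(random_state=0).fit(X_train)
# ...then use the trained model to impute the held-out test split.
X_test_imputed = imputer.transform(X_test)

print(np.isnan(X_test_imputed).any())  # no NaNs remain after imputation
```

The alternative done in this repository would simply call `fit_transform` on the full `X_missing` before any split.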
Hi,
From reading the paper, I think the baselines (like MICE, MissForest, etc.) are fitted only on the test set, whereas GAIN learns a model from the larger training set and then predicts on the test set.
What are your thoughts on that subtle difference?