Changing only missing values? and scoring?

jsyoon0823 / GAIN

Codebase for Generative Adversarial Imputation Networks (GAIN) - ICML 2018


Hi There,

First off, thank you for your amazing paper and software. I've been using it to learn and it's really cool.

I had a few questions:

Is there a way to have it change only the missing values and leave the observed values untouched? I am importing my pandas DataFrame and filled the missing fields with 0; I also tried np.nan, but it had the same effect.
I assume I have to tune the models (G/D) to find the right settings, but I'm not sure how to get feedback on training progress. Is there a way to get a score for each iteration so I can understand where to make adjustments?

I'm new to GANs and to your tool, so I apologize if these are basic questions.

Thanks again for the great tool and I'm really enjoying it.

Thank you for your interest in our paper.

I am not sure whether I understand your question correctly.
- To check which components are missing, you can use the mask vector (M).
- You can build it by checking for NaN components (or 0 components, if you put 0 for missing values).
- After training, the following line replaces only the missing components with the imputed ones: imputed_data = data_m * norm_data_x + (1-data_m) * imputed_data in line 167.
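The masking step described above can be sketched in NumPy as follows. This is a minimal illustration, not the repository's code: the helper names build_mask and combine and the toy arrays are hypothetical, and the generator output is a stand-in for whatever the trained model produces.

```python
import numpy as np

def build_mask(data_x):
    """Return 1.0 where a value is observed and 0.0 where it is missing (NaN)."""
    return 1.0 - np.isnan(data_x).astype(float)

def combine(data_x, data_m, generator_output):
    """Keep observed entries; use the generator output only where the mask is 0."""
    # NaN * 0 is still NaN, so replace NaNs with a placeholder before masking.
    filled_x = np.nan_to_num(data_x, nan=0.0)
    return data_m * filled_x + (1.0 - data_m) * generator_output

# Toy data (hypothetical): two observed values and two missing ones.
data_x = np.array([[1.0, np.nan],
                   [np.nan, 4.0]])
# Stand-in for the generator's output, which covers every cell.
generator_output = np.array([[9.0, 2.5],
                             [3.5, 9.0]])

data_m = build_mask(data_x)
imputed = combine(data_x, data_m, generator_output)
```

Note that if missing fields are filled with 0 before the mask is built, a mask derived from NaN checks marks every cell as observed, and a mask derived from 0 checks cannot tell a genuine 0 from a missing value. Keeping np.nan in the input until the mask exists avoids both problems.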
GAN training
- GAN training is not easy and there is no explicit score that we can track for understanding the training process.
- People usually watch the G and D losses and check whether they are well balanced, because balancing G and D is the key to GAN training.
- In GAIN, you can check the following three losses: MSE_loss, G_loss, and D_loss (see lines 153-157).
- MSE_loss indicates whether the model can recover the observed components. The G and D losses play the same roles as in original GAN training.
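As a rough sketch of what tracking these values per iteration could look like, the snippet below computes a reconstruction error on the observed components and appends the three losses to a history list. The formula for the observed-component error is an assumption based on the description above (squared error restricted to observed entries), and observed_mse, record_losses, and the toy numbers are hypothetical names and values, not part of the GAIN codebase.

```python
import numpy as np

def observed_mse(x, m, g_sample):
    """Mean squared error over observed entries only (mask m == 1).

    Assumed form: masked squared error divided by the fraction of observed cells.
    """
    return np.mean((m * x - m * g_sample) ** 2) / np.mean(m)

def record_losses(history, iteration, d_loss, g_loss, mse_loss):
    """Append one row of losses so the G/D balance can be inspected later."""
    history.append({"iteration": iteration,
                    "D_loss": float(d_loss),
                    "G_loss": float(g_loss),
                    "MSE_loss": float(mse_loss)})

# Toy example (hypothetical values): missing cells hold a 0 placeholder.
x = np.array([[1.0, 0.0],
              [0.0, 4.0]])
m = np.array([[1.0, 0.0],
              [0.0, 1.0]])
g_sample = np.array([[1.5, 2.0],
                     [3.0, 3.0]])

history = []
mse = observed_mse(x, m, g_sample)
# The D and G losses would come from the training step; placeholders here.
record_losses(history, iteration=0, d_loss=0.7, g_loss=0.9, mse_loss=mse)
```

Plotting or printing the recorded rows every few hundred iterations gives a view of whether one of the two networks is overpowering the other, and whether the observed-component error is still decreasing.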

Hopefully these answers resolve your questions.
