jsyoon0823 / GAIN

Codebase for Generative Adversarial Imputation Networks (GAIN) - ICML 2018

Changing only missing values? and scoring? #21

Closed gautambak closed 4 years ago

gautambak commented 4 years ago

Hi There,

First off, thank you for your amazing paper and software. I've been using it to learn and it's really cool.

I had a few questions:

  1. Is there a way to have it change only the missing values and leave the rest untouched? I am importing my pandas DataFrame and filled the missing fields with 0; I also tried np.nan, but it had the same effect.
  2. I assume I have to tune the models (G/D) to find the right settings, but I'm not sure how to get feedback on training progress. Is there a way to get a score for each iteration so I can understand where to make adjustments?

I'm new to GANs and your tool, so I apologize if these are basic questions.

Thanks again for the great tool and I'm really enjoying it.

jsyoon0823 commented 4 years ago

Thank you for your interest in our paper.

  1. I am not sure whether I understand your question correctly.

    • To check which components are missing, you can use the mask vector (M).
    • You can build it by checking for NaN components, or for 0 components if you filled missing values with 0.
    • After training, the following line replaces only the missing components with the imputed ones: imputed_data = data_m * norm_data_x + (1-data_m) * imputed_data (line 167).
  2. GAN training

    • GAN training is not easy, and there is no explicit score to track for understanding the training process.
    • People usually watch the G and D losses and check whether they are well balanced, because balancing G and D is the key to GAN training.
    • In GAIN, you can check the following three losses: MSE_loss, G_loss, and D_loss (see lines 153-157).
    • MSE_loss indicates how well the model recovers the observed components. The G and D losses play the same roles as in original GAN training.
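
To make the two points above concrete, here is a minimal NumPy sketch, not the repository's code. It builds the mask from NaN entries, applies the mask-based replacement from point 1, and computes a mean squared error restricted to the observed components, which is similar in spirit to what MSE_loss measures. The names data_x, data_m, norm_data_x and imputed_data follow the reply; the toy values, the constant generator_output stand-in and the observed_mse name are assumptions made for illustration.

```python
import numpy as np

# Hypothetical toy data: rows are samples, np.nan marks missing entries.
data_x = np.array([[1.0, np.nan, 3.0],
                   [np.nan, 5.0, 6.0]])

# Mask M: 1 where a value is observed, 0 where it is missing.
data_m = 1.0 - np.isnan(data_x)

# Observed values with missing entries filled by 0. The placeholder value
# does not matter, because the mask zeroes those positions out below.
norm_data_x = np.nan_to_num(data_x, nan=0.0)

# Stand-in for the generator output (assumption); a real run would use
# the output of the trained generator here.
generator_output = np.full_like(norm_data_x, 9.0)

# Keep observed values; take generator values only where data was missing.
imputed_data = data_m * norm_data_x + (1 - data_m) * generator_output

# Squared error over observed components only, averaged over their count.
observed_mse = np.sum(
    (data_m * norm_data_x - data_m * generator_output) ** 2
) / np.sum(data_m)

print(imputed_data)
print(observed_mse)
```

Note that if missing values are encoded as 0 instead of NaN, a mask built with `data_x != 0` would also flag genuine zeros as missing, so marking missing entries with NaN is usually the safer choice.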

Hopefully these answers resolve your questions.