jsyoon0823 / GAIN

Codebase for Generative Adversarial Imputation Networks (GAIN) - ICML 2018

Using GAIN to impute missing cytokine data #13

Closed biofuture closed 4 years ago

biofuture commented 4 years ago

Dear Yoon,

I came across this newly developed tool for imputing missing data using a GAN. It seems that your tool performs significantly better than traditional methods. We have cytokine data for cancer patients; however, about 20% of the values are missing because they fall below the detection limit.

Hence, I would like to ask whether it is possible to use your tool to impute missing data of this type.

For each cytokine, the missing values are those that lie below the smallest observed value in that cytokine's vector.

Thank you very much.

biofuture commented 4 years ago

Much of the biological data generated today has this feature, for example metabolomics and cytokine data. The instrument has a detection limit for concentration. If your tool can support this type of imputation, I think many people would like to use it in their studies.

Best wishes!

jsyoon0823 commented 4 years ago

Unfortunately, I do not think GAIN (or other data-driven methods) can be used in this setting. It is very hard for data-driven methods to estimate values that lie outside the distribution of the observed data. Instead of imputing, I suggest using the missingness as an additional binary feature and setting the missing entries to 0.
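
The suggestion above (zero-fill plus a binary missingness feature) could look roughly like the following. This is only a minimal sketch, not code from the GAIN repository: it assumes missing values are encoded as NaN in a NumPy array, the helper name `add_missingness_indicators` is hypothetical, and the toy values are made up for illustration.

```python
import numpy as np


def add_missingness_indicators(x):
    """Zero-fill missing entries and append one binary indicator column per feature.

    Assumption: missing (below-detection) values are encoded as NaN.
    """
    x = np.asarray(x, dtype=float)
    missing = np.isnan(x)                # True where a value is missing
    filled = np.where(missing, 0.0, x)   # set the missing entries to 0
    # Output layout: [filled features | 0/1 missingness indicators]
    return np.hstack([filled, missing.astype(float)])


# Toy example with made-up numbers: 3 patients, 2 cytokines.
cytokines = np.array([
    [1.2, np.nan],
    [np.nan, 0.8],
    [2.5, 3.1],
])
features = add_missingness_indicators(cytokines)
print(features.shape)
```

The resulting matrix has twice as many columns as the input, so a downstream model can tell a zero that stands for "below detection" apart from an observed value.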