rvinas / GTEx-imputation

Gene Expression Imputation with Generative Adversarial Imputation Nets
MIT License

Use genotype vector of each individual correct? #1

Closed yezhengli-Mr9 closed 4 years ago

yezhengli-Mr9 commented 4 years ago

Hi Ramon Vinas, you should have used the genotype vector of each individual (just as in [1]), correct? In the "problem formulation" section of [2], it seems that "gene expression values with missing components" are the main input.

[1] Wang, Jiebiao, et al. "Imputing gene expression in uncollected tissues within and beyond GTEx." The American Journal of Human Genetics 98.4 (2016): 697-708. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4833292/pdf/main.pdf

[2] Torne, Ramon Vinas, et al. "Gene Expression Imputation with Generative Adversarial Imputation Nets." BioRxiv (2020). https://www.biorxiv.org/content/10.1101/2020.06.09.141689v1.abstract

rvinas commented 4 years ago

Hi Yezheng Li, the problems addressed in [1] and [2] are different: we impute missing or unreliable components within a single gene expression sample (coming exclusively from one tissue), i.e. we do not leverage information from other tissues of the same patient (however, this would be a really interesting extension of our work).
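
To make the within-sample setting concrete, here is a minimal sketch of what the inputs look like under that formulation: one expression vector from a single tissue, a binary mask marking which genes are observed, and an imputer that fills in the hidden entries. Everything in it is an illustrative assumption: the toy data, the function names `mask_sample` and `impute_with_gene_means`, and the per-gene mean imputer, which is only a placeholder for a learned model and is not the GAIN generator from [2]. Note that no genotype vector and no other tissue appears anywhere in the inputs.

```python
import random


def mask_sample(expr, missing_rate, rng):
    """Hide a random subset of genes in one expression sample.

    Returns (observed, mask), where mask[i] == 1 means gene i is observed
    and hidden entries of `observed` are set to 0.0.
    """
    mask = [0 if rng.random() < missing_rate else 1 for _ in expr]
    observed = [x if m else 0.0 for x, m in zip(expr, mask)]
    return observed, mask


def impute_with_gene_means(observed, mask, gene_means):
    """Placeholder imputer: fill hidden genes with per-gene reference means.

    This stands in for a learned model; it is NOT the method from [2].
    """
    return [x if m else mu for x, m, mu in zip(observed, mask, gene_means)]


rng = random.Random(0)

# Hypothetical toy data: 4 samples from one tissue, 5 genes each.
samples = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(4)]
gene_means = [sum(col) / len(col) for col in zip(*samples)]

# Impute the hidden components of a single sample from that sample's
# observed values, its mask, and reference statistics of the same tissue.
observed, mask = mask_sample(samples[0], missing_rate=0.4, rng=rng)
imputed = impute_with_gene_means(observed, mask, gene_means)
```

In this sketch the observed genes pass through unchanged and only the masked positions are filled. Using genotype information as in [1] would mean adding a further per-individual input to the imputer, which the formulation above does not have.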

yezhengli-Mr9 commented 4 years ago

Thanks for your fast response. On my side, I am just trying to (1) run through your code (there is currently no data in the repository, but I am trying to fit my own data) and (2) provide a detailed explanation (for example, explaining why there is no "genotype vector of each individual") to my PhD adviser.

> Hi Yezheng Li, the problems addressed in [1] and [2] are different - we impute missing or unreliable components within a single gene expression sample (coming exclusively from one tissue), e.g. we do not leverage information from other tissues of the same patient (however, this would be a really interesting extension of our work).

OK, the problems addressed are definitely not exactly the same (but at least both involve "imputation" of missing data; distinguishing the major differences between works such as [1] and [2] is sometimes time-consuming, sorry about that).

My PhD adviser thinks the "genotype vector of each individual" is important for such imputation, and I personally think this is a matter of "each individual", not of "leverag[ing] information from other tissues" (if that is the major difference between [1] and [2]). Let me double-check whether there is a good reason for not using such a feature.