hakyim / TO-DELETE-PrediXcan

Code for the in-dev PrediXcan Project
MIT License

Breakdown of PrediXcan Framework #37

Closed: eallen4040 closed this issue 3 years ago

eallen4040 commented 3 years ago

Hello!

I've been looking over PrediXcan to understand how it works in general. However, I have a few clarifying questions about the Framework image presented in the corresponding paper, "A gene-based association method for mapping traits using reference transcriptome data."

  1. Under Genetic Variation, there are columns and rows that represent rsids and ids for each individual. What do 0, 1, and 2 specifically represent in this scenario? Also, why do the rsids appear to alternate between 1 and 2 instead of continuing the count?

  2. Under Observed Transcriptome, there are columns and rows that represent genes and ids for each individual. What do the numbers in the table represent? Any significance between small/big numbers?

If this isn't the right place for these questions, can you direct me to someone who can explain further?

PrediXcanFramework.pdf

Heroico commented 3 years ago

Hi there!

This question is better suited to the PrediXcan/MetaXcan Google Group.

Re 1.: 0, 1, and 2 are the count of "effect" alleles (i.e. the alternate to the reference allele) that a given individual carries at a variant location. Since you have two copies of each chromosome, you can have 0 (no alternate allele on either copy), 1 (an alternate allele on one copy), or 2 (an alternate allele on both copies).
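To make the counting concrete, here is a minimal Python sketch. The `dosage` helper, the genotype strings, and the choice of `G` as the effect allele are all made up for illustration; none of this is PrediXcan code or data from the paper.

```python
def dosage(genotype, effect_allele):
    """Count copies of effect_allele in a two-allele genotype such as 'A/G'."""
    # Accept both unphased ('A/G') and phased ('A|G') separators.
    alleles = genotype.replace("|", "/").split("/")
    if len(alleles) != 2:
        raise ValueError(f"expected two alleles, got {genotype!r}")
    return sum(1 for allele in alleles if allele == effect_allele)


# Hypothetical variant: reference allele A, effect (alternate) allele G.
genotypes = {"id1": "A/A", "id2": "A/G", "id3": "G/G"}
dosages = {person: dosage(gt, "G") for person, gt in genotypes.items()}
print(dosages)  # {'id1': 0, 'id2': 1, 'id3': 2}
```

Each row of the Genetic Variation table would then hold one such count per variant for a single individual.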

Re 2.: the value represents the "expression" of a gene for a certain individual. Depending on the sequencing technology and processing, it may have been "normalized" by any number of methods, so that it effectively means "how much more gene product we get than the mean".
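As one illustration of what "normalized" can mean, the sketch below standardizes made-up expression values for a single gene into z-scores. This is only an assumed example of a normalization; the figure in the paper may have used a different method, and the `standardize` helper and the numbers are hypothetical.

```python
from statistics import mean, pstdev


def standardize(values):
    """Return z-scores: distance from the mean in units of standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        raise ValueError("all values are identical; z-scores are undefined")
    return [(v - mu) / sigma for v in values]


# Hypothetical raw expression of one gene across three individuals.
raw = [2.0, 4.0, 6.0]
z = standardize(raw)
print(z)
```

Under this particular normalization, a positive value means the individual expresses the gene more than the sample average, a negative value means less, and 0 means exactly average.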