Expression Metrics and Concordance with Datasets

jvivian / gene-outlier-detection

A Bayesian model for identifying gene expression outliers for individual single samples (N-of-1) when compared to a cohort of background datasets.


Sample	Expression
GTEX-1117F-0226-SM-5GZZ7	8.764
GTEX-1117F-0426-SM-5EGHI	3.861
GTEX-1117F-0526-SM-5EGHJ	7.349

Sample	Expression
GTEX-1117F-0226-SM-5GZZ7	5.907
GTEX-1117F-0426-SM-5EGHI	5.140
GTEX-1117F-0526-SM-5EGHJ	5.946

Hi @eyzhao,

Please accept my apologies, I never received a notification that this issue was opened and I'm not sure why.

Can you link the exact GTEx file you downloaded? My starting expression dataframe didn't come directly from GTEx but from the UCSC Toil recompute, which likely used different preprocessing and alignment steps than GTEx did (at least, I didn't see GTEx's process described on that page).

In the data folder, the GTEx and TCGA values should correspond to np.log2(TPM + 1). I started with the dataframes available on Xena, which use log2(TPM + 0.001), transformed those values back to TPM, confirmed that each sample summed to ~1 million, and then applied np.log2(TPM + 1).
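For anyone trying to reproduce the values in the data folder, the conversion described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual preprocessing code: the function name `xena_to_log2_tpm_plus_one`, the assumption that the input is one sample's values across all genes, the clipping of tiny negative TPMs, and the 1% tolerance on the ~1 million check are all hypothetical choices made here for the example.

```python
import numpy as np


def xena_to_log2_tpm_plus_one(xena_values, rtol=0.01):
    """Convert one sample's Xena log2(TPM + 0.001) values to log2(TPM + 1).

    Hypothetical helper: assumes `xena_values` holds a single sample's
    expression across all genes, so its TPMs should sum to ~1e6.
    """
    values = np.asarray(xena_values, dtype=float)
    # Undo the Xena transform: TPM = 2**x - 0.001
    tpm = np.exp2(values) - 0.001
    # Floating-point/rounding error can leave tiny negatives; clip to zero (assumption)
    tpm = np.clip(tpm, 0.0, None)
    # Sanity check: TPM values for one sample should sum to ~1 million
    total = tpm.sum()
    if not np.isclose(total, 1e6, rtol=rtol):
        raise ValueError(f"TPM sums to {total:.1f}, expected ~1e6")
    # Re-apply the transform used for the data folder
    return np.log2(tpm + 1)
```

Because Xena stores rounded values, the recovered TPMs will not sum to exactly 1,000,000, which is why the sketch checks against a relative tolerance rather than testing for equality.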

Please let me know if that doesn't answer your questions, and in the future feel free to email me directly if you don't get a sufficiently prompt response.


Expression Metrics and Concordance with Datasets #68