SR-TWAS

SR-TWAS (Stacked Regression Transcriptome-Wide Association Study) is a tool written in Python and Bash for building improved Genetically Regulated gene eXpression (GReX) prediction models. It uses stacked regression, an ensemble machine learning technique, to form optimal linear combinations of previously trained gene expression imputation models. This lets users leverage multiple reference panels of the same tissue type, which increases the effective training sample size and thereby improves GReX prediction accuracy and TWAS power.
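To make the idea of stacked regression concrete, here is a minimal pure-Python sketch. It is not the SR-TWAS implementation: it assumes only two base models, constrains the weights to be non-negative and sum to one (a common stacking constraint, assumed here), and picks the weight `zeta` by a simple grid search that maximizes R² against observed expression. The function names and the toy numbers are hypothetical.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1.0 - ss_res / ss_tot


def stack_two_models(observed, pred0, pred1, steps=100):
    """Grid-search zeta in [0, 1] for the blend zeta*pred0 + (1-zeta)*pred1."""
    best_zeta, best_r2 = 0.0, float("-inf")
    for i in range(steps + 1):
        zeta = i / steps
        combined = [zeta * a + (1 - zeta) * b for a, b in zip(pred0, pred1)]
        r2 = r_squared(observed, combined)
        if r2 > best_r2:
            best_zeta, best_r2 = zeta, r2
    return best_zeta, best_r2


# Toy validation data, made up for illustration only.
observed = [0.2, -0.1, 0.4, 0.0, 0.3]
pred0 = [0.25, -0.05, 0.30, 0.05, 0.20]   # hypothetical base model 0
pred1 = [0.10, -0.20, 0.50, -0.10, 0.45]  # hypothetical base model 1

zeta, r2 = stack_two_models(observed, pred0, pred1)
print(f"zeta={zeta:.2f}, R2={r2:.3f}")
```

Because `zeta=0` and `zeta=1` are both on the grid, the blended model in this sketch can never score worse than either base model on the data used to choose `zeta`.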

SR-TWAS: Leveraging Multiple Reference Panels to Improve Transcriptome-Wide Association Study Power by Ensemble Machine Learning. Randy L. Parrish, Aron S. Buchman, Shinya Tasaki, Yanling Wang, Denis Avey, Jishu Xu, Philip L. De Jager, David A. Bennett, Michael P. Epstein, Jingjing Yang. Nature Communications 15, 6646 (2024). doi: https://doi.org/10.1038/s41467-024-50983-w



Software Setup

1. Install BGZIP, TABIX, Python 3.6, and the following Python libraries

2. Make files executable

Input Files

The example input files under ./ExampleData/ were generated artificially. All input files are tab-delimited text files.

Validation Data

1. Genotype File

VCF (Variant Call Format)

CHROM  POS  ID   REF  ALT  QUAL  FILTER  INFO  FORMAT  Sample1   Sample...
1      100  rs1  C    T    .     PASS    .     GT:DS   0/0:0.01  ...

Dosage file

CHROM  POS  ID    REF  ALT  Sample1  Sample...
1      100  rs**  C    T    0.01     ...
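The two genotype layouts carry the same kind of information: a per-sample count (or expected count) of the ALT allele. The sketch below shows one way a VCF sample field such as `0/0:0.01` could be turned into a dosage, using either the GT or the DS entry named in the FORMAT column. It is an illustration only, assumes biallelic variants, and is not how SR-TWAS itself parses genotypes. The helper `sample_dosage` and the second sample value are hypothetical.

```python
def sample_dosage(format_field, sample_field, use="GT"):
    """Return the ALT-allele dosage for one sample, or None if GT is missing."""
    entry = dict(zip(format_field.split(":"), sample_field.split(":")))
    if use == "DS":
        return float(entry["DS"])
    # GT looks like "0/1" (unphased) or "1|0" (phased); count ALT alleles.
    alleles = entry["GT"].replace("|", "/").split("/")
    if "." in alleles:
        return None
    return float(sum(1 for allele in alleles if allele == "1"))


# One tab-delimited VCF data line; the second sample is made up.
line = "1\t100\trs1\tC\tT\t.\tPASS\t.\tGT:DS\t0/0:0.01\t0/1:0.98"
fields = line.split("\t")
fmt, samples = fields[8], fields[9:]

gt = [sample_dosage(fmt, s, "GT") for s in samples]
ds = [sample_dosage(fmt, s, "DS") for s in samples]
print(gt, ds)
```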

2. SampleID File

3. Gene Annotation/Gene Expression File

CHROM  GeneStart  GeneEnd  TargetID  GeneName  Sample1  Sample...
1      100        200      ENSG0000  X         0.2      ...
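In this layout the first five columns annotate the gene and every remaining column holds one sample's expression value. As a rough sketch of how such a file might be read and restricted to a chosen set of samples, the example below uses only the standard library. The sample list stands in for the contents of a sample ID file (its exact format is not described here, so that is an assumption), and `load_expression` is a hypothetical helper.

```python
import csv
import io

# In-memory stand-in for a tab-delimited gene expression file (toy values).
expr_txt = (
    "CHROM\tGeneStart\tGeneEnd\tTargetID\tGeneName\tSample1\tSample2\tSample3\n"
    "1\t100\t200\tENSG0000\tX\t0.2\t-0.4\t1.1\n"
)
train_ids = ["Sample1", "Sample3"]  # hypothetical sample ID list

ANNOTATION_COLUMNS = ["CHROM", "GeneStart", "GeneEnd", "TargetID", "GeneName"]


def load_expression(text, sample_ids):
    """Map TargetID -> (annotation dict, expression values for sample_ids)."""
    genes = {}
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        annotation = {col: row[col] for col in ANNOTATION_COLUMNS}
        values = [float(row[sample]) for sample in sample_ids]
        genes[row["TargetID"]] = (annotation, values)
    return genes


genes = load_expression(expr_txt, train_ids)
annotation, values = genes["ENSG0000"]
print(annotation["GeneName"], values)
```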

Weight (eQTL effect size) Files

CHROM  POS  REF  ALT  TargetID  ES   MAF
1      100  C    T    ENSG0000  0.2  0.02
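A weight file like this can be read as a linear prediction model: predicted expression for a gene is the sum, over its variants, of effect size (ES) times genotype dosage. The sketch below illustrates that calculation under simplifying assumptions (variants matched on CHROM, POS, REF, and ALT; variants without genotypes skipped). The second weight row, the dosages, and the `predict_grex` helper are made up for the example and are not part of SR-TWAS.

```python
import csv
import io

# In-memory stand-in for a tab-delimited weight file (second row is made up).
weights_txt = (
    "CHROM\tPOS\tREF\tALT\tTargetID\tES\tMAF\n"
    "1\t100\tC\tT\tENSG0000\t0.2\t0.02\n"
    "1\t150\tA\tG\tENSG0000\t-0.5\t0.10\n"
)

# Toy ALT-allele dosages for three samples, keyed by (CHROM, POS, REF, ALT).
dosages = {
    ("1", "100", "C", "T"): [0.0, 1.0, 2.0],
    ("1", "150", "A", "G"): [1.0, 0.0, 1.0],
}


def predict_grex(weights_text, dosages, target_id):
    """Sum ES * dosage over the variants of one gene, per sample."""
    n_samples = len(next(iter(dosages.values())))
    grex = [0.0] * n_samples
    for row in csv.DictReader(io.StringIO(weights_text), delimiter="\t"):
        if row["TargetID"] != target_id:
            continue
        key = (row["CHROM"], row["POS"], row["REF"], row["ALT"])
        if key not in dosages:
            continue  # no genotype available for this variant
        effect_size = float(row["ES"])
        for i, dosage in enumerate(dosages[key]):
            grex[i] += effect_size * dosage
    return grex


grex = predict_grex(weights_txt, dosages, "ENSG0000")
print(grex)
```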

SR-TWAS: Example Usage

Arguments

Example Command

gene_exp="${SR_TWAS_dir}/ExampleData/gene_expression.txt"
train_sampleID="${SR_TWAS_dir}/ExampleData/train_sampleID.txt"
genofile="${SR_TWAS_dir}/ExampleData/genotype.vcf.gz"
out_dir="${SR_TWAS_dir}/ExampleData/output"

weight0="${SR_TWAS_dir}/ExampleData/CHR1_DPR_cohort0_eQTLweights.txt.gz"
weight_name0=cohort0

weight1="${SR_TWAS_dir}/ExampleData/CHR1_DPR_cohort1_eQTLweights.txt.gz"
weight_name1=cohort1

weight2="${SR_TWAS_dir}/ExampleData/CHR1_DPR_cohort2_eQTLweights.txt.gz"
weight_name2=cohort2

${SR_TWAS_dir}/SR-TWAS.sh \
--chr 1 \
--cvR2 1 \
--format GT \
--gene_exp ${gene_exp} \
--genofile ${genofile} \
--genofile_type vcf \
--hwe 0.0001 \
--maf 0.01 \
--out_dir ${out_dir} \
--parallel 2 \
--SR_TWAS_dir ${SR_TWAS_dir} \
--train_sampleID ${train_sampleID} \
--weights ${weight0} ${weight1} ${weight2} \
--weights_names ${weight_name0} ${weight_name1} ${weight_name2}

Output

Naive method: Example Usage

Arguments

Arguments are the same as for SR-TWAS.

Example Command

gene_exp="${SR_TWAS_dir}/ExampleData/gene_expression.txt"
train_sampleID="${SR_TWAS_dir}/ExampleData/train_sampleID.txt"
genofile="${SR_TWAS_dir}/ExampleData/genotype.vcf.gz"
out_dir="${SR_TWAS_dir}/ExampleData/output"

weight0="${SR_TWAS_dir}/ExampleData/CHR1_DPR_cohort0_eQTLweights.txt.gz"
weight_name0=cohort0

weight1="${SR_TWAS_dir}/ExampleData/CHR1_DPR_cohort1_eQTLweights.txt.gz"
weight_name1=cohort1

weight2="${SR_TWAS_dir}/ExampleData/CHR1_DPR_cohort2_eQTLweights.txt.gz"
weight_name2=cohort2

${SR_TWAS_dir}/Naive.sh \
--chr 1 \
--cvR2 1 \
--format GT \
--gene_exp ${gene_exp} \
--genofile ${genofile} \
--genofile_type vcf \
--hwe 0.0001 \
--maf 0.01 \
--out_dir ${out_dir} \
--parallel 2 \
--SR_TWAS_dir ${SR_TWAS_dir} \
--train_sampleID ${train_sampleID} \
--weights ${weight0} ${weight1} ${weight2} \
--weights_names ${weight_name0} ${weight_name1} ${weight_name2}
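The arguments match those of SR-TWAS, so the difference lies in how the base models are combined. As a rough illustration only, and assuming (this section does not state it) that the naive method gives every base model equal weight, the sketch below averages eQTL effect sizes across three toy weight sets. Treating a variant that is absent from a model as having effect size 0 is a further assumption, and `average_weights` is a hypothetical helper, not a function from this repository.

```python
def average_weights(models):
    """Equal-weight average of effect sizes; a missing variant counts as 0."""
    keys = set()
    for model in models:
        keys.update(model)
    n_models = len(models)
    return {
        key: sum(model.get(key, 0.0) for model in models) / n_models
        for key in sorted(keys)
    }


# Toy effect sizes keyed by (CHROM, POS, REF, ALT); values are made up.
cohort0 = {("1", 100, "C", "T"): 0.2, ("1", 150, "A", "G"): -0.3}
cohort1 = {("1", 100, "C", "T"): 0.4}
cohort2 = {("1", 100, "C", "T"): 0.0, ("1", 150, "A", "G"): 0.3}

naive = average_weights([cohort0, cohort1, cohort2])
for variant, effect_size in naive.items():
    print(variant, round(effect_size, 4))
```

Compared with stacked regression, this kind of averaging needs no validation data, but it cannot down-weight a base model that predicts poorly.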

Output

Avg-valid+SR: Example Usage

Arguments

Example Command


gene_anno="${SR_TWAS_dir}/ExampleData/gene_expression.txt"
out_dir="${SR_TWAS_dir}/ExampleData/output"

valid_weight0="${SR_TWAS_dir}/ExampleData/CHR1_DPR_cohort0_eQTLweights.txt.gz"
valid_weight_name0=valid_model0

valid_weight1="${SR_TWAS_dir}/ExampleData/CHR1_DPR_cohort1_eQTLweights.txt.gz"
valid_weight_name1=valid_model1

## Run the SR-TWAS example command first; it writes the stacked weight file used here
SR_weight="${SR_TWAS_dir}/ExampleData/output/SR_CHR1/CHR1_SR_train_eQTLweights.txt.gz"
SR_weight_name=SR

${SR_TWAS_dir}/Avg-valid_SR.sh \
--gene_anno ${gene_anno} \
--chr 1 \
--parallel 2 \
--out_dir ${out_dir} \
--SR_TWAS_dir ${SR_TWAS_dir} \
--weights ${valid_weight0} ${valid_weight1} ${SR_weight} \
--weights_names ${valid_weight_name0} ${valid_weight_name1} ${SR_weight_name}

Output