Closed jianvhuang closed 1 year ago
Hello Jian Huang,
Thank you for your interest in the work.
The --pigz parameter should be the complete path to the pigz binary file (i.e. including the pigz binary not just the directory it is in). Assuming the binary file is named 'pigz', your --pigz parameter should be '/fs1/storedir/huangjian//analysis/AAA-script/pigz-2.7/pigz'
This depends on how you store the FUSION SNP-weights and the .pos file you are using. I have data for each PANEL in a separate folder, and when I specify this structure when I create the file 'All_tissues.pos' by adding an extra '$PANEL/' to the WGT column. For example a weight file is here 'SNP-weights/Whole_Blood/Whole_Blood/Whole_Blood.ENSG00000273355.1.wgt.RDat', and the .pos file WGT column would show 'Whole_Blood/Whole_Blood/Whole_Blood.ENSG00000273355.1.wgt.RDat'. So when I run FUSION_score_file_creator.R, I specify:
--weights All_tissues.pos \ --weights_dir SNP-weights \
Sorry if this is a little clunky. If editing the script makes it work for your set up, that is a fine solution.
plink1.9 \
--bfile /fs1/storedir/huangjian//analysis/AAA-data/1000G/LDREF/EAS/1000G.EAS.hm3.1 \
--extract /fs1/storedir/huangjian//analysis/eQTL/GTEx8/precomputed_eQTL_scores/SCORE_FILES_Whole_Blood/ENSG00000215914.4.snplist \
--allow-no-sex \
--score /fs1/storedir/huangjian//analysis/eQTL/GTEx8/precomputed_eQTL_scores/SCORE_FILES_Whole_Blood/ENSG00000215914.4.SCORE 1 2 4 \
--out /fs1/storedir/huangjian//analysis/eQTL/GTEx8/1000G_ref_scores/ENSG00000215914.4
Let me know how you get on.
Ollie
I can run the Rscript ./FUSION_ref_scorer.R
after changing the--plink plink1.9
to --plink plink2
. It should be a problem with the plink version in my system.
Running the below script directly also gives me the correct output (a log
file and a .sscore
file).
plink2 \
--bfile /fs1/storedir/huangjian//analysis/AAA-data/1000G/LDREF/EAS/1000G.EAS.hm3.1 \
--extract /fs1/storedir/huangjian//analysis/eQTL/GTEx8/precomputed_eQTL_scores/SCORE_FILES_Whole_Blood/ENSG00000215914.4.snplist \
--allow-no-sex \
--score /fs1/storedir/huangjian//analysis/eQTL/GTEx8/precomputed_eQTL_scores/SCORE_FILES_Whole_Blood/ENSG00000215914.4.SCORE 1 2 4 \
--out /fs1/storedir/huangjian//analysis/eQTL/GTEx8/1000G_ref_scores/ENSG00000215914.4
The output for each gene includes a log
file, a .profile_mini
file, and a .sscore
file.
ENSGxxxxxx.SCORE
file? reference score for these genes cannot be calculated because there are no SNPs in the input .SCORE
file.FUSION_ref_scorer.R
still generate valid output. And the input .SCORE
file for these genes are also valid (with four columns: SNP, A1, A2, SCORE) Thank you.
Thank you for the advice.
The main reason that I had .SCORE
files with all NA rows and empty .SCORE
files from FUSION_score_file_creator.R
is that I used the nofilter.pos
from GTEx8 SNP-weights. For these genes, the wgt.matrix
of the best methods, as selected in your script best = which.min(cv.performance[2,])
, contains all zero or all NA values.
Once I used the .pos
files that only include genes with significant heritability to generate the score using FUSION_score_file_creator.R
, most of these empty or NA .SCORE
files were excluded from the analysis. However, there is one gene (ENSG00000120910.14) returning an empty .SCORE
file based on GTEx8 Whole_Blood SNP-weights. The wgt.matrix
of the best methods (in this case "enet") was zero for all SNPs.
However, a Prediction_failed.txt
was returned. The error reported is "Error: No valid entries in --score file."
After running the plink command directly for a few genes included in the Prediction_failed.txt
, I found that the error plink returned was --score file entries were skipped due to mismatching allele codes
. And by comparing the SNPs allele codes in the 1000G bim files with those in the RDat files, I found that the allele codes were coded differently in these files. Between 18 to 38 genes were reported in the Prediction_failed.txt
for Whole_Blood and 13 brain-related tissues from GTEx8. I don't think FUSION_ref_scorer.R
includes a step to harmonise these mismatched allele codes, but did you do anything about this in your work?
It seems to me that it would be easier to change the allele codes in the RDat files (to match the 1000G bim files) if I want to make sure no genes were to be excluded due to mismatching allele codes. But I might have to flip the sign of the SNP-weights values if I am flipping A1 and A2.
For example,
# Whole_Blood .SCORE file for ENSG00000269713.7 (only two SNPs were reported for this gene)
V1 V2 V3 V4
1: rs682536 A G -0.0288
2: rs12124527 A G -0.2501
# bim file of 1000G (I am using the EAS pop)
# a subset of SNPs available reported in the .SCORE file for ENSG00000269713.7
bim[V2%in%score3$V1,]
V1 V2 V3 V4 V5 V6
1: 1 rs682536 0 144954851 T C
2: 1 rs12124527 0 144896319 C T
Also, it is best to stay with plink/1.9 because FUSION_ref_scorer.R
take results from --score
. The output formats of --score
in plink/1.9 and plink/2 are different.
Thank you very much.
Hi Oliver,
I am trying to construct GeRS using the method you described in your paper Imputed gene expression risk scores: a functionally informed component of polygenic risk published on Human Molecular Genetics.
I obtained GTEx8 tissue-specific SNP-weights from FUSION webpage and performed tissue-specific TWAS using their R code
FUSION.assoc_test.R
.I have also used your R code
FUSION_score_file_creator.R
to convert the FUSION SNP-weights into tissue-specific SCORE files. However, I got stuck at theFUSION_ref_scorer.R
step.1) According to the example code on your GitHub page [link], I believe I have everything ready except I am not sure what the pigz binary files should be. I have the pigz directory at ./pigz-2.7/, is this what I should put after --pigz ?
2) In both
FUSION_score_file_creator.R
andFUSION_ref_scorer.R
, the below line is used to read to weight file.pos$FILE<-paste0(opt$weights_dir,'/',pos$PANEL,'/',pos$PANEL,'/',sub('.*/','',pos$WGT))
Do you intentionally include twopos$PANEL
? I have to remove one for the code to work.3) I got error messages running the below code
I got error messages like this one, and the output was only a lot of
ENSGxxx.profile_mini
files with only the gene ID in the file.cannot open file /fs1/storedir/huangjian//analysis/eQTL/GTEx8/1000G_ref_scores/ENSG00000215914.4.profile for reading (No such file or directory)
Any suggestions will be helpful. Thank you very much.