opain / eQTL_to_TWAS

Methods for converting eQTL summary statistics into TWAS SNP-weights
GNU General Public License v3.0
11 stars 4 forks source link

unexpected error in mBAT-combo #13

Closed 1667857557 closed 12 months ago

1667857557 commented 12 months ago

Hi Oliver, We encountered a new unexpected error while computing the weights for one gene. The error seems to occur during the execution of the 'mBAT-combo' mode. Below is the error report and the relevant file. Could you please provide some suggestions on how to address it? Thanks in advance!

Reading GWAS summary-level statistics from [./eQTL_to_TWAS/eQTL_to_TWAS/ENSG00000117281.txt/ENSG00000117281/mBAT-combo/ss.txt] ... GWAS summary statistics of 58 SNPs read from [./eQTL_to_TWAS/eQTL_to_TWAS/ENSG00000117281.txt/ENSG00000117281/mBAT-combo/ss.txt]. Phenotypic variance estimated from summary statistics of all 58 SNPs: 0.459756 (variance of logit for case-control studies). Matching the GWAS meta-analysis results to the genotype data ... Calculating allele frequencies ... Warning: can't match the reference alleles of 58 SNPs to those in the genotype data. These SNPs have been saved in [./eQTL_to_TWAS/eQTL_to_TWAS/ENSG00000117281.txt/ENSG00000117281/mBAT-combo/res.badsnps]. Error: none of the SNPs in the GWAS summary data can be found in the genotype data. An error occurs, please check the options or data Loading required package: bigstatsr Segmentation fault Error in fread(paste0(opt$output, "/", gene_i, "/mBAT-combo/res.gene.assoc.mbat"), : File './eQTL_to_TWAS/eQTL_to_TWAS/ENSG00000117281.txt/ENSG00000117281/mBAT-combo/res.gene.assoc.mbat' does not exist or is non-readable. getwd()=='/mnt/g/linux' In addition: Warning messages: 1: In system2(command = opt$plink, args = paste0("--bfile ", opt$plink_ref_chr, : setting stdout = TRUE 2: In system(paste0(opt$gctb, " --sbayes R --ldm ", opt$gctb_ref, chr_i, : running command './gctb/gctb --sbayes R --ldm ./gctb/ukbEURu_hm3_shrunk_sparse/ukbEURu_hm3_chr1.ldm.sparse --pi 0.95,0.02,0.02,0.01 --gamma 0.0,0.01,0.1,1 --gwas-summary ./eQTL_to_TWAS/eQTL_to_TWAS/ENSG00000117281.txt/ENSG00000117281/SBayesR/ENSG00000117281.txt --chain-length 10000 --exclude-mhc --burn-in 2000 --out-freq 1000 --out ./eQTL_to_TWAS/eQTL_to_TWAS/ENSG00000117281.txt/ENSG00000117281/SBayesR/ENSG00000117281.SBayesR' had status 139 Execution halted ENSG00000117281.txt

opain commented 12 months ago

Hello Huang,

Thank you for reporting the error. This process is ironing out many problems that I did encounter but others likely will. Apologies for the trouble it might be causing you.

I have identified the issue and pushed an update to the main branch (https://github.com/opain/eQTL_to_TWAS/commit/a55f3510a88e083010304e830c57ee667bb210d0)

The problem was caused by GCTA not automatically resolving strand flips between the sumstats and the reference, so it found no overlap. Honestly, I am suprised it doesn't handle these automatically as most software does. I have now included a step that harmonises the allele codes between the sumstats and the reference.

I would recommend running your analyses again for all genes. Apologies for the inconvenience.

Let me know if you have any other issues and I will try to resolve them quickly.

Best wishes,

Ollie

1667857557 commented 12 months ago

Hi Oliver,

Thank you for your prompt response. I have another question: do you think that splitting the eQTL summary dataset by genes and utilizing the "parallel" command in Ubuntu could help enhance the processing speed? How does this compare to using "sbatch" commands? Unfortunately, due to device limitations, I am unable to employ the SLURM job scheduler.

`_cd /mnt/g/linux find ./eQTL_to_TWAS/ -name '.txt' -exec basename {} \; > file_list.txt find /mnt/g/linux/eQTL_to_TWAS/eQTL_to_TWAS/ -name '.RDat' -exec basename {} \; | sed 's/.RDat$//' > RDat_list.txt grep -vFf RDat_list.txt file_list.txt > result.txt

process_line() { line="$1" Rscript ./compute_weights.R \ --extract ./ldsc/w_hm3.snplist \ --sumstats ./eQTL_to_TWAS/"$line" \ --gcta ./gcta/gcta-1.94.1 \ --gctb ./gctb/gctb \ --gctb_ref ./gctb/ukbEURu_hm3_shrunk_sparse/ukbEURu_hm3_chr \ --plink_ref_chr ./Trans-Phar/1000G_EUR_Phase3_plink/1000G.EUR.QC. \ --plink_ref_keep ./EUR_1KG_phase3_samples.tsv \ --ld_blocks ./ldetect/EUR \ --rscript Rscript \ --dbslmm ./DBSLMM/software \ --plink ./plink/plink \ --PRScs_path ./PRScs/PRScs.py \ --PRScs_ref_path ./PRScs/ldblk_ukbb_eur \ --ldpred2_ref_dir ./ldref \ --output ./eQTL_to_TWAS/eQTL_to_TWAS/"$line" }

export -f process_line

cat result.txt | parallel -j 15 processline`

Best wishes, Huang

opain commented 12 months ago

Hi Huang,

I am not familiar with the parallel function, but I think the setup you propose looks like a good idea. It is similar to the way I use sbatch here. You could split your sumstats by gene in advance (as you suggest), or use the --id parameter to select sumstats for one gene at a time (as I do).

Ollie

1667857557 commented 12 months ago

I got it, Thank you very much.