opain / GenoPred

Genotype-based Prediction (GenoPred)
https://opain.github.io/GenoPred/
GNU General Public License v3.0

missing sparse matrices when running sbayesr #120

Closed davemhoward closed 2 months ago

davemhoward commented 2 months ago

Error reported on screen:

Error in rule prep_pgs_sbayesr_i:
    jobid: 23
    input: UKB/output/reference/gwas_sumstat/BIP/BIP-cleaned.gz, resources/data/gctb_ref/ukbEURu_hm3_shrunk_sparse, resources/software/gctb/gctb_2.03beta_Linux/gctb
    output: UKB/output/reference/pgs_score_files/sbayesr/BIP/ref-BIP.score.gz
    log: UKB/output/reference/logs/prep_pgs_sbayesr_i-BIP.log (check log file(s) for error details)
    conda-env: /scratch/prj/sgdp_strdep/clustering/GenoPred/pipeline/.snakemake/conda/f01f00993fcfcaaccd85f3c930aea943
    shell:
        Rscript ../Scripts/pgs_methods/sbayesr.R --ref_plink_chr resources/data/ref/ref.chr --sumstats UKB/output/reference/gwas_sumstat/BIP/BIP-cleaned.gz --gctb resources/software/gctb/gctb_2.03beta_Linux/gctb --ld_matrix_chr resources/data/gctb_ref/ukbEURu_hm3_shrunk_sparse/ukbEURu_hm3_v3_50k_chr --robust T --n_cores 5 --output UKB/output/reference/pgs_score_files/sbayesr/BIP/ref-BIP --pop_data resources/data/ref/ref.pop.txt --test chr22 > UKB/output/reference/logs/prep_pgs_sbayesr_i-BIP.log 2>&1
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
    cluster_jobid: Submitted batch job 21184115

Error executing rule prep_pgs_sbayesr_i on cluster (jobid: 23, external: Submitted batch job 21184115, jobscript: /scratch/prj/sgdp_strdep/clustering/GenoPred/pipeline/.snakemake/tmp.iphuu144/snakejob.prep_pgs_sbayesr_i.23.sh). For error details see the cluster log and the log files of the involved rule(s).

Log file UKB/output/reference/logs/prep_pgs_sbayesr_i-BIP.log contains:

Loading required package: iterators
Loading required package: parallel
         used (Mb) gc trigger (Mb) max used (Mb)
Ncells 507897 27.2    1120630 59.9   661008 35.4
Vcells 939332  7.2    8388608 64.0  4676668 35.7

Error: can not open the file [resources/data/gctb_ref/ukbEURu_hm3_shrunk_sparse/ukbEURu_hm3_v3_50k_chr22.ldm.sparse.info] to read.
 [1] ""
     " GCTB 2.03 beta "
     " Genome-wide Complex Trait Bayesian analysis "
     " Author: Jian Zeng, Luke Lloyd-Jones "
     " MIT License "
     ""
     ""
     "Analysis started: Wed Sep 18 13:03:58 2024"
     ""
     "Options:"
[11] ""
     "--sbayes R"
     "--ldm resources/data/gctb_ref/ukbEURu_hm3_shrunk_sparse/ukbEURu_hm3_v3_50k_chr22.ldm.sparse"
     "--pi 0.95,0.02,0.02,0.01"
     "--gamma 0.0,0.01,0.1,1"
     "--gwas-summary /tmp/RtmpHCDl7y/GWAS_sumstats_COJO.txt"
     "--chain-length 10000"
     "--robust "
     "--exclude-mhc "
     "--burn-in 2000"
[21] "--out-freq 1000"
     "--out /tmp/RtmpHCDl7y/GWAS_sumstats_SBayesR.chr22"
     ""
     ""
     "Analysis finished: Wed Sep 18 13:03:58 2024"
     "Computational time: 0:0:0"
Error in fread(paste0(tmp_dir, "/GWAS_sumstats_SBayesR.chr", i, ".snpRes")) :
  File '/tmp/RtmpHCDl7y/GWAS_sumstats_SBayesR.chr22.snpRes' does not exist or is non-readable. getwd()=='/scratch/prj/sgdp_strdep/clustering/GenoPred/pipeline'
Calls: rbind -> fread -> stopf -> raise_condition -> signal
Execution halted

here is the ls from GenoPred/pipeline/resources/data/gctb_ref/ukbEURu_hm3_shrunk_sparse

total 19648522
-rw-rw----+ 1 k1812035 er_prj_sgdp_strdep       3972 Jul 24  2019 README
-rwxr-x---+ 1 k1812035 er_prj_sgdp_strdep 1786163480 Jul 24  2019 ukbEURu_hm3_chr11_v3_50k.ldm.sparse.bin
-rwxr-x---+ 1 k1812035 er_prj_sgdp_strdep    7116575 Jul 24  2019 ukbEURu_hm3_chr11_v3_50k.ldm.sparse.info
-rwxr-x---+ 1 k1812035 er_prj_sgdp_strdep       2845 Jul 24  2019 ukbEURu_hm3_chr11_v3_50k_sparse.log
-rwxr-x---+ 1 k1812035 er_prj_sgdp_strdep       2850 Jul 24  2019 ukbEURu_hm3_chr12_v3_50k_sparse.log
-rwxr-x---+ 1 k1812035 er_prj_sgdp_strdep 1484824752 Jul 24  2019 ukbEURu_hm3_chr13_v3_50k.ldm.sparse.bin
-rwxr-x---+ 1 k1812035 er_prj_sgdp_strdep       2586 Jul 24  2019 ukbEURu_hm3_chr13_v3_50k_sparse.log
-rwxr-x---+ 1 k1812035 er_prj_sgdp_strdep 1049011376 Jul 24  2019 ukbEURu_hm3_chr15_v3_50k.ldm.sparse.bin
-rwxr-x---+ 1 k1812035 er_prj_sgdp_strdep    4431468 Jul 24  2019 ukbEURu_hm3_chr15_v3_50k.ldm.sparse.info
-rwxr-x---+ 1 k1812035 er_prj_sgdp_strdep       2397 Jul 24  2019 ukbEURu_hm3_chr15_v3_50k_sparse.log
-rw-rw----+ 1 k1812035 er_prj_sgdp_strdep  843776000 Sep 18 12:55 ukbEURu_hm3_chr16_v3_50k.ldm.sparse.bin
-rwxr-x---+ 1 k1812035 er_prj_sgdp_strdep    4609366 Jul 24  2019 ukbEURu_hm3_chr16_v3_50k.ldm.sparse.info
-rwxr-x---+ 1 k1812035 er_prj_sgdp_strdep       2435 Jul 24  2019 ukbEURu_hm3_chr16_v3_50k_sparse.log
-rwxr-x---+ 1 k1812035 er_prj_sgdp_strdep       2309 Jul 24  2019 ukbEURu_hm3_chr17_v3_50k_sparse.log
-rwxr-x---+ 1 k1812035 er_prj_sgdp_strdep 1019516840 Jul 24  2019 ukbEURu_hm3_chr18_v3_50k.ldm.sparse.bin
-rwxr-x---+ 1 k1812035 er_prj_sgdp_strdep  402256320 Jul 24  2019 ukbEURu_hm3_chr19_v3_50k.ldm.sparse.bin
-rwxr-x---+ 1 k1812035 er_prj_sgdp_strdep 3153116840 Jul 24  2019 ukbEURu_hm3_chr1_v3_50k.ldm.sparse.bin
-rwxr-x---+ 1 k1812035 er_prj_sgdp_strdep    2086568 Jul 24  2019 ukbEURu_hm3_chr21_v3_50k.ldm.sparse.info
-rwxr-x---+ 1 k1812035 er_prj_sgdp_strdep 3082969752 Jul 24  2019 ukbEURu_hm3_chr2_v3_50k.ldm.sparse.bin
-rwxr-x---+ 1 k1812035 er_prj_sgdp_strdep   11713365 Jul 24  2019 ukbEURu_hm3_chr2_v3_50k.ldm.sparse.info
-rwxr-x---+ 1 k1812035 er_prj_sgdp_strdep    9949974 Jul 24  2019 ukbEURu_hm3_chr3_v3_50k.ldm.sparse.info
-rwxr-x---+ 1 k1812035 er_prj_sgdp_strdep       3277 Jul 24  2019 ukbEURu_hm3_chr3_v3_50k_sparse.log
-rwxr-x---+ 1 k1812035 er_prj_sgdp_strdep 2414470592 Jul 24  2019 ukbEURu_hm3_chr4_v3_50k.ldm.sparse.bin
-rwxr-x---+ 1 k1812035 er_prj_sgdp_strdep 2334240728 Jul 24  2019 ukbEURu_hm3_chr5_v3_50k.ldm.sparse.bin
-rwxr-x---+ 1 k1812035 er_prj_sgdp_strdep    8978085 Jul 24  2019 ukbEURu_hm3_chr5_v3_50k.ldm.sparse.info
-rwxr-x---+ 1 k1812035 er_prj_sgdp_strdep       3126 Jul 24  2019 ukbEURu_hm3_chr5_v3_50k_sparse.log
-rwxr-x---+ 1 k1812035 er_prj_sgdp_strdep 2492051904 Jul 24  2019 ukbEURu_hm3_chr6_v3_50k.ldm.sparse.bin
-rwxr-x---+ 1 k1812035 er_prj_sgdp_strdep    8766782 Jul 24  2019 ukbEURu_hm3_chr6_v3_50k.ldm.sparse.info
-rwxr-x---+ 1 k1812035 er_prj_sgdp_strdep       2917 Jul 24  2019 ukbEURu_hm3_chr8_v3_50k_sparse.log

It looks like some of the files required by SBayesR are missing.
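For anyone hitting the same error, a quick way to see which chromosomes are incomplete is to compare the directory contents against the expected set of files. The sketch below is only an illustration: the function name `missing_ld_files` is hypothetical, and it assumes that a complete download holds one `.ldm.sparse.bin` and one `.ldm.sparse.info` file per autosome, named as in the listing above.

```python
import re

# File-name pattern as it appears in the directory listing (assumption: a
# complete download contains one .bin and one .info file per autosome).
PATTERN = re.compile(r"ukbEURu_hm3_chr(\d+)_v3_50k\.ldm\.sparse\.(bin|info)$")


def missing_ld_files(filenames, chromosomes=range(1, 23)):
    """Return {chromosome: [missing suffixes]} for incomplete chromosomes."""
    found = {chrom: set() for chrom in chromosomes}
    for name in filenames:
        match = PATTERN.search(name)
        if match and int(match.group(1)) in found:
            found[int(match.group(1))].add(match.group(2))
    return {
        chrom: sorted({"bin", "info"} - suffixes)
        for chrom, suffixes in found.items()
        if suffixes != {"bin", "info"}
    }


# A few names taken from the listing in this issue.
listing = [
    "ukbEURu_hm3_chr11_v3_50k.ldm.sparse.bin",
    "ukbEURu_hm3_chr11_v3_50k.ldm.sparse.info",
    "ukbEURu_hm3_chr13_v3_50k.ldm.sparse.bin",
    "ukbEURu_hm3_chr21_v3_50k.ldm.sparse.info",
]
report = missing_ld_files(listing, chromosomes=[11, 13, 21, 22])
print(report)  # {13: ['info'], 21: ['bin'], 22: ['bin', 'info']}
```

To check a real directory, pass `os.listdir(path)` as `filenames` and leave `chromosomes` at its default of 1 to 22.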

opain commented 2 months ago

Hi, thanks for flagging. Could you post the contents of the file 'GenoPred/pipeline/resources/data/logs/download_gctb_ref.log'? I guess there must have been an error when the download_gctb_ref rule was running. I will make the rule stricter so that it picks up the error.

In the meantime, I would recommend forcing the download_gctb_ref rule to run again, and then have a look at the log file to check it finished without error.

snakemake -j 1 -f download_gctb_ref

If this completes without error, then you can proceed with the pipeline as normal.
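One way a download step could be made stricter is to compare what ended up on disk with the archive's own table of contents. The following is a minimal sketch of that idea and not the change actually made in GenoPred; `unextracted_members` is a hypothetical helper, and the demonstration uses a throwaway archive rather than the real 21G reference. Note that it checks the names as stored in the archive, so it would need to run before any renaming of the extracted files.

```python
import os
import tempfile
import zipfile


def unextracted_members(zip_path, dest_dir):
    """List archive members that are absent from dest_dir or have the wrong size."""
    problems = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            target = os.path.join(dest_dir, info.filename)
            if not os.path.isfile(target) or os.path.getsize(target) != info.file_size:
                problems.append(info.filename)
    return problems


# Self-contained demonstration with a small temporary archive.
with tempfile.TemporaryDirectory() as tmp:
    zip_path = os.path.join(tmp, "demo.zip")
    with zipfile.ZipFile(zip_path, "w") as archive:
        archive.writestr("ref/chr21.ldm.sparse.info", "abc")
        archive.writestr("ref/chr22.ldm.sparse.info", "defg")
    out_dir = os.path.join(tmp, "out")
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(out_dir)
    # Simulate a file that went missing after extraction.
    os.remove(os.path.join(out_dir, "ref", "chr22.ldm.sparse.info"))
    demo_problems = unextracted_members(zip_path, out_dir)
    print(demo_problems)  # ['ref/chr22.ldm.sparse.info']
```

A rule could fail loudly whenever the returned list is non-empty, instead of letting a later step such as prep_pgs_sbayesr_i discover the gap.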

davemhoward commented 2 months ago

The GenoPred/pipeline/resources/data/logs/download_gctb_ref.log file is 432,570 lines in length so probably best not to post it here ;)

The first lines were:

--2024-09-18 12:38:51-- https://zenodo.org/record/3350914/files/ukbEURu_hm3_sparse.zip?download=1
Resolving zenodo.org (zenodo.org)... 188.184.103.159, 188.184.98.238, 188.185.79.172, ...
Connecting to zenodo.org (zenodo.org)|188.184.103.159|:443... connected.
HTTP request sent, awaiting response... 301 MOVED PERMANENTLY
Location: /records/3350914/files/ukbEURu_hm3_sparse.zip [following]
--2024-09-18 12:38:51-- https://zenodo.org/records/3350914/files/ukbEURu_hm3_sparse.zip
Reusing existing connection to zenodo.org:443.
HTTP request sent, awaiting response... 200 OK
Length: 22145323620 (21G) [application/octet-stream]
Saving to: ‘resources/data/gctb_ref/ukbEURu_hm3_sparse.zip’

The end of that file read:

21626250K .......... .......... .......... .......... .. 100% 18.7M=13m39s

2024-09-18 12:52:30 (25.8 MB/s) - ‘resources/data/gctb_ref/ukbEURu_hm3_sparse.zip’ saved [22145323620/22145323620]

Archive: resources/data/gctb_ref/ukbEURu_hm3_sparse.zip
   creating: resources/data/gctb_ref/ukbEURu_hm3_shrunk_sparse/
  inflating: resources/data/gctb_ref/ukbEURu_hm3_shrunk_sparse/ukbEURu_hm3_chr17_v3_50k_sparse.log
  inflating: resources/data/gctb_ref/ukbEURu_hm3_shrunk_sparse/ukbEURu_hm3_chr11_v3_50k.ldm.sparse.bin
  inflating: resources/data/gctb_ref/ukbEURu_hm3_shrunk_sparse/README
  inflating: resources/data/gctb_ref/ukbEURu_hm3_shrunk_sparse/ukbEURu_hm3_chr21_v3_50k.ldm.sparse.info
  inflating: resources/data/gctb_ref/ukbEURu_hm3_shrunk_sparse/ukbEURu_hm3_chr11_v3_50k_sparse.log
  inflating: resources/data/gctb_ref/ukbEURu_hm3_shrunk_sparse/ukbEURu_hm3_chr2_v3_50k.ldm.sparse.info
  inflating: resources/data/gctb_ref/ukbEURu_hm3_shrunk_sparse/ukbEURu_hm3_chr3_v3_50k_sparse.log
  inflating: resources/data/gctb_ref/ukbEURu_hm3_shrunk_sparse/ukbEURu_hm3_chr13_v3_50k.ldm.sparse.bin
  inflating: resources/data/gctb_ref/ukbEURu_hm3_shrunk_sparse/ukbEURu_hm3_chr3_v3_50k.ldm.sparse.info
  inflating: resources/data/gctb_ref/ukbEURu_hm3_shrunk_sparse/ukbEURu_hm3_chr19_v3_50k.ldm.sparse.bin
  inflating: resources/data/gctb_ref/ukbEURu_hm3_shrunk_sparse/ukbEURu_hm3_chr6_v3_50k.ldm.sparse.info
  inflating: resources/data/gctb_ref/ukbEURu_hm3_shrunk_sparse/ukbEURu_hm3_chr15_v3_50k.ldm.sparse.info
  inflating: resources/data/gctb_ref/ukbEURu_hm3_shrunk_sparse/ukbEURu_hm3_chr13_v3_50k_sparse.log
  inflating: resources/data/gctb_ref/ukbEURu_hm3_shrunk_sparse/ukbEURu_hm3_chr15_v3_50k.ldm.sparse.bin
  inflating: resources/data/gctb_ref/ukbEURu_hm3_shrunk_sparse/ukbEURu_hm3_chr5_v3_50k.ldm.sparse.bin
  inflating: resources/data/gctb_ref/ukbEURu_hm3_shrunk_sparse/ukbEURu_hm3_chr16_v3_50k_sparse.log
  inflating: resources/data/gctb_ref/ukbEURu_hm3_shrunk_sparse/ukbEURu_hm3_chr18_v3_50k.ldm.sparse.bin
  inflating: resources/data/gctb_ref/ukbEURu_hm3_shrunk_sparse/ukbEURu_hm3_chr8_v3_50k_sparse.log
  inflating: resources/data/gctb_ref/ukbEURu_hm3_shrunk_sparse/ukbEURu_hm3_chr2_v3_50k.ldm.sparse.bin
  inflating: resources/data/gctb_ref/ukbEURu_hm3_shrunk_sparse/ukbEURu_hm3_chr11_v3_50k.ldm.sparse.info
  inflating: resources/data/gctb_ref/ukbEURu_hm3_shrunk_sparse/ukbEURu_hm3_chr5_v3_50k.ldm.sparse.info
  inflating: resources/data/gctb_ref/ukbEURu_hm3_shrunk_sparse/ukbEURu_hm3_chr1_v3_50k.ldm.sparse.bin
  inflating: resources/data/gctb_ref/ukbEURu_hm3_shrunk_sparse/ukbEURu_hm3_chr15_v3_50k_sparse.log
  inflating: resources/data/gctb_ref/ukbEURu_hm3_shrunk_sparse/ukbEURu_hm3_chr5_v3_50k_sparse.log
  inflating: resources/data/gctb_ref/ukbEURu_hm3_shrunk_sparse/ukbEURu_hm3_chr16_v3_50k.ldm.sparse.info
  inflating: resources/data/gctb_ref/ukbEURu_hm3_shrunk_sparse/ukbEURu_hm3_chr12_v3_50k_sparse.log
  inflating: resources/data/gctb_ref/ukbEURu_hm3_shrunk_sparse/ukbEURu_hm3_chr6_v3_50k.ldm.sparse.bin
  inflating: resources/data/gctb_ref/ukbEURu_hm3_shrunk_sparse/ukbEURu_hm3_chr4_v3_50k.ldm.sparse.bin
  inflating: resources/data/gctb_ref/ukbEURu_hm3_shrunk_sparse/ukbEURu_hm3_chr16_v3_50k.ldm.sparse.bin

It looks as though the files downloaded okay but chr22 (and others) were still missing.
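With a log of this length, the same conclusion can be reached without reading it by eye. The sketch below is a rough illustration only: `summarise_download_log` is a hypothetical helper, and it assumes the log contains wget's `saved [received/expected]` line and unzip's `inflating:` lines in the form quoted above.

```python
import re

# wget reports "saved [received/expected]" when the server sent a length.
SAVED = re.compile(r"saved \[(\d+)/(\d+)\]")
# unzip reports one "inflating:" line per extracted file.
INFLATED = re.compile(r"inflating: \S*chr(\d+)_v3_50k\.ldm\.sparse\.(bin|info)")


def summarise_download_log(log_text):
    """Return (download_complete, {chromosome: set of inflated suffixes})."""
    saved = SAVED.search(log_text)
    complete = bool(saved) and saved.group(1) == saved.group(2)
    inflated = {}
    for chrom, suffix in INFLATED.findall(log_text):
        inflated.setdefault(int(chrom), set()).add(suffix)
    return complete, inflated


# Excerpt of the log quoted in this issue (quotes normalised to ASCII).
prefix = "resources/data/gctb_ref/ukbEURu_hm3_shrunk_sparse/"
excerpt = "\n".join([
    "2024-09-18 12:52:30 (25.8 MB/s) - 'resources/data/gctb_ref/ukbEURu_hm3_sparse.zip' saved [22145323620/22145323620]",
    "inflating: " + prefix + "ukbEURu_hm3_chr17_v3_50k_sparse.log",
    "inflating: " + prefix + "ukbEURu_hm3_chr11_v3_50k.ldm.sparse.bin",
    "inflating: " + prefix + "ukbEURu_hm3_chr11_v3_50k.ldm.sparse.info",
    "inflating: " + prefix + "ukbEURu_hm3_chr21_v3_50k.ldm.sparse.info",
])
complete, inflated = summarise_download_log(excerpt)
print(complete, sorted(inflated))  # True [11, 21]
```

Here the byte counts match, so the download itself finished, while chromosome 22 never appears among the inflated files.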

I've run the code you suggested, which also needed the --rerun-incomplete flag, and that has now successfully retrieved and unpacked all the chromosomes. So not really sure why there was an issue originally, but glad it's all working now.