Closed davemhoward closed 2 months ago
Hi, thanks for flagging. Could you post the contents of the file 'GenoPred/pipeline/resources/data/logs/download_gctb_ref.log'? I guess there must have been an error while the download_gctb_ref rule was running. I will make the rule stricter so that it picks up the error.
In the meantime, I would recommend forcing the download_gctb_ref rule to run again, and then have a look at the log file to check it finished without error.
snakemake -j 1 -f download_gctb_ref
If this completes without error, then you can proceed with the pipeline as normal.
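One way to confirm that the archive was fully unpacked, rather than relying on the log alone, is to compare each archive member against the extracted file on disk. The following is a minimal sketch and not part of the GenoPred pipeline: the function name `incomplete_members` is hypothetical, and it assumes the zip file is still present and that the extracted files have not yet been renamed or moved.

```python
import zipfile
from pathlib import Path


def incomplete_members(zip_path, dest_dir):
    """Return archive members that are missing or have the wrong size on disk.

    Hypothetical helper for illustration; not part of GenoPred.
    """
    problems = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries carry no data to compare
            extracted = Path(dest_dir) / info.filename
            # Flag the member if the file is absent, or if its size differs
            # from the uncompressed size recorded in the archive.
            if not extracted.is_file() or extracted.stat().st_size != info.file_size:
                problems.append(info.filename)
    return problems


if __name__ == "__main__":
    # Paths taken from the log in this thread; adjust to your own setup.
    zip_file = Path("resources/data/gctb_ref/ukbEURu_hm3_sparse.zip")
    if zip_file.is_file():
        for name in incomplete_members(zip_file, "resources/data/gctb_ref"):
            print("missing or truncated:", name)
```

An empty result suggests the extraction finished; any listed name points to a file that was never written or was cut short, for example by an interrupted unzip.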
The GenoPred/pipeline/resources/data/logs/download_gctb_ref.log file is 432,570 lines in length so probably best not to post it here ;)
The first lines were:
--2024-09-18 12:38:51-- https://zenodo.org/record/3350914/files/ukbEURu_hm3_sparse.zip?download=1
Resolving zenodo.org (zenodo.org)... 188.184.103.159, 188.184.98.238, 188.185.79.172, ...
Connecting to zenodo.org (zenodo.org)|188.184.103.159|:443... connected.
HTTP request sent, awaiting response... 301 MOVED PERMANENTLY
Location: /records/3350914/files/ukbEURu_hm3_sparse.zip [following]
--2024-09-18 12:38:51-- https://zenodo.org/records/3350914/files/ukbEURu_hm3_sparse.zip
Reusing existing connection to zenodo.org:443.
HTTP request sent, awaiting response... 200 OK
Length: 22145323620 (21G) [application/octet-stream]
Saving to: ‘resources/data/gctb_ref/ukbEURu_hm3_sparse.zip’
The end of that file read:
21626250K .......... .......... .......... .......... .. 100% 18.7M=13m39s
2024-09-18 12:52:30 (25.8 MB/s) - ‘resources/data/gctb_ref/ukbEURu_hm3_sparse.zip’ saved [22145323620/22145323620]
Archive: resources/data/gctb_ref/ukbEURu_hm3_sparse.zip
creating: resources/data/gctb_ref/ukbEURu_hm3_shrunk_sparse/
inflating: resources/data/gctb_ref/ukbEURu_hm3_shrunk_sparse/ukbEURu_hm3_chr17_v3_50k_sparse.log
inflating: resources/data/gctb_ref/ukbEURu_hm3_shrunk_sparse/ukbEURu_hm3_chr11_v3_50k.ldm.sparse.bin
inflating: resources/data/gctb_ref/ukbEURu_hm3_shrunk_sparse/README
inflating: resources/data/gctb_ref/ukbEURu_hm3_shrunk_sparse/ukbEURu_hm3_chr21_v3_50k.ldm.sparse.info
inflating: resources/data/gctb_ref/ukbEURu_hm3_shrunk_sparse/ukbEURu_hm3_chr11_v3_50k_sparse.log
inflating: resources/data/gctb_ref/ukbEURu_hm3_shrunk_sparse/ukbEURu_hm3_chr2_v3_50k.ldm.sparse.info
inflating: resources/data/gctb_ref/ukbEURu_hm3_shrunk_sparse/ukbEURu_hm3_chr3_v3_50k_sparse.log
inflating: resources/data/gctb_ref/ukbEURu_hm3_shrunk_sparse/ukbEURu_hm3_chr13_v3_50k.ldm.sparse.bin
inflating: resources/data/gctb_ref/ukbEURu_hm3_shrunk_sparse/ukbEURu_hm3_chr3_v3_50k.ldm.sparse.info
inflating: resources/data/gctb_ref/ukbEURu_hm3_shrunk_sparse/ukbEURu_hm3_chr19_v3_50k.ldm.sparse.bin
inflating: resources/data/gctb_ref/ukbEURu_hm3_shrunk_sparse/ukbEURu_hm3_chr6_v3_50k.ldm.sparse.info
inflating: resources/data/gctb_ref/ukbEURu_hm3_shrunk_sparse/ukbEURu_hm3_chr15_v3_50k.ldm.sparse.info
inflating: resources/data/gctb_ref/ukbEURu_hm3_shrunk_sparse/ukbEURu_hm3_chr13_v3_50k_sparse.log
inflating: resources/data/gctb_ref/ukbEURu_hm3_shrunk_sparse/ukbEURu_hm3_chr15_v3_50k.ldm.sparse.bin
inflating: resources/data/gctb_ref/ukbEURu_hm3_shrunk_sparse/ukbEURu_hm3_chr5_v3_50k.ldm.sparse.bin
inflating: resources/data/gctb_ref/ukbEURu_hm3_shrunk_sparse/ukbEURu_hm3_chr16_v3_50k_sparse.log
inflating: resources/data/gctb_ref/ukbEURu_hm3_shrunk_sparse/ukbEURu_hm3_chr18_v3_50k.ldm.sparse.bin
inflating: resources/data/gctb_ref/ukbEURu_hm3_shrunk_sparse/ukbEURu_hm3_chr8_v3_50k_sparse.log
inflating: resources/data/gctb_ref/ukbEURu_hm3_shrunk_sparse/ukbEURu_hm3_chr2_v3_50k.ldm.sparse.bin
inflating: resources/data/gctb_ref/ukbEURu_hm3_shrunk_sparse/ukbEURu_hm3_chr11_v3_50k.ldm.sparse.info
inflating: resources/data/gctb_ref/ukbEURu_hm3_shrunk_sparse/ukbEURu_hm3_chr5_v3_50k.ldm.sparse.info
inflating: resources/data/gctb_ref/ukbEURu_hm3_shrunk_sparse/ukbEURu_hm3_chr1_v3_50k.ldm.sparse.bin
inflating: resources/data/gctb_ref/ukbEURu_hm3_shrunk_sparse/ukbEURu_hm3_chr15_v3_50k_sparse.log
inflating: resources/data/gctb_ref/ukbEURu_hm3_shrunk_sparse/ukbEURu_hm3_chr5_v3_50k_sparse.log
inflating: resources/data/gctb_ref/ukbEURu_hm3_shrunk_sparse/ukbEURu_hm3_chr16_v3_50k.ldm.sparse.info
inflating: resources/data/gctb_ref/ukbEURu_hm3_shrunk_sparse/ukbEURu_hm3_chr12_v3_50k_sparse.log
inflating: resources/data/gctb_ref/ukbEURu_hm3_shrunk_sparse/ukbEURu_hm3_chr6_v3_50k.ldm.sparse.bin
inflating: resources/data/gctb_ref/ukbEURu_hm3_shrunk_sparse/ukbEURu_hm3_chr4_v3_50k.ldm.sparse.bin
inflating: resources/data/gctb_ref/ukbEURu_hm3_shrunk_sparse/ukbEURu_hm3_chr16_v3_50k.ldm.sparse.bin
It looks as though the files downloaded okay but chr22 (and others) were still missing.
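Since the log is too long to post or read in full, a short script can pull out the two facts that matter here: whether wget reported the full byte count as saved, and which files unzip reported inflating. This is only a sketch; `summarise_download_log` is a hypothetical name, and the regular expressions are based solely on the line formats visible in the log excerpts above.

```python
import re
from pathlib import Path


def summarise_download_log(text):
    """Summarise a combined wget + unzip log (hypothetical helper)."""
    # wget ends a successful download with "saved [received/expected]".
    # If the server sent no length, wget prints "saved [received]" instead,
    # which will not match and is reported here as not confirmed complete.
    saved = re.search(r"saved \[(\d+)/(\d+)\]", text)
    # unzip prints one "inflating: <path>" entry per extracted file.
    inflated = re.findall(r"inflating:\s+(\S+)", text)
    return {
        "download_complete": bool(saved) and saved.group(1) == saved.group(2),
        "n_inflated": len(inflated),
        "inflated": inflated,
    }


if __name__ == "__main__":
    log_path = Path("resources/data/logs/download_gctb_ref.log")
    if log_path.is_file():
        summary = summarise_download_log(log_path.read_text(errors="replace"))
        print("download complete:", summary["download_complete"])
        print("files inflated:", summary["n_inflated"])
```

A complete download combined with a short list of inflated files would point to the extraction step, not the download, as the place where things stopped.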
I've run the code you suggested, which also needed the --rerun-incomplete flag, and that has now successfully retrieved and unpacked all the chromosomes. So not really sure why there was an issue originally, but glad it's all working now.
Error reported on screen:
Error in rule prep_pgs_sbayesr_i:
    jobid: 23
    input: UKB/output/reference/gwas_sumstat/BIP/BIP-cleaned.gz, resources/data/gctb_ref/ukbEURu_hm3_shrunk_sparse, resources/software/gctb/gctb_2.03beta_Linux/gctb
    output: UKB/output/reference/pgs_score_files/sbayesr/BIP/ref-BIP.score.gz
    log: UKB/output/reference/logs/prep_pgs_sbayesr_i-BIP.log (check log file(s) for error details)
    conda-env: /scratch/prj/sgdpstrdep/clustering/GenoPred/pipeline/.snakemake/conda/f01f00993fcfcaaccd85f3c930aea943
    shell:
        Rscript ../Scripts/pgs_methods/sbayesr.R --ref_plink_chr resources/data/ref/ref.chr --sumstats UKB/output/reference/gwas_sumstat/BIP/BIP-cleaned.gz --gctb resources/software/gctb/gctb_2.03beta_Linux/gctb --ld_matrix_chr resources/data/gctb_ref/ukbEURu_hm3_shrunk_sparse/ukbEURu_hm3_v3_50k_chr --robust T --n_cores 5 --output UKB/output/reference/pgs_score_files/sbayesr/BIP/ref-BIP --pop_data resources/data/ref/ref.pop.txt --test chr22 > UKB/output/reference/logs/prep_pgs_sbayesr_i-BIP.log 2>&1
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
    cluster_jobid: Submitted batch job 21184115
Error executing rule prep_pgs_sbayesr_i on cluster (jobid: 23, external: Submitted batch job 21184115, jobscript: /scratch/prj/sgdp_strdep/clustering/GenoPred/pipeline/.snakemake/tmp.iphuu144/snakejob.prep_pgs_sbayesr_i.23.sh). For error details see the cluster log and the log files of the involved rule(s).
Log file UKB/output/reference/logs/prep_pgs_sbayesr_i-BIP.log contains:
Loading required package: iterators
Loading required package: parallel
         used (Mb) gc trigger (Mb) max used (Mb)
Ncells 507897 27.2    1120630 59.9   661008 35.4
Vcells 939332  7.2    8388608 64.0  4676668 35.7
Error: can not open the file [resources/data/gctb_ref/ukbEURu_hm3_shrunk_sparse/ukbEURu_hm3_v3_50k_chr22.ldm.sparse.info] to read.
[1] "" " GCTB 2.03 beta " " Genome-wide Complex Trait Bayesian analysis " " Author: Jian Zeng, Luke Lloyd-Jones " " MIT License " "" "" "Analysis started: Wed Sep 18 13:03:58 2024" "" "Options:"
[11] "" "--sbayes R" "--ldm resources/data/gctb_ref/ukbEURu_hm3_shrunk_sparse/ukbEURu_hm3_v3_50k_chr22.ldm.sparse" "--pi 0.95,0.02,0.02,0.01" "--gamma 0.0,0.01,0.1,1" "--gwas-summary /tmp/RtmpHCDl7y/GWAS_sumstats_COJO.txt" "--chain-length 10000" "--robust " "--exclude-mhc " "--burn-in 2000"
[21] "--out-freq 1000" "--out /tmp/RtmpHCDl7y/GWAS_sumstats_SBayesR.chr22" "" "" "Analysis finished: Wed Sep 18 13:03:58 2024" "Computational time: 0:0:0"
Error in fread(paste0(tmp_dir, "/GWAS_sumstats_SBayesR.chr", i, ".snpRes")) :
  File '/tmp/RtmpHCDl7y/GWAS_sumstats_SBayesR.chr22.snpRes' does not exist or is non-readable. getwd()=='/scratch/prj/sgdp_strdep/clustering/GenoPred/pipeline'
Calls: rbind -> fread -> stopf -> raise_condition -> signal
Execution halted
Here is the ls output from GenoPred/pipeline/resources/data/gctb_ref/ukbEURu_hm3_shrunk_sparse:
total 19648522
-rw-rw----+ 1 k1812035 er_prj_sgdp_strdep 3972 Jul 24 2019 README
-rwxr-x---+ 1 k1812035 er_prj_sgdp_strdep 1786163480 Jul 24 2019 ukbEURu_hm3_chr11_v3_50k.ldm.sparse.bin
-rwxr-x---+ 1 k1812035 er_prj_sgdp_strdep 7116575 Jul 24 2019 ukbEURu_hm3_chr11_v3_50k.ldm.sparse.info
-rwxr-x---+ 1 k1812035 er_prj_sgdp_strdep 2845 Jul 24 2019 ukbEURu_hm3_chr11_v3_50k_sparse.log
-rwxr-x---+ 1 k1812035 er_prj_sgdp_strdep 2850 Jul 24 2019 ukbEURu_hm3_chr12_v3_50k_sparse.log
-rwxr-x---+ 1 k1812035 er_prj_sgdp_strdep 1484824752 Jul 24 2019 ukbEURu_hm3_chr13_v3_50k.ldm.sparse.bin
-rwxr-x---+ 1 k1812035 er_prj_sgdp_strdep 2586 Jul 24 2019 ukbEURu_hm3_chr13_v3_50k_sparse.log
-rwxr-x---+ 1 k1812035 er_prj_sgdp_strdep 1049011376 Jul 24 2019 ukbEURu_hm3_chr15_v3_50k.ldm.sparse.bin
-rwxr-x---+ 1 k1812035 er_prj_sgdp_strdep 4431468 Jul 24 2019 ukbEURu_hm3_chr15_v3_50k.ldm.sparse.info
-rwxr-x---+ 1 k1812035 er_prj_sgdp_strdep 2397 Jul 24 2019 ukbEURu_hm3_chr15_v3_50k_sparse.log
-rw-rw----+ 1 k1812035 er_prj_sgdp_strdep 843776000 Sep 18 12:55 ukbEURu_hm3_chr16_v3_50k.ldm.sparse.bin
-rwxr-x---+ 1 k1812035 er_prj_sgdp_strdep 4609366 Jul 24 2019 ukbEURu_hm3_chr16_v3_50k.ldm.sparse.info
-rwxr-x---+ 1 k1812035 er_prj_sgdp_strdep 2435 Jul 24 2019 ukbEURu_hm3_chr16_v3_50k_sparse.log
-rwxr-x---+ 1 k1812035 er_prj_sgdp_strdep 2309 Jul 24 2019 ukbEURu_hm3_chr17_v3_50k_sparse.log
-rwxr-x---+ 1 k1812035 er_prj_sgdp_strdep 1019516840 Jul 24 2019 ukbEURu_hm3_chr18_v3_50k.ldm.sparse.bin
-rwxr-x---+ 1 k1812035 er_prj_sgdp_strdep 402256320 Jul 24 2019 ukbEURu_hm3_chr19_v3_50k.ldm.sparse.bin
-rwxr-x---+ 1 k1812035 er_prj_sgdp_strdep 3153116840 Jul 24 2019 ukbEURu_hm3_chr1_v3_50k.ldm.sparse.bin
-rwxr-x---+ 1 k1812035 er_prj_sgdp_strdep 2086568 Jul 24 2019 ukbEURu_hm3_chr21_v3_50k.ldm.sparse.info
-rwxr-x---+ 1 k1812035 er_prj_sgdp_strdep 3082969752 Jul 24 2019 ukbEURu_hm3_chr2_v3_50k.ldm.sparse.bin
-rwxr-x---+ 1 k1812035 er_prj_sgdp_strdep 11713365 Jul 24 2019 ukbEURu_hm3_chr2_v3_50k.ldm.sparse.info
-rwxr-x---+ 1 k1812035 er_prj_sgdp_strdep 9949974 Jul 24 2019 ukbEURu_hm3_chr3_v3_50k.ldm.sparse.info
-rwxr-x---+ 1 k1812035 er_prj_sgdp_strdep 3277 Jul 24 2019 ukbEURu_hm3_chr3_v3_50k_sparse.log
-rwxr-x---+ 1 k1812035 er_prj_sgdp_strdep 2414470592 Jul 24 2019 ukbEURu_hm3_chr4_v3_50k.ldm.sparse.bin
-rwxr-x---+ 1 k1812035 er_prj_sgdp_strdep 2334240728 Jul 24 2019 ukbEURu_hm3_chr5_v3_50k.ldm.sparse.bin
-rwxr-x---+ 1 k1812035 er_prj_sgdp_strdep 8978085 Jul 24 2019 ukbEURu_hm3_chr5_v3_50k.ldm.sparse.info
-rwxr-x---+ 1 k1812035 er_prj_sgdp_strdep 3126 Jul 24 2019 ukbEURu_hm3_chr5_v3_50k_sparse.log
-rwxr-x---+ 1 k1812035 er_prj_sgdp_strdep 2492051904 Jul 24 2019 ukbEURu_hm3_chr6_v3_50k.ldm.sparse.bin
-rwxr-x---+ 1 k1812035 er_prj_sgdp_strdep 8766782 Jul 24 2019 ukbEURu_hm3_chr6_v3_50k.ldm.sparse.info
-rwxr-x---+ 1 k1812035 er_prj_sgdp_strdep 2917 Jul 24 2019 ukbEURu_hm3_chr8_v3_50k_sparse.log
It looks like some of the files required by SBayesR are missing.
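To list exactly which files are missing before launching the SBayesR step, the per-chromosome paths can be built from the prefix passed to `--ld_matrix_chr` and checked one by one. This is a rough sketch: `missing_ld_files` is a hypothetical helper, the `.ldm.sparse.bin` and `.ldm.sparse.info` suffixes are inferred from the error message and the directory listing above, and chromosomes 1 to 22 are an assumption about what a full run needs.

```python
from pathlib import Path


def missing_ld_files(
    ld_prefix,
    chromosomes=range(1, 23),  # assumption: autosomes 1-22
    suffixes=(".ldm.sparse.bin", ".ldm.sparse.info"),  # inferred from this thread
):
    """Return expected LD matrix files that do not exist (hypothetical helper)."""
    missing = []
    for chrom in chromosomes:
        for suffix in suffixes:
            # Same construction as in the error message: prefix + chromosome + suffix
            path = Path(f"{ld_prefix}{chrom}{suffix}")
            if not path.is_file():
                missing.append(str(path))
    return missing


if __name__ == "__main__":
    # Prefix copied from the --ld_matrix_chr argument in the failing command.
    prefix = "resources/data/gctb_ref/ukbEURu_hm3_shrunk_sparse/ukbEURu_hm3_v3_50k_chr"
    for path in missing_ld_files(prefix):
        print("missing:", path)
```

For a run restricted with `--test chr22`, passing `chromosomes=[22]` narrows the check to the files that step will actually open.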