kircherlab / CADD-scripts

CADD scripts release for offline scoring. For more information about CADD, please visit our website
http://cadd.gs.washington.edu

Error running CADD 1.7 #68

Closed: fa2k closed this issue 3 months ago

fa2k commented 3 months ago

I get an error message when running the pipeline. I've tried some different compute environments and input files. I have installed the annotations using install.sh, but not the prescored files.

$ ~/local/sw/CADD-scripts-1.7/CADD.sh Kapa-10ng-4-aa.scored.vcf
CADD-v1.7 (c) University of Washington, Hudson-Alpha Institute for Biotechnology and Berlin Institute of Health at Charité - Universitätsmedizin Berlin 2013-2023. All rights reserved.
Running snakemake pipeline:
snakemake /tmp/tmp.IgFaey9p2E/Kapa-10ng-4-aa.scored.tsv.gz --use-conda --conda-prefix /home/fa2k/local/sw/CADD-scripts-1.7/envs/conda --cores 1
--configfile /home/fa2k/local/sw/CADD-scripts-1.7/config/config_GRCh38_v1.7_noanno.yml --snakefile /home/fa2k/local/sw/CADD-scripts-1.7/Snakefile -q
Your conda installation is not configured to use strict channel priorities. This is however crucial for having robust and correct environments (for details, see https://conda-forge.org/docs/user/tipsandtricks.html). Please consider to configure strict priorities by executing 'conda config --set channel_priority strict'.
WorkflowError in rule join in file /home/fa2k/local/sw/CADD-scripts-1.7/Snakefile, line 303:
Failed to open input file: /tmp/tmp.IgFaey9p2E/Kapa-10ng-4-aa.scored.anno.novel.vcf. Has it been deleted by another process? (rule join, line 612, /home/fa2k/local/sw/CADD-scripts-1.7/Snakefile)

fa2k commented 3 months ago

I get the same error with the prescored files installed as well.

visze commented 3 months ago

Does your tmp directory have any restrictions? Try running CADD with the raw snakemake command instead of the CADD.sh script. Also, take the warning about strict channel priority seriously!
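
One way to rule out a restricted temporary directory is to create a scratch directory in it, write a file, and read it back. The sketch below is only an illustration: `tmp_is_usable` is a hypothetical helper, not part of CADD, and it checks basic create/write/read access only, not quotas or mount options.

```shell
# Hypothetical helper (not part of CADD): returns 0 if a scratch
# directory can be created, written to, and read back under "$1".
tmp_is_usable() {
    base="$1"
    # Create a private scratch directory inside the candidate location.
    scratch="$(mktemp -d "$base/cadd-check.XXXXXX" 2>/dev/null)" || return 1
    # Write a probe file; clean up and fail if the write is refused.
    printf 'ok\n' > "$scratch/probe" 2>/dev/null || { rm -rf "$scratch"; return 1; }
    # Read the probe back, then remove the scratch directory.
    content="$(cat "$scratch/probe" 2>/dev/null)"
    rm -rf "$scratch"
    [ "$content" = "ok" ]
}

# Check the directory that mktemp would use by default.
if tmp_is_usable "${TMPDIR:-/tmp}"; then
    echo "temporary directory looks usable: ${TMPDIR:-/tmp}"
else
    echo "cannot create or read files in: ${TMPDIR:-/tmp}" >&2
fi
```

If the check fails, pointing TMPDIR at another writable location before calling CADD.sh is a reasonable next step.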

fa2k commented 3 months ago

/tmp is a tmpfs. I've now tried overriding it by setting TMPDIR to locations on btrfs and on NFS, but the same thing happens. I also enabled strict channel priority and recreated the conda environment with the install script. There's just one environment, is that right?

$ ll $CADD/envs/conda/
total 4
drwxr-xr-x. 1 fa2k fa2k 148 Jun 12 22:53 caf929b5586e02a53f33fd03d2c91060_
-rw-r--r--. 1 fa2k fa2k 186 Jun 12 22:53 caf929b5586e02a53f33fd03d2c91060_.yaml

Install script output:

Setting up virtual environments for CADD v1.7
Building DAG of jobs...
Creating conda environment envs/environment_minimal.yml...
Downloading and installing remote packages.
Environment for /home/fa2k/local/sw/CADD-scripts-1.7/envs/environment_minimal.yml created (location: envs/conda/caf929b5586e02a53f33fd03d2c91060_)

I tried running the snakemake command directly and changed the output path to the local directory instead of $TMPDIR. I seem to get the same problem. Here's the detailed output (I think it may have changed a little when I downloaded the prescored files):

 snakemake Kapa-10ng-4-aa.scored.tsv.gz --use-conda --conda-prefix /home/fa2k/local/sw/CADD-scripts-1.7/envs/conda --cores 1 --configfile /home/fa2k/local/sw/CADD-scripts-1.7/config/config_GRCh38_v1.7.yml --snakefile /home/fa2k/local/sw/CADD-scripts-1.7/Snakefile -p
Assuming unrestricted shared filesystem usage.
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job stats:
job         count
--------  -------
join            1
prepare         1
prescore        1
total           3

Select jobs to execute...
Execute 1 jobs...

[Wed Jun 12 22:58:02 2024]
localrule prepare:
    input: Kapa-10ng-4-aa.scored.vcf
    output: Kapa-10ng-4-aa.scored.prepared.vcf
    log: Kapa-10ng-4-aa.scored.prepare.log
    jobid: 2
    reason: Missing output files: Kapa-10ng-4-aa.scored.prepared.vcf
    wildcards: file=Kapa-10ng-4-aa.scored
    resources: tmpdir=/tmp

        cat Kapa-10ng-4-aa.scored.vcf         | python /home/fa2k/local/sw/CADD-scripts-1.7/src/scripts/VCF2vepVCF.py         | sort -k1,1 -k2,2n -k4,4 -k5,5         | uniq > Kapa-10ng-4-aa.scored.prepared.vcf 2> Kapa-10ng-4-aa.scored.prepare.log

Activating conda environment: ../../../../sw/CADD-scripts-1.7/envs/conda/caf929b5586e02a53f33fd03d2c91060_
[Wed Jun 12 22:58:23 2024]
Finished job 2.
1 of 3 steps (33%) done
Select jobs to execute...
Execute 1 jobs...

[Wed Jun 12 22:58:23 2024]
localcheckpoint prescore:
    input: Kapa-10ng-4-aa.scored.prepared.vcf, /home/fa2k/local/sw/CADD-scripts-1.7/data/prescored/GRCh38_v1.7/incl_anno
    output: Kapa-10ng-4-aa.scored.novel.vcf, Kapa-10ng-4-aa.scored.pre.tsv
    log: Kapa-10ng-4-aa.scored.prescore.log
    jobid: 1
    reason: Missing output files: <TBD>; Input files updated by another job: Kapa-10ng-4-aa.scored.prepared.vcf
    wildcards: file=Kapa-10ng-4-aa.scored
    resources: tmpdir=/tmp
DAG of jobs will be updated after completion.

        # Prescoring
        echo '## Prescored variant file' > Kapa-10ng-4-aa.scored.pre.tsv 2> Kapa-10ng-4-aa.scored.prescore.log;
        PRESCORED_FILES=`find -L /home/fa2k/local/sw/CADD-scripts-1.7/data/prescored/GRCh38_v1.7/incl_anno -maxdepth 1 -type f -name \*.tsv.gz | wc -l`
        if [ ${PRESCORED_FILES} -gt 0 ];
        then
            for PRESCORED in $(ls /home/fa2k/local/sw/CADD-scripts-1.7/data/prescored/GRCh38_v1.7/incl_anno/*.tsv.gz)
            do
                cat Kapa-10ng-4-aa.scored.prepared.vcf                 | python /home/fa2k/local/sw/CADD-scripts-1.7/src/scripts/extract_scored.py --header                     -p $PRESCORED --found_out=Kapa-10ng-4-aa.scored.pre.tsv.tmp                 > Kapa-10ng-4-aa.scored.prepared.vcf.tmp 2>> Kapa-10ng-4-aa.scored.prescore.log;
                cat Kapa-10ng-4-aa.scored.pre.tsv.tmp >> Kapa-10ng-4-aa.scored.pre.tsv
                mv Kapa-10ng-4-aa.scored.prepared.vcf.tmp Kapa-10ng-4-aa.scored.prepared.vcf &> Kapa-10ng-4-aa.scored.prescore.log;
            done;
            rm Kapa-10ng-4-aa.scored.pre.tsv.tmp &>> Kapa-10ng-4-aa.scored.prescore.log
        fi
        mv Kapa-10ng-4-aa.scored.prepared.vcf Kapa-10ng-4-aa.scored.novel.vcf &>> Kapa-10ng-4-aa.scored.prescore.log

Activating conda environment: ../../../../sw/CADD-scripts-1.7/envs/conda/caf929b5586e02a53f33fd03d2c91060_
[Wed Jun 12 22:58:23 2024]
Finished job 1.
2 of 3 steps (67%) done
WorkflowError in rule join in file /home/fa2k/local/sw/CADD-scripts-1.7/Snakefile, line 303:
Failed to open input file: Kapa-10ng-4-aa.scored.anno.novel.vcf. Has it been deleted by another process? (rule join, line 612, /home/fa2k/local/sw/CADD-scripts-1.7/Snakefile)

I'm using snakemake 8.13.0 installed using conda. What snakemake version is supported by CADD?

visze commented 3 months ago

OK, quick answer: please use snakemake 7.x. This workflow is not ready for snakemake 8; I get the same error when running it with snakemake 8. This is also stated in the README, so please read it carefully (https://github.com/kircherlab/CADD-scripts/blob/master/README.md).
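
A quick guard before launching the pipeline could look like the sketch below. It is only an illustration: `is_snakemake_7` is a hypothetical helper, not part of CADD, and it looks at nothing more than the leading major version printed by `snakemake --version`.

```shell
# Hypothetical helper (not part of CADD): succeeds only for a 7.x version string.
is_snakemake_7() {
    case "$1" in
        7.*) return 0 ;;
        *)   return 1 ;;
    esac
}

# Only run the check when snakemake is actually on PATH.
if command -v snakemake >/dev/null 2>&1; then
    version="$(snakemake --version)"
    if is_snakemake_7 "$version"; then
        echo "snakemake $version: OK for this workflow"
    else
        echo "snakemake $version found; this workflow expects 7.x" >&2
    fi
fi
```

With conda, one possible way to get a 7.x release is a separate environment, for example `conda create -n snakemake7 -c conda-forge -c bioconda 'snakemake<8'`. The environment name `snakemake7` and the channel choice are assumptions here, so check the README for the recommended setup.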