Closed yangyxt closed 23 hours ago
ah, I got the same error as you.
I just tried 1.7.1 with Snakemake 8.25.2 and hit the same error.
I'm using conda alone for the Snakemake pipeline. Here is the full log:
`Assuming unrestricted shared filesystem usage.
host: paedyl01
Building DAG of jobs...
Your conda installation is not configured to use strict channel priorities. This is however important for having robust and correct environments (for details, see https://conda-forge.org/docs/user/tipsandtricks.html). Please consider to configure strict priorities by executing 'conda config --set channel_priority strict'.
Creating conda environment /paedyl01/disk1/yangyxt/Tools/CADD/CADD-scripts-1.7.1/envs/environment_minimal.yml...
Downloading and installing remote packages.
Environment for /paedyl01/disk1/yangyxt/Tools/CADD/CADD-scripts-1.7.1/envs/environment_minimal.yml created (location: ../../Tools/CADD/CADD-scripts-1.7.1/envs/conda/a2b5c57805b7ab088ae6802ddde5c6cf_)
Using shell: /usr/bin/bash
Provided cores: 5
Rules claiming more threads will be scaled down.
Singularity containers: ignored
Job stats:
job          count
decompress   1
join         1
prepare      1
prescore     1
total        4
Select jobs to execute...
Execute 1 jobs...
[Thu Nov 21 16:11:53 2024]
localrule decompress:
input: /paedyl01/disk1/yangyxt/test_tmp/tmp.9svh5h3ptS/TEST_FAM.filtered.anno.nochr.vcf.gz
output: /paedyl01/disk1/yangyxt/test_tmp/tmp.9svh5h3ptS/TEST_FAM.filtered.anno.nochr.vcf
log: /paedyl01/disk1/yangyxt/test_tmp/tmp.9svh5h3ptS/TEST_FAM.filtered.anno.nochr.decompress.log
jobid: 3
reason: Missing output files: /paedyl01/disk1/yangyxt/test_tmp/tmp.9svh5h3ptS/TEST_FAM.filtered.anno.nochr.vcf
wildcards: file=/paedyl01/disk1/yangyxt/test_tmp/tmp.9svh5h3ptS/TEST_FAM.filtered.anno.nochr
resources: tmpdir=/paedyl01/disk1/yangyxt/test_tmp
zcat /paedyl01/disk1/yangyxt/test_tmp/tmp.9svh5h3ptS/TEST_FAM.filtered.anno.nochr.vcf.gz > /paedyl01/disk1/yangyxt/test_tmp/tmp.9svh5h3ptS/TEST_FAM.filtered.anno.nochr.vcf 2> /paedyl01/disk1/yangyxt/test_tmp/tmp.9svh5h3ptS/TEST_FAM.filtered.anno.nochr.decompress.log
Activating conda environment: ../../Tools/CADD/CADD-scripts-1.7.1/envs/conda/a2b5c57805b7ab088ae6802ddde5c6cf_
[Thu Nov 21 16:11:53 2024]
Finished job 3.
1 of 4 steps (25%) done
Select jobs to execute...
Execute 1 jobs...
[Thu Nov 21 16:11:53 2024]
localrule prepare:
input: /paedyl01/disk1/yangyxt/test_tmp/tmp.9svh5h3ptS/TEST_FAM.filtered.anno.nochr.vcf
output: /paedyl01/disk1/yangyxt/test_tmp/tmp.9svh5h3ptS/TEST_FAM.filtered.anno.nochr.prepared.vcf
log: /paedyl01/disk1/yangyxt/test_tmp/tmp.9svh5h3ptS/TEST_FAM.filtered.anno.nochr.prepare.log
jobid: 2
reason: Missing output files: /paedyl01/disk1/yangyxt/test_tmp/tmp.9svh5h3ptS/TEST_FAM.filtered.anno.nochr.prepared.vcf; Input files updated by another job: /paedyl01/disk1/yangyxt/test_tmp/tmp.9svh5h3ptS/TEST_FAM.filtered.anno.nochr.vcf
wildcards: file=/paedyl01/disk1/yangyxt/test_tmp/tmp.9svh5h3ptS/TEST_FAM.filtered.anno.nochr
resources: tmpdir=/paedyl01/disk1/yangyxt/test_tmp
cat /paedyl01/disk1/yangyxt/test_tmp/tmp.9svh5h3ptS/TEST_FAM.filtered.anno.nochr.vcf | python /paedyl01/disk1/yangyxt/Tools/CADD/CADD-scripts-1.7.1/src/scripts/VCF2vepVCF.py | sort -k1,1 -k2,2n -k4,4 -k5,5 | uniq > /paedyl01/disk1/yangyxt/test_tmp/tmp.9svh5h3ptS/TEST_FAM.filtered.anno.nochr.prepared.vcf 2> /paedyl01/disk1/yangyxt/test_tmp/tmp.9svh5h3ptS/TEST_FAM.filtered.anno.nochr.prepare.log
Activating conda environment: ../../Tools/CADD/CADD-scripts-1.7.1/envs/conda/a2b5c57805b7ab088ae6802ddde5c6cf_
[Thu Nov 21 16:11:54 2024]
Finished job 2.
2 of 4 steps (50%) done
Select jobs to execute...
Execute 1 jobs...
[Thu Nov 21 16:11:54 2024]
localcheckpoint prescore:
input: /paedyl01/disk1/yangyxt/test_tmp/tmp.9svh5h3ptS/TEST_FAM.filtered.anno.nochr.prepared.vcf, /paedyl01/disk1/yangyxt/Tools/CADD/CADD-scripts-1.7.1/data/prescored/GRCh37_v1.7/incl_anno
output: /paedyl01/disk1/yangyxt/test_tmp/tmp.9svh5h3ptS/TEST_FAM.filtered.anno.nochr.novel.vcf, /paedyl01/disk1/yangyxt/test_tmp/tmp.9svh5h3ptS/TEST_FAM.filtered.anno.nochr.pre.tsv
log: /paedyl01/disk1/yangyxt/test_tmp/tmp.9svh5h3ptS/TEST_FAM.filtered.anno.nochr.prescore.log
jobid: 1
reason: Missing output files:
# Prescoring
echo '## Prescored variant file' > /paedyl01/disk1/yangyxt/test_tmp/tmp.9svh5h3ptS/TEST_FAM.filtered.anno.nochr.pre.tsv 2> /paedyl01/disk1/yangyxt/test_tmp/tmp.9svh5h3ptS/TEST_FAM.filtered.anno.nochr.prescore.log;
PRESCORED_FILES=`find -L /paedyl01/disk1/yangyxt/Tools/CADD/CADD-scripts-1.7.1/data/prescored/GRCh37_v1.7/incl_anno -maxdepth 1 -type f -name \*.tsv.gz | wc -l`
cp /paedyl01/disk1/yangyxt/test_tmp/tmp.9svh5h3ptS/TEST_FAM.filtered.anno.nochr.prepared.vcf /paedyl01/disk1/yangyxt/test_tmp/tmp.9svh5h3ptS/TEST_FAM.filtered.anno.nochr.prepared.vcf.new
if [ ${PRESCORED_FILES} -gt 0 ];
then
for PRESCORED in $(ls /paedyl01/disk1/yangyxt/Tools/CADD/CADD-scripts-1.7.1/data/prescored/GRCh37_v1.7/incl_anno/*.tsv.gz)
do
cat /paedyl01/disk1/yangyxt/test_tmp/tmp.9svh5h3ptS/TEST_FAM.filtered.anno.nochr.prepared.vcf.new | python /paedyl01/disk1/yangyxt/Tools/CADD/CADD-scripts-1.7.1/src/scripts/extract_scored.py --header -p $PRESCORED --found_out=/paedyl01/disk1/yangyxt/test_tmp/tmp.9svh5h3ptS/TEST_FAM.filtered.anno.nochr.pre.tsv.tmp > /paedyl01/disk1/yangyxt/test_tmp/tmp.9svh5h3ptS/TEST_FAM.filtered.anno.nochr.prepared.vcf.tmp 2>> /paedyl01/disk1/yangyxt/test_tmp/tmp.9svh5h3ptS/TEST_FAM.filtered.anno.nochr.prescore.log;
cat /paedyl01/disk1/yangyxt/test_tmp/tmp.9svh5h3ptS/TEST_FAM.filtered.anno.nochr.pre.tsv.tmp >> /paedyl01/disk1/yangyxt/test_tmp/tmp.9svh5h3ptS/TEST_FAM.filtered.anno.nochr.pre.tsv
mv /paedyl01/disk1/yangyxt/test_tmp/tmp.9svh5h3ptS/TEST_FAM.filtered.anno.nochr.prepared.vcf.tmp /paedyl01/disk1/yangyxt/test_tmp/tmp.9svh5h3ptS/TEST_FAM.filtered.anno.nochr.prepared.vcf.new &> /paedyl01/disk1/yangyxt/test_tmp/tmp.9svh5h3ptS/TEST_FAM.filtered.anno.nochr.prescore.log;
done;
rm /paedyl01/disk1/yangyxt/test_tmp/tmp.9svh5h3ptS/TEST_FAM.filtered.anno.nochr.pre.tsv.tmp &>> /paedyl01/disk1/yangyxt/test_tmp/tmp.9svh5h3ptS/TEST_FAM.filtered.anno.nochr.prescore.log
fi
mv /paedyl01/disk1/yangyxt/test_tmp/tmp.9svh5h3ptS/TEST_FAM.filtered.anno.nochr.prepared.vcf.new /paedyl01/disk1/yangyxt/test_tmp/tmp.9svh5h3ptS/TEST_FAM.filtered.anno.nochr.novel.vcf &>> /paedyl01/disk1/yangyxt/test_tmp/tmp.9svh5h3ptS/TEST_FAM.filtered.anno.nochr.prescore.log
Activating conda environment: ../../Tools/CADD/CADD-scripts-1.7.1/envs/conda/a2b5c57805b7ab088ae6802ddde5c6cf_
[Thu Nov 21 16:19:41 2024]
Finished job 1.
3 of 4 steps (75%) done
MissingInputException in rule annotate_esm in file /paedyl01/disk1/yangyxt/Tools/CADD/CADD-scripts-1.7.1/Snakefile, line 131:
Missing input files for rule annotate_esm:
output: /paedyl01/disk1/yangyxt/test_tmp/tmp.9svh5h3ptS/TEST_FAM.filtered.anno.nochr.esm_missens.vcf.gz, /paedyl01/disk1/yangyxt/test_tmp/tmp.9svh5h3ptS/TEST_FAM.filtered.anno.nochr.esm_frameshift.vcf.gz, /paedyl01/disk1/yangyxt/test_tmp/tmp.9svh5h3ptS/TEST_FAM.filtered.anno.nochr.esm.vcf.gz
wildcards: file=/paedyl01/disk1/yangyxt/test_tmp/tmp.9svh5h3ptS/TEST_FAM.filtered.anno.nochr
affected files:
data/annotations/GRCh37_v1.7/esm/esm1v_t33_650M_UR90S_1.pt
data/annotations/GRCh37_v1.7/esm/esm1v_t33_650M_UR90S_4.pt
data/annotations/GRCh37_v1.7/esm/esm1v_t33_650M_UR90S_2.pt
data/annotations/GRCh37_v1.7/esm/pep.110.fa
data/annotations/GRCh37_v1.7/esm/esm1v_t33_650M_UR90S_5.pt
data/annotations/GRCh37_v1.7/esm/esm1v_t33_650M_UR90S_3.pt
ERROR conda.cli.main_run:execute(125): conda run /paedyl01/disk1/yangyxt/Tools/CADD/CADD-scripts-1.7.1/CADD.sh -c 5 -a -p -m -d -g GRCh37 -o /paedyl01/disk1/yangyxt/test_acmg_auto/TEST_FAM.filtered.anno.cadd.tsv.gz /paedyl01/disk1/yangyxt/test_acmg_auto/TEST_FAM.filtered.anno.nochr.vcf.gz
failed. (See above for error)
CADD-v1.7 (c) University of Washington, Hudson-Alpha Institute for Biotechnology and Berlin Institute of Health at Charite - Universitatsmedizin Berlin 2013-2024. All rights reserved.
Running snakemake pipeline:
snakemake /paedyl01/disk1/yangyxt/test_tmp/tmp.9svh5h3ptS/TEST_FAM.filtered.anno.nochr.tsv.gz --sdm conda --conda-prefix /paedyl01/disk1/yangyxt/Tools/CADD/CADD-scripts-1.7.1/envs/conda --cores 5 --configfile /paedyl01/disk1/yangyxt/Tools/CADD/CADD-scripts-1.7.1/config/config_GRCh37_v1.7.yml --snakefile /paedyl01/disk1/yangyxt/Tools/CADD/CADD-scripts-1.7.1/Snakefile -p`
> ah, I got the same error as you.

Have you found a solution? It seems annotate_vep can't be the first rule to run after the checkpoint prescore, no matter how I change the input VCF in the annotate_vep rule. So frustrating.
I figured out the issue. The execution flow of the DAG after the checkpoint is not wrong. The real problem is that the annotation resources required by annotate_esm cannot be found.
In the Snakefile, the paths to the ESM model files are relative, not absolute. When specifying these input files, we should prepend os.environ["CADD"] so that they point to the parent directory that stores all the annotation resources.
Also, annotate_mmsplice should be made a conditional rule, defined under an if condition that checks whether the GenomeBuild is GRCh38.
I'll address these issues in a PR (https://github.com/kircherlab/CADD-scripts/pull/80#issue-2681922897) later.
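For anyone hitting the same MissingInputException, here is a minimal Python sketch of the path fix described above. It assumes the CADD environment variable points at the CADD-scripts directory; the helper names `resolve_resource` and `missing_resources` are hypothetical and are not taken from the Snakefile or the PR.

```python
import os


def resolve_resource(relative_path, env_var="CADD", environ=None):
    """Join a resource path onto the root directory named by env_var.

    `environ` defaults to os.environ; passing a dict makes testing easy.
    """
    environ = os.environ if environ is None else environ
    root = environ.get(env_var)
    if root is None:
        raise KeyError(f"environment variable {env_var} is not set")
    return os.path.join(root, relative_path)


def missing_resources(relative_paths, env_var="CADD", environ=None):
    """Return the relative paths whose resolved location does not exist."""
    return [
        p for p in relative_paths
        if not os.path.exists(resolve_resource(p, env_var, environ))
    ]


if __name__ == "__main__":
    # Two of the paths listed under "affected files" in the log above.
    wanted = [
        "data/annotations/GRCh37_v1.7/esm/esm1v_t33_650M_UR90S_1.pt",
        "data/annotations/GRCh37_v1.7/esm/pep.110.fa",
    ]
    if "CADD" in os.environ:
        print("missing:", missing_resources(wanted))
    else:
        print("CADD is not set; relative paths resolve against the working directory")
```

Running this with CADD set shows whether the files are really absent or were only being looked up relative to the working directory, which is what the relative paths in the "affected files" list suggest.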
Thanks! Good catch! I will look at the PR shortly. Since CADD-scripts v1.7.2 is very new, I will retag it rather than create a new version.
can you retry with the latest master branch?
> can you retry with the latest master branch?

I installed it from v1.7.2.tar.gz. How do I update to the latest version on GitHub?
> can you retry with the latest master branch?

Yes, it WORKS with the latest Snakemake!!!! Thank you!!!!
The rule annotate_vep does not run before annotate_esm.
After the checkpoint prescore, the Snakemake pipeline runs annotate_esm directly.
It seems that, inside the Snakefile, the input of annotate_vep might need a more dynamic reference such as checkpoints.prescore.get(file=wildcards.file).output.novel, so that Snakemake recognizes that this rule depends on the results of the checkpoint prescore.
Please take a look. Thanks!