Open bchesnut opened 5 days ago
Which CADD-scripts version are you using? If it is 1.7, please use only Snakemake 7.x. For CADD-scripts 1.7.1, your Snakemake version should be fine.
(From your command I am pretty sure you are using 1.7 and not 1.7.1, so please use the latest CADD-scripts version. A simple upgrade of the repo should be enough; no other data is needed.)
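The version pairing described here (CADD-scripts 1.7 with Snakemake 7.x, CADD-scripts 1.7.1 with Snakemake 8.x) can be checked with a small script. This is only a sketch: the function names `required_snakemake_major`, `snakemake_major`, and `check_versions` are made up for illustration and are not part of CADD-scripts, and the mapping covers only the two releases discussed in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of CADD-scripts): map a CADD-scripts release
# to the Snakemake major version recommended in this thread.
required_snakemake_major() {
    case "$1" in
        1.7)   echo 7 ;;
        1.7.1) echo 8 ;;
        *)     echo unknown ;;
    esac
}

# Extract the major component from a version string such as "8.20.3".
snakemake_major() {
    echo "${1%%.*}"
}

# Compare an installed Snakemake version against a CADD-scripts release.
check_versions() {
    local cadd="$1" smk="$2" want have
    want="$(required_snakemake_major "$cadd")"
    have="$(snakemake_major "$smk")"
    if [ "$want" = "unknown" ]; then
        echo "unknown CADD-scripts version: $cadd"
    elif [ "$want" = "$have" ]; then
        echo "ok"
    else
        echo "mismatch: CADD-scripts $cadd expects Snakemake $want.x, found $smk"
    fi
}
```

After sourcing the script, a call such as `check_versions 1.7 "$(snakemake --version)"` would compare the installed Snakemake against the release in use.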
Do you run Snakemake locally or in a cluster environment? Some environments have difficulty running it in /tmp.
You could bypass the CADD.sh script to avoid the /tmp directory.
In that case you have to modify this command:
snakemake /tmp/tmp.Zmoud0EEpB/input.tsv.gz --use-conda --conda-prefix /data/analysis/src/CADD-scripts-1.7/envs/conda --cores 1 --configfile /data/analysis/src/CADD-scripts-1.7/config/config_GRCh38_v1.7.yml --snakefile /data/analysis/src/CADD-scripts-1.7/Snakefile -q
@visze Thank you for the comments. I am running CADD 1.7 and following the README.md directions per https://github.com/kircherlab/CADD-scripts, which specify using Snakemake version 8.
I am running the CADD.sh script. I tried setting TMPDIR=~/caddtmp to avoid using /tmp, but I am getting a similar missing-file error.
I tried CADD 1.7.1 with different/worse results. README.md mentions using apptainer/singularity, but no specifics.
I am very sure you use CADD-scripts v1.7.
For CADD-scripts v1.7.1 your command should look like: https://github.com/kircherlab/CADD-scripts/blob/77df69ac1e23704795d767b0c63d8955924b9838/CADD.sh#L148-L151
But it looks like v1.7: https://github.com/kircherlab/CADD-scripts/blob/203ee3bf3cc6313ebd837a750f1bb21c4c64b326/CADD.sh#L126-L127
CADD-scripts v1.7 requires Snakemake 7.x, which is mentioned in its README: https://github.com/kircherlab/CADD-scripts/blob/203ee3bf3cc6313ebd837a750f1bb21c4c64b326/README.md?plain=1#L56-L59
You are referring to the latest README (CADD-scripts v1.7.1 release), which is correct: there Snakemake 8.x should be used. Apptainer only works with CADD-scripts v1.7.1, and it is the default in CADD.sh. If you want to disable it, use the -m option.
Can you please show me the results behind "I tried CADD 1.7.1 with different/worse results"?
Installed CADD 1.7.1 in /data/workspace/bchesnut/CADD-scripts-1.7.1
Linked /data/workspace/bchesnut/CADD-scripts-1.7.1/data to data location:
$ cd /data/workspace/bchesnut/CADD-scripts-1.7.1
$ rm -rf data
$ ln -s /dmpi/analysis/analysis_data/CADD data
Set some environment variables:
$ export TMPDIR=/data/workspace/bchesnut/tmp
$ export APPTAINER_CACHEDIR=/data/workspace/bchesnut/apptainer
Ran ./install.sh
Ran ./CADD.sh -p -a -g GRCh38 -o ./output_inclAnno_GRCh38.tsv.gz ./test/input.vcf
$ ./CADD.sh -p -a -g GRCh38 -o ./output_inclAnno_GRCh38.tsv.gz ./test/input.vcf
CADD-v1.7 (c) University of Washington, Hudson-Alpha Institute for Biotechnology and Berlin Institute of Health at Charite - Universitatsmedizin Berlin 2013-2024. All rights reserved.
Running snakemake pipeline:
snakemake /data/workspace/bchesnut/tmp/tmp.ZwC6gGaiGS/input.tsv.gz --sdm conda apptainer --apptainer-prefix /data/workspace/bchesnut/CADD-scripts-1.7.1/envs/apptainer --singularity-args "--bind /data/workspace/bchesnut/tmp/tmp.ZwC6gGaiGS " --conda-prefix /data/workspace/bchesnut/CADD-scripts-1.7.1/envs/conda --cores 1 --configfile /data/workspace/bchesnut/CADD-scripts-1.7.1/config/config_GRCh38_v1.7.yml --snakefile /data/workspace/bchesnut/CADD-scripts-1.7.1/Snakefile -p
Assuming unrestricted shared filesystem usage.
host: vlp-dmpianal06.dhe.duke.edu
Building DAG of jobs...
Pulling singularity image docker://visze/cadd-scripts-v1_7:0.1.0.
Using shell: /usr/bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job stats:
job count
-------- -------
join 1
prepare 1
prescore 1
total 3
Select jobs to execute...
Execute 1 jobs...
[Mon Sep 16 13:38:24 2024]
localrule prepare:
input: /data/workspace/bchesnut/tmp/tmp.ZwC6gGaiGS/input.vcf
output: /data/workspace/bchesnut/tmp/tmp.ZwC6gGaiGS/input.prepared.vcf
log: /data/workspace/bchesnut/tmp/tmp.ZwC6gGaiGS/input.prepare.log
jobid: 2
reason: Missing output files: /data/workspace/bchesnut/tmp/tmp.ZwC6gGaiGS/input.prepared.vcf
wildcards: file=/data/workspace/bchesnut/tmp/tmp.ZwC6gGaiGS/input
resources: tmpdir=/data/workspace/bchesnut/tmp
cat /data/workspace/bchesnut/tmp/tmp.ZwC6gGaiGS/input.vcf | python /data/workspace/bchesnut/CADD-scripts-1.7.1/src/scripts/VCF2vepVCF.py | sort -k1,1 -k2,2n -k4,4 -k5,5 | uniq > /data/workspace/bchesnut/tmp/tmp.ZwC6gGaiGS/input.prepared.vcf 2> /data/workspace/bchesnut/tmp/tmp.ZwC6gGaiGS/input.prepare.log
Activating singularity image /data/workspace/bchesnut/CADD-scripts-1.7.1/envs/apptainer/cbbe741652f49b1cd0ee6ebf25427cc2.simg
Activating conda environment: ../../../../conda-envs/a4fcaaffb623ea8aef412c66280bd623
[Mon Sep 16 13:38:28 2024]
Finished job 2.
1 of 3 steps (33%) done
Select jobs to execute...
Execute 1 jobs...
[Mon Sep 16 13:38:29 2024]
localcheckpoint prescore:
input: /data/workspace/bchesnut/tmp/tmp.ZwC6gGaiGS/input.prepared.vcf, /data/workspace/bchesnut/CADD-scripts-1.7.1/data/prescored/GRCh38_v1.7/incl_anno
output: /data/workspace/bchesnut/tmp/tmp.ZwC6gGaiGS/input.novel.vcf, /data/workspace/bchesnut/tmp/tmp.ZwC6gGaiGS/input.pre.tsv
log: /data/workspace/bchesnut/tmp/tmp.ZwC6gGaiGS/input.prescore.log
jobid: 1
reason: Missing output files: <TBD>; Input files updated by another job: /data/workspace/bchesnut/tmp/tmp.ZwC6gGaiGS/input.prepared.vcf
wildcards: file=/data/workspace/bchesnut/tmp/tmp.ZwC6gGaiGS/input
resources: tmpdir=/data/workspace/bchesnut/tmp
DAG of jobs will be updated after completion.
# Prescoring
echo '## Prescored variant file' > /data/workspace/bchesnut/tmp/tmp.ZwC6gGaiGS/input.pre.tsv 2> /data/workspace/bchesnut/tmp/tmp.ZwC6gGaiGS/input.prescore.log;
PRESCORED_FILES=`find -L /data/workspace/bchesnut/CADD-scripts-1.7.1/data/prescored/GRCh38_v1.7/incl_anno -maxdepth 1 -type f -name \*.tsv.gz | wc -l`
cp /data/workspace/bchesnut/tmp/tmp.ZwC6gGaiGS/input.prepared.vcf /data/workspace/bchesnut/tmp/tmp.ZwC6gGaiGS/input.prepared.vcf.new
if [ ${PRESCORED_FILES} -gt 0 ];
then
for PRESCORED in $(ls /data/workspace/bchesnut/CADD-scripts-1.7.1/data/prescored/GRCh38_v1.7/incl_anno/*.tsv.gz)
do
cat /data/workspace/bchesnut/tmp/tmp.ZwC6gGaiGS/input.prepared.vcf.new | python /data/workspace/bchesnut/CADD-scripts-1.7.1/src/scripts/extract_scored.py --header -p $PRESCORED --found_out=/data/workspace/bchesnut/tmp/tmp.ZwC6gGaiGS/input.pre.tsv.tmp > /data/workspace/bchesnut/tmp/tmp.ZwC6gGaiGS/input.prepared.vcf.tmp 2>> /data/workspace/bchesnut/tmp/tmp.ZwC6gGaiGS/input.prescore.log;
cat /data/workspace/bchesnut/tmp/tmp.ZwC6gGaiGS/input.pre.tsv.tmp >> /data/workspace/bchesnut/tmp/tmp.ZwC6gGaiGS/input.pre.tsv
mv /data/workspace/bchesnut/tmp/tmp.ZwC6gGaiGS/input.prepared.vcf.tmp /data/workspace/bchesnut/tmp/tmp.ZwC6gGaiGS/input.prepared.vcf.new &> /data/workspace/bchesnut/tmp/tmp.ZwC6gGaiGS/input.prescore.log;
done;
rm /data/workspace/bchesnut/tmp/tmp.ZwC6gGaiGS/input.pre.tsv.tmp &>> /data/workspace/bchesnut/tmp/tmp.ZwC6gGaiGS/input.prescore.log
fi
mv /data/workspace/bchesnut/tmp/tmp.ZwC6gGaiGS/input.prepared.vcf.new /data/workspace/bchesnut/tmp/tmp.ZwC6gGaiGS/input.novel.vcf &>> /data/workspace/bchesnut/tmp/tmp.ZwC6gGaiGS/input.prescore.log
Activating singularity image /data/workspace/bchesnut/CADD-scripts-1.7.1/envs/apptainer/cbbe741652f49b1cd0ee6ebf25427cc2.simg
Activating conda environment: ../../../../conda-envs/a4fcaaffb623ea8aef412c66280bd623
find: ‘/data/workspace/bchesnut/CADD-scripts-1.7.1/data/prescored/GRCh38_v1.7/incl_anno’: No such file or directory
[Mon Sep 16 13:38:29 2024]
Error in rule prescore:
jobid: 1
input: /data/workspace/bchesnut/tmp/tmp.ZwC6gGaiGS/input.prepared.vcf, /data/workspace/bchesnut/CADD-scripts-1.7.1/data/prescored/GRCh38_v1.7/incl_anno
output: /data/workspace/bchesnut/tmp/tmp.ZwC6gGaiGS/input.novel.vcf, /data/workspace/bchesnut/tmp/tmp.ZwC6gGaiGS/input.pre.tsv
log: /data/workspace/bchesnut/tmp/tmp.ZwC6gGaiGS/input.prescore.log (check log file(s) for error details)
conda-env: /conda-envs/a4fcaaffb623ea8aef412c66280bd623
shell:
# Prescoring
echo '## Prescored variant file' > /data/workspace/bchesnut/tmp/tmp.ZwC6gGaiGS/input.pre.tsv 2> /data/workspace/bchesnut/tmp/tmp.ZwC6gGaiGS/input.prescore.log;
PRESCORED_FILES=`find -L /data/workspace/bchesnut/CADD-scripts-1.7.1/data/prescored/GRCh38_v1.7/incl_anno -maxdepth 1 -type f -name \*.tsv.gz | wc -l`
cp /data/workspace/bchesnut/tmp/tmp.ZwC6gGaiGS/input.prepared.vcf /data/workspace/bchesnut/tmp/tmp.ZwC6gGaiGS/input.prepared.vcf.new
if [ ${PRESCORED_FILES} -gt 0 ];
then
for PRESCORED in $(ls /data/workspace/bchesnut/CADD-scripts-1.7.1/data/prescored/GRCh38_v1.7/incl_anno/*.tsv.gz)
do
cat /data/workspace/bchesnut/tmp/tmp.ZwC6gGaiGS/input.prepared.vcf.new | python /data/workspace/bchesnut/CADD-scripts-1.7.1/src/scripts/extract_scored.py --header -p $PRESCORED --found_out=/data/workspace/bchesnut/tmp/tmp.ZwC6gGaiGS/input.pre.tsv.tmp > /data/workspace/bchesnut/tmp/tmp.ZwC6gGaiGS/input.prepared.vcf.tmp 2>> /data/workspace/bchesnut/tmp/tmp.ZwC6gGaiGS/input.prescore.log;
cat /data/workspace/bchesnut/tmp/tmp.ZwC6gGaiGS/input.pre.tsv.tmp >> /data/workspace/bchesnut/tmp/tmp.ZwC6gGaiGS/input.pre.tsv
mv /data/workspace/bchesnut/tmp/tmp.ZwC6gGaiGS/input.prepared.vcf.tmp /data/workspace/bchesnut/tmp/tmp.ZwC6gGaiGS/input.prepared.vcf.new &> /data/workspace/bchesnut/tmp/tmp.ZwC6gGaiGS/input.prescore.log;
done;
rm /data/workspace/bchesnut/tmp/tmp.ZwC6gGaiGS/input.pre.tsv.tmp &>> /data/workspace/bchesnut/tmp/tmp.ZwC6gGaiGS/input.prescore.log
fi
mv /data/workspace/bchesnut/tmp/tmp.ZwC6gGaiGS/input.prepared.vcf.new /data/workspace/bchesnut/tmp/tmp.ZwC6gGaiGS/input.novel.vcf &>> /data/workspace/bchesnut/tmp/tmp.ZwC6gGaiGS/input.prescore.log
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Removing output files of failed job prescore since they might be corrupted:
/data/workspace/bchesnut/tmp/tmp.ZwC6gGaiGS/input.pre.tsv
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2024-09-16T132855.935867.snakemake.log
WorkflowError:
At least one job did not complete successfully.
OK, I see two things. First, can you bgzip your input file to input.vcf.gz? I am not sure it will change anything, though.
Second, and this can be a troublemaker too: paths have to be set correctly for Apptainer images (the --bind option of the apptainer command). You have to bind a lot of them explicitly, e.g. the tmp folder. Otherwise the tmp directory inside the Singularity image is used and then wiped, and the files are no longer there the next time the image is loaded.
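As a rough illustration of the bind-path point, the sketch below builds a comma-separated value for apptainer's --bind option from a list of directories, resolving symlinks first. The function name `build_bind_arg` is hypothetical and not part of CADD-scripts. Whether this applies to the run above is an assumption: in that report the `data` directory is a symlink to /dmpi/analysis/analysis_data/CADD and the path that `find` could not see lies under `data`, so an unbound symlink target is one possible explanation, not a confirmed one.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of CADD-scripts): join directories into a
# comma-separated list for apptainer's --bind option. Each path is resolved
# with readlink -f so that a symlink is replaced by its real target, which
# is the location the container actually needs to see.
build_bind_arg() {
    local out="" dir real
    for dir in "$@"; do
        real="$(readlink -f "$dir")"
        out="${out:+$out,}$real"
    done
    echo "$out"
}
```

The result could then be passed on in the same way the logged command already passes the temporary directory, for example `--singularity-args "--bind $(build_bind_arg "$TMPDIR" ./data)"`. CADD.sh assembles that argument itself, so using this would mean editing the script or calling snakemake directly.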
I tested it on my end and it worked, but you never know on other systems...
So maybe my first recommendation is to use only mamba at first (the -m flag in the CADD.sh script)?
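Put together, the two suggestions (bgzip the input, then rerun with -m) might look like the sketch below. It only prints the commands rather than executing them. The remaining options are copied from the command reported earlier in this thread, and the -m flag is used as described here for CADD-scripts v1.7.1; writing the compressed file next to the original is an assumption.

```shell
#!/usr/bin/env bash
# Sketch only: print the suggested commands instead of running them.
input="./test/input.vcf"

# First suggestion: compress the input with bgzip.
# -c writes to stdout, so the original file is left in place.
bgzip_cmd="bgzip -c $input > $input.gz"

# Second suggestion: rerun with -m so that CADD.sh does not use apptainer
# (flag as described in this thread for CADD-scripts v1.7.1).
cadd_cmd="./CADD.sh -m -p -a -g GRCh38 -o ./output_inclAnno_GRCh38.tsv.gz $input.gz"

echo "$bgzip_cmd"
echo "$cadd_cmd"
```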
I am getting the following error while running the test script for CADD 1.7:
Verbose output using -p is attached. cadd-output.txt
I'm running Red Hat EL9 and Miniforge3 conda with snakemake 8.20.3
Thank you in advance for suggestions.