Closed: fa2k closed this issue 3 months ago
I get the same error also with prescored files installed
Does your tmp directory have any restrictions? Maybe try running CADD using the raw snakemake command rather than the CADD.sh script. And take the warning about strict channel priority seriously!
Marius Bjørnstad @.***> wrote on Tue, 11 June 2024, 22:04:
I get the same error also with prescored files installed
The /tmp is a tmpfs, but I've now tried overriding it by pointing TMPDIR to locations on btrfs and on NFS, and the same thing happens. I also enabled strict channel priority and recreated the conda environment with the install script. There's just one environment, is that right?
$ ll $CADD/envs/conda/
total 4
drwxr-xr-x. 1 fa2k fa2k 148 Jun 12 22:53 caf929b5586e02a53f33fd03d2c91060_
-rw-r--r--. 1 fa2k fa2k 186 Jun 12 22:53 caf929b5586e02a53f33fd03d2c91060_.yaml
Install script output:
Setting up virtual environments for CADD v1.7
Building DAG of jobs...
Creating conda environment envs/environment_minimal.yml...
Downloading and installing remote packages.
Environment for /home/fa2k/local/sw/CADD-scripts-1.7/envs/environment_minimal.yml created (location: envs/conda/caf929b5586e02a53f33fd03d2c91060_)
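For anyone wanting to rule out the temp directory in the same way, here is a minimal sketch of a writability check. The function name `check_tmpdir` and the probe file name are hypothetical and not part of CADD-scripts; the check only confirms that a file can be created and removed in the directory, which does not cover every restriction a filesystem might impose.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of CADD-scripts): check that a directory
# can be used for temporary files by creating and removing a probe file.
check_tmpdir() {
    local dir="${1:-${TMPDIR:-/tmp}}"
    local probe

    [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 1; }
    [ -w "$dir" ] || { echo "not writable: $dir" >&2; return 1; }

    # mktemp creates the file atomically; failure suggests a restriction.
    probe="$(mktemp "$dir/tmpcheck.XXXXXX")" || {
        echo "mktemp failed in: $dir" >&2
        return 1
    }

    # Write to the probe and clean it up again.
    echo "probe" > "$probe" && rm -f "$probe"
}

# Check whatever TMPDIR currently points at (falls back to /tmp).
if check_tmpdir "${TMPDIR:-/tmp}"; then
    echo "temp dir OK: ${TMPDIR:-/tmp}"
fi
```

If this passes for each candidate location (tmpfs, btrfs, NFS), the temp directory is probably not the cause, which matches what is reported here.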
I tried to run the snakemake command directly and changed the output path to the local directory instead of $TMPDIR. I seem to get the same problem, here's the detailed output (I think it may have changed a little bit when I downloaded the prescored files):
snakemake Kapa-10ng-4-aa.scored.tsv.gz --use-conda --conda-prefix /home/fa2k/local/sw/CADD-scripts-1.7/envs/conda --cores 1 --configfile /home/fa2k/local/sw/CADD-scripts-1.7/config/config_GRCh38_v1.7.yml --snakefile /home/fa2k/local/sw/CADD-scripts-1.7/Snakefile -p
Assuming unrestricted shared filesystem usage.
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job stats:
job         count
--------  -------
join            1
prepare         1
prescore        1
total           3
Select jobs to execute...
Execute 1 jobs...
[Wed Jun 12 22:58:02 2024]
localrule prepare:
input: Kapa-10ng-4-aa.scored.vcf
output: Kapa-10ng-4-aa.scored.prepared.vcf
log: Kapa-10ng-4-aa.scored.prepare.log
jobid: 2
reason: Missing output files: Kapa-10ng-4-aa.scored.prepared.vcf
wildcards: file=Kapa-10ng-4-aa.scored
resources: tmpdir=/tmp
cat Kapa-10ng-4-aa.scored.vcf | python /home/fa2k/local/sw/CADD-scripts-1.7/src/scripts/VCF2vepVCF.py | sort -k1,1 -k2,2n -k4,4 -k5,5 | uniq > Kapa-10ng-4-aa.scored.prepared.vcf 2> Kapa-10ng-4-aa.scored.prepare.log
Activating conda environment: ../../../../sw/CADD-scripts-1.7/envs/conda/caf929b5586e02a53f33fd03d2c91060_
[Wed Jun 12 22:58:23 2024]
Finished job 2.
1 of 3 steps (33%) done
Select jobs to execute...
Execute 1 jobs...
[Wed Jun 12 22:58:23 2024]
localcheckpoint prescore:
input: Kapa-10ng-4-aa.scored.prepared.vcf, /home/fa2k/local/sw/CADD-scripts-1.7/data/prescored/GRCh38_v1.7/incl_anno
output: Kapa-10ng-4-aa.scored.novel.vcf, Kapa-10ng-4-aa.scored.pre.tsv
log: Kapa-10ng-4-aa.scored.prescore.log
jobid: 1
reason: Missing output files: <TBD>; Input files updated by another job: Kapa-10ng-4-aa.scored.prepared.vcf
wildcards: file=Kapa-10ng-4-aa.scored
resources: tmpdir=/tmp
DAG of jobs will be updated after completion.
# Prescoring
echo '## Prescored variant file' > Kapa-10ng-4-aa.scored.pre.tsv 2> Kapa-10ng-4-aa.scored.prescore.log;
PRESCORED_FILES=`find -L /home/fa2k/local/sw/CADD-scripts-1.7/data/prescored/GRCh38_v1.7/incl_anno -maxdepth 1 -type f -name \*.tsv.gz | wc -l`
if [ ${PRESCORED_FILES} -gt 0 ];
then
for PRESCORED in $(ls /home/fa2k/local/sw/CADD-scripts-1.7/data/prescored/GRCh38_v1.7/incl_anno/*.tsv.gz)
do
cat Kapa-10ng-4-aa.scored.prepared.vcf | python /home/fa2k/local/sw/CADD-scripts-1.7/src/scripts/extract_scored.py --header -p $PRESCORED --found_out=Kapa-10ng-4-aa.scored.pre.tsv.tmp > Kapa-10ng-4-aa.scored.prepared.vcf.tmp 2>> Kapa-10ng-4-aa.scored.prescore.log;
cat Kapa-10ng-4-aa.scored.pre.tsv.tmp >> Kapa-10ng-4-aa.scored.pre.tsv
mv Kapa-10ng-4-aa.scored.prepared.vcf.tmp Kapa-10ng-4-aa.scored.prepared.vcf &> Kapa-10ng-4-aa.scored.prescore.log;
done;
rm Kapa-10ng-4-aa.scored.pre.tsv.tmp &>> Kapa-10ng-4-aa.scored.prescore.log
fi
mv Kapa-10ng-4-aa.scored.prepared.vcf Kapa-10ng-4-aa.scored.novel.vcf &>> Kapa-10ng-4-aa.scored.prescore.log
Activating conda environment: ../../../../sw/CADD-scripts-1.7/envs/conda/caf929b5586e02a53f33fd03d2c91060_
[Wed Jun 12 22:58:23 2024]
Finished job 1.
2 of 3 steps (67%) done
WorkflowError in rule join in file /home/fa2k/local/sw/CADD-scripts-1.7/Snakefile, line 303:
Failed to open input file: Kapa-10ng-4-aa.scored.anno.novel.vcf. Has it been deleted by another process? (rule join, line 612, /home/fa2k/local/sw/CADD-scripts-1.7/Snakefile)
I'm using snakemake 8.13.0 installed using conda. What snakemake version is supported by CADD?
OK, quick answer: please use snakemake 7.x. This workflow is not ready for snakemake 8; I get the same error when running it with snakemake 8. This is also written in the README, so please read it carefully (https://github.com/kircherlab/CADD-scripts/blob/master/README.md)
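A version guard like the following could catch this mismatch before the workflow starts. This is only a sketch: the helper names `version_major` and `require_snakemake_7` are hypothetical, and it assumes `snakemake --version` prints a plain dotted version string such as the 8.13.0 reported above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of CADD-scripts).

# Print the major component of a dotted version string, e.g. 8.13.0 -> 8.
version_major() {
    printf '%s\n' "${1%%.*}"
}

# Succeed only if the given version string is a 7.x release.
require_snakemake_7() {
    local version="$1"
    local major
    major="$(version_major "$version")"
    if [ "$major" = "7" ]; then
        return 0
    fi
    echo "snakemake $version found, but this workflow expects 7.x" >&2
    return 1
}

# Only query snakemake if it is on PATH, so the sketch is safe to run anywhere.
if command -v snakemake >/dev/null 2>&1; then
    if require_snakemake_7 "$(snakemake --version)"; then
        echo "snakemake version OK"
    fi
fi
```

With the setup from this thread, the guard would report the 8.13.0 install as unsupported. Downgrading is then a matter of installing a 7.x release into the environment, for example with a conda version constraint such as `'snakemake<8'` (an assumption about how the environment is managed).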
I get an error message when running the pipeline. I've tried some different compute environments and input files. I have installed the annotations using install.sh, but not the prescored files.