When CADD annotation is enabled, the number of VEP tasks appears to be the square of the number of genome scatters: with 2 scatters, ENSEMBLVEP_SNV and TABIX_VEP each run 4 times (see the output below). I expect the number of VEP tasks to equal the number of scatters, which is also the number of CADD jobs.
(Unrelated to the problem: in the example below I manually lowered the scatter count to 2 so that the pipeline would run on a small test dataset.)
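To illustrate the arithmetic, here is a toy Python sketch of one pattern that would produce exactly these counts. It is an assumption on my part and has not been checked against the pipeline source: if the scattered VCF chunks were paired with the CADD outputs as a cartesian product (for instance a Nextflow `combine` where a keyed `join` was intended), N scatters would give N × N VEP tasks, and the outputs would repeat the same file names. All file names and helper functions below are hypothetical.

```python
from itertools import product

# Hypothetical scatter IDs, mirroring the 0000/0001 suffixes in the error message.
scatter_ids = ["0000", "0001"]

# Hypothetical (scatter_id, file) pairs standing in for two upstream channels.
vcf_chunks = [(sid, f"NA12878_{sid}-scattered.vcf.gz") for sid in scatter_ids]
cadd_chunks = [(sid, f"NA12878_{sid}-cadd.vcf.gz") for sid in scatter_ids]


def pair_all(left, right):
    """Cartesian pairing: every left item meets every right item (N * N pairs)."""
    return list(product(left, right))


def pair_by_key(left, right):
    """Keyed pairing: items are matched on their scatter ID (N pairs)."""
    right_by_key = dict(right)
    return [((k, f), (k, right_by_key[k])) for k, f in left if k in right_by_key]


crossed = pair_all(vcf_chunks, cadd_chunks)
joined = pair_by_key(vcf_chunks, cadd_chunks)

# Assume the output name depends only on the VCF chunk's scatter ID.
crossed_names = [f"NA12878_{vcf[0]}-scattered_vep.vcf.gz" for vcf, _ in crossed]
joined_names = [f"NA12878_{vcf[0]}-scattered_vep.vcf.gz" for vcf, _ in joined]

print(len(crossed), len(set(crossed_names)))  # 4 tasks, only 2 distinct names
print(len(joined), len(set(joined_names)))    # 2 tasks, 2 distinct names
```

With 2 scatters, the crossed pairing yields 4 tasks but only 2 distinct output names, which would match both the "4 of 4" VEP count and the input file name collision reported by BCFTOOLS_CONCAT. The keyed pairing yields the 2 tasks I would expect.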
Command used and terminal output
[6a/7d8489] process > NFCORE_RAREDISEASE:RAREDISEASE:ANNOTATE_SNVS:ANNOTATE_CADD:BCFTOOLS_ANNOTATE (NA12878) [100%] 2 of 2 ✔
[a7/c53176] process > NFCORE_RAREDISEASE:RAREDISEASE:ANNOTATE_SNVS:ANNOTATE_CADD:TABIX_ANNOTATE (NA12878) [100%] 2 of 2 ✔
[ce/8c77e1] process > NFCORE_RAREDISEASE:RAREDISEASE:ANNOTATE_SNVS:ENSEMBLVEP_SNV (NA12878) [100%] 4 of 4 ✔
[30/b53874] process > NFCORE_RAREDISEASE:RAREDISEASE:ANNOTATE_SNVS:TABIX_VEP (NA12878) [100%] 4 of 4 ✔
[- ] process > NFCORE_RAREDISEASE:RAREDISEASE:ANNOTATE_SNVS:BCFTOOLS_CONCAT -
[---]
Execution cancelled -- Finishing pending tasks before exit
-[nf-core/raredisease] Pipeline completed with errors-
ERROR ~ Error executing process > 'NFCORE_RAREDISEASE:RAREDISEASE:ANNOTATE_SNVS:BCFTOOLS_CONCAT (1)'
Caused by:
Process `NFCORE_RAREDISEASE:RAREDISEASE:ANNOTATE_SNVS:BCFTOOLS_CONCAT` input file name collision -- There are multiple input files for each of the following file names: NA12878_0001-scattered_ann_rohann_vcfanno_filter_vep.vcf.gz.tbi, NA12878_0000-scattered_ann_rohann_vcfanno_filter_vep.vcf.gz, NA12878_0001-scattered_ann_rohann_vcfanno_filter_vep.vcf.gz, NA12878_0000-scattered_ann_rohann_vcfanno_filter_vep.vcf.gz.tbi
Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`
-- Check '.nextflow.log' file for details
Relevant files
No response
System information
Nextflow 23.04.2, pipeline (nf-core/raredisease) 1.1.1