moiexpositoalonsolab / grenepipe

A flexible, scalable, and reproducible pipeline to automate variant calling from raw sequence reads, with lots of bells and whistles.
http://grene-net.org
GNU General Public License v3.0

Workflow fails with contig-group-size != 0 #52

Open lczech opened 2 months ago

lczech commented 2 months ago

Issue by @meixilin, moved here from https://github.com/moiexpositoalonsolab/grenepipe/issues/36#issuecomment-2270384816

Hi,

I have an issue that might be relevant to this thread. I have noticed that when I set contig-group-size: 0, the pipeline runs fine, although it creates a lot of jobs given the number of contigs.

job                                      count
-------------------------------------  -------
all                                          1
all_bams                                     1
all_pileups                                  1
bam_index                                    1
call_variants                              280
combine_calls                              280
dedup_reports_collect                        1
gatk_hard_filter_calls                       2
genotype_variants                          280
merge_calls                                  1
merge_variants                               1
mpileup_all_samples                          1
multiqc                                      1
picard_collectmultiplemetrics                1
picard_collectmultiplemetrics_collect        1
qualimap_collect                             1
qualimap_sample                              1
samtools_flagstat                            1
samtools_flagstat_collect                    1
samtools_stats                               1
samtools_stats_collect                       1
select_calls                                 2
total                                      861

However, when I set contig-group-size: 34000000, the calling/contig-groups folders get created just fine, but no call_variants jobs are created anymore. In addition, the pipeline now fails at the bam_index job.

job                                      count
-------------------------------------  -------
all                                          1
all_bams                                     1
all_pileups                                  1
contig_groups                                1
dedup_reports_collect                        1
fastqc                                       2
fastqc_collect                               1
gatk_hard_filter_calls                       2
map_reads                                    1
mark_duplicates                              1
merge_calls                                  1
merge_sample_unit_bams                       1
merge_variants                               1
mpileup_all_sample_names                     1
mpileup_all_samples                          1
multiqc                                      1
picard_collectmultiplemetrics                1
picard_collectmultiplemetrics_collect        1
qualimap_collect                             1
qualimap_sample                              1
samtools_flagstat                            1
samtools_flagstat_collect                    1
samtools_stats                               1
samtools_stats_collect                       1
select_calls                                 2
trim_reads_pe                                1
trimming_reports_collect                     1
trimmomatic_multiqc_log                      1
total                                       31
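
For readers unfamiliar with the option, the following is a rough sketch of how grouping contigs by size could reduce the number of per-contig jobs. It is not grenepipe's actual implementation: the function name `group_contigs`, the greedy largest-first strategy, and the example contig names and lengths are all assumptions made for illustration only.

```python
def group_contigs(contig_lengths, max_group_size):
    """Greedily pack contigs into groups of at most max_group_size bp.

    Hypothetical helper, not taken from grenepipe.
    contig_lengths maps contig name -> length in bp.
    A contig longer than max_group_size ends up in a group of its own.
    """
    groups = []
    current, current_size = [], 0
    # Visit contigs from longest to shortest.
    by_length = sorted(contig_lengths.items(), key=lambda kv: kv[1], reverse=True)
    for name, length in by_length:
        # Close the current group if adding this contig would exceed the limit.
        if current and current_size + length > max_group_size:
            groups.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += length
    if current:
        groups.append(current)
    return groups


# Made-up contigs, purely for illustration.
example_contigs = {
    "chr1": 30_000_000,
    "chr2": 20_000_000,
    "scaf1": 3_000_000,
    "scaf2": 1_000_000,
    "scaf3": 500_000,
}
example_groups = group_contigs(example_contigs, 34_000_000)
print(len(example_groups))  # 2 groups instead of 5 per-contig jobs
```

With a scheme like this, the calling rules would run once per group rather than once per contig, which is why a much smaller number of calling jobs is expected when contig-group-size is non-zero. Zero calling jobs, as in the table above, is not what one would expect.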

The error message is:

Full Traceback (most recent call last):
  File "micromamba/envs/grenepipe/lib/python3.12/site-packages/snakemake/cli.py", line 2103, in args_to_api
    dag_api.execute_workflow(
  File "micromamba/envs/grenepipe/lib/python3.12/site-packages/snakemake/api.py", line 594, in execute_workflow
    workflow.execute(
  File "micromamba/envs/grenepipe/lib/python3.12/site-packages/snakemake/workflow.py", line 1248, in execute
    raise e
  File "micromamba/envs/grenepipe/lib/python3.12/site-packages/snakemake/workflow.py", line 1244, in execute
    success = self.scheduler.schedule()
              ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "micromamba/envs/grenepipe/lib/python3.12/site-packages/snakemake/scheduler.py", line 198, in schedule
    self._finish_jobs()
  File "micromamba/envs/grenepipe/lib/python3.12/site-packages/snakemake/scheduler.py", line 388, in _finish_jobs
    async_run(postprocess())
  File "micromamba/envs/grenepipe/lib/python3.12/site-packages/snakemake/common/__init__.py", line 94, in async_run
    return asyncio.run(coroutine)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "micromamba/envs/grenepipe/lib/python3.12/asyncio/runners.py", line 194, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "micromamba/envs/grenepipe/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "micromamba/envs/grenepipe/lib/python3.12/asyncio/base_events.py", line 664, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "micromamba/envs/grenepipe/lib/python3.12/site-packages/snakemake/scheduler.py", line 383, in postprocess
    await self.workflow.dag.finish(
  File "micromamba/envs/grenepipe/lib/python3.12/site-packages/snakemake/dag.py", line 1879, in finish
    potential_new_ready_jobs = self.update_ready(depending)
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "micromamba/envs/grenepipe/lib/python3.12/site-packages/snakemake/dag.py", line 1578, in update_ready
    group = self._group[job]
            ~~~~~~~~~~~^^^^^
KeyError: bam_index

Any thoughts would be greatly appreciated!

Originally posted by @meixilin in https://github.com/moiexpositoalonsolab/grenepipe/issues/36#issuecomment-2270384816

lczech commented 2 months ago

Hi @meixilin,

as you originally posted this in #36, I assume you are also using the restrict-regions option? If so, then combining this with the contig grouping option (contig-group-size != 0) is not implemented, as that is rather difficult to do.
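
A quick way to rule this combination out before starting a run could look like the sketch below. It only uses the two option names mentioned in this thread. The function name `check_region_settings`, the flat dictionary layout, and the treatment of an empty or missing restrict-regions value as "not used" are assumptions, so the lookup may need adapting to where these keys actually sit in your config.yaml.

```python
def check_region_settings(settings):
    """Return a warning string if both options are active, else None.

    Hypothetical helper, not part of grenepipe. `settings` is assumed to be
    a flat mapping holding the two keys discussed in this thread.
    """
    restrict_regions = settings.get("restrict-regions") or ""
    group_size = int(settings.get("contig-group-size") or 0)
    if restrict_regions and group_size != 0:
        return (
            "restrict-regions is set together with contig-group-size != 0; "
            "this combination is not implemented."
        )
    return None


# Example values only; the file name is made up.
message = check_region_settings(
    {"restrict-regions": "regions.bed", "contig-group-size": 34000000}
)
if message:
    print(message)
```

If the check stays silent, the problem is more likely elsewhere, and the full config.yaml will help to narrow it down.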

If this is not about restrict-regions, then it could be a bug in grenepipe or in snakemake; that is hard to tell from the error log you provided. Can you please share your complete config.yaml? Also, if possible, the best way to track this down would be a minimal example that I can run to reproduce this behavior. Right now, from a distance, it is hard to see what is going on there.

Cheers and so long
Lucas