nextflow-io / nextflow

A DSL for data-driven computational pipelines
http://nextflow.io
Apache License 2.0

Nextflow hanging with still processes to run #3385

Open diego-rt opened 1 year ago

diego-rt commented 1 year ago

Expected behavior and actual behavior

While running a fairly large pipeline that took around a week, Nextflow completed all jobs for one process (wfa_alignment) but is not proceeding to the following step. A previous run of this pipeline (which this run resumes, with additional data) proceeded to the next process without problems. Any advice on how I could force the pipeline to proceed would be greatly appreciated: because of the ignored failed jobs, I would rather not restart it.

Monitor the execution with Nextflow Tower using this url https://tower.nf/user/diego-terrones/watch/4hVwtBtZWKV1Rc
executor >  slurm (170139)
[18/69da11] process > genome_sizes (1)          [100%] 1 of 1, cached: 1 ✔
[ba/a2430b] process > selectGenes               [100%] 1 of 1, cached: 1 ✔
[bd/ff555f] process > decompressionAndIndex (3) [100%] 7 of 7, cached: 7 ✔
[c3/64d9fd] process > parseAlignmentPairs (3)   [100%] 6 of 6, cached: 6 ✔
[0f/bf8bf2] process > pairFiltering (5)         [100%] 6 of 6, cached: 6 ✔
[ab/779335] process > wfa_alignment (199630)    [100%] 298133 of 298133, cached: 127993, failed: 84869, retries: 58133
[-        ] process > bam_formatting            -
[-        ] process > findOverlaps              -
[-        ] process > parseCIGARs               -
[-        ] process > merge_results             -

Program output

The end of the log has been repeating the following for over 12 hours:

[diego.terrones@clip-login-0 3_testScoringSchemes]$ tail -n 50 .nextflow.log
  port 2: (cntrl) -     ; channel: $

Nov-14 11:51:04.882 [Task monitor] DEBUG n.processor.TaskPollingMonitor - No more task to compute -- The following nodes are still active:
[process] bam_formatting
  status=ACTIVE
  port 0: (queue) OPEN  ; channel: -
  port 1: (value) bound ; channel: genome_index
  port 2: (cntrl) -     ; channel: $

[process] findOverlaps
  status=ACTIVE
  port 0: (queue) OPEN  ; channel: -
  port 1: (value) bound ; channel: selected_genes.txt
  port 2: (cntrl) -     ; channel: $

[process] parseCIGARs
  status=ACTIVE
  port 0: (queue) OPEN  ; channel: -
  port 1: (cntrl) -     ; channel: $

[process] merge_results
  status=ACTIVE
  port 0: (value) OPEN  ; channel: results_table_*.txt
  port 1: (value) OPEN  ; channel: non_matched_*.txt
  port 2: (cntrl) -     ; channel: $

Nov-14 11:56:04.893 [Task monitor] DEBUG n.processor.TaskPollingMonitor - No more task to compute -- The following nodes are still active:
[process] bam_formatting
  status=ACTIVE
  port 0: (queue) OPEN  ; channel: -
  port 1: (value) bound ; channel: genome_index
  port 2: (cntrl) -     ; channel: $

[process] findOverlaps
  status=ACTIVE
  port 0: (queue) OPEN  ; channel: -
  port 1: (value) bound ; channel: selected_genes.txt
  port 2: (cntrl) -     ; channel: $

[process] parseCIGARs
  status=ACTIVE
  port 0: (queue) OPEN  ; channel: -
  port 1: (cntrl) -     ; channel: $

[process] merge_results
  status=ACTIVE
  port 0: (value) OPEN  ; channel: results_table_*.txt
  port 1: (value) OPEN  ; channel: non_matched_*.txt
  port 2: (cntrl) -     ; channel: $

bentsherman commented 1 year ago

Possibly related to #2693

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

AnushreeDhar commented 8 months ago

Hi, did you find a solution for this?

HenriettaHolze commented 4 months ago

@bentsherman @diego-rt Did you find a solution for this? Can this issue be reopened?