nextflow-io / nextflow

A DSL for data-driven computational pipelines
http://nextflow.io
Apache License 2.0

Delay in recording a task as succeeded #5143

Closed: siddharthab closed this issue 2 months ago

siddharthab commented 2 months ago

Thank you so much for such a mature product.

We have recently been using Nextflow on GCP and are getting more familiar with it.

On an nf-core/cutandrun pipeline run that we did on a GCP Cloud Workstation, we noticed many long delays between the Google Cloud Batch job succeeding and the task being registered as succeeded by Nextflow. For example, here are two consecutive lines from the log:

Jul-12 08:38:08.229 [Task monitor] DEBUG n.c.g.batch.GoogleBatchTaskHandler - [GOOGLE BATCH] Process `NFCORE_CUTANDRUN:CUTANDRUN:PREPARE_PEAKCALLING:UCSC_BEDCLIP (IgG_R2)` - terminated job=nf-c7227eb0-1720772308641; task=0; state=SUCCEEDED
Jul-12 09:06:57.188 [Task monitor] DEBUG n.c.g.batch.GoogleBatchTaskHandler - [GOOGLE BATCH] Process `NFCORE_CUTANDRUN:CUTANDRUN:PREPARE_PEAKCALLING:UCSC_BEDCLIP (IgG_R2)` - last event: description: "Task state is updated from RUNNING to SUCCEEDED on zones/us-central1-c/instances/6311147577739662809"

Delays like this resulted in our pipeline run taking ~13 hours. When I ran the same pipeline on a regular GCE VM instead of a Cloud Workstation, it took 4 hours.
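For context, my understanding is that the task monitor discovers job completion by polling, so the poll interval was the first thing I checked. A sketch of pinning it down explicitly in nextflow.config (the value here is illustrative, not a recommendation, and in any case is far shorter than the ~29-minute gap shown above):

```groovy
// nextflow.config -- sketch only; I'm assuming executor.pollInterval
// is honored by the Google Batch executor's task monitor.
executor {
    // How often the task monitor checks for process termination.
    pollInterval = '30 sec'
}
```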

At first glance, the monitoring logs show that CPU and memory utilization on the machine were fairly low during this time, and no one was using the machine at all.

What could be causing this on our run? I'm just looking for hints so I can debug this further on my own.
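In case it helps anyone reproduce the measurement, here is a rough Groovy sketch for quantifying these gaps by scanning .nextflow.log for long pauses between consecutive [Task monitor] entries. The timestamp pattern matches the log lines above; the one-minute threshold is arbitrary:

```groovy
// Sketch: report long gaps between consecutive [Task monitor] log entries.
// Assumes the default .nextflow.log timestamp format shown above (note it
// has no year, so a Dec->Jan rollover would misorder entries).
import java.text.SimpleDateFormat

def fmt  = new SimpleDateFormat('MMM-dd HH:mm:ss.SSS', Locale.ENGLISH)
def prev = null

new File('.nextflow.log').eachLine { line ->
    def m = line =~ /^(\w{3}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) \[Task monitor\]/
    if( m.find() ) {
        def t = fmt.parse(m.group(1))
        if( prev != null && t.time - prev.time > 60_000 )  // gaps over 1 min
            println "${(t.time - prev.time).intdiv(1000)}s gap before: $line"
        prev = t
    }
}
```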