nextflow-io / nf-nomad

Hashicorp Nomad executor plugin for Nextflow
https://nextflow-io.github.io/nf-nomad/
Apache License 2.0

Allow 1 restart per task #82

Closed matthdsm closed 2 months ago

matthdsm commented 2 months ago

This should fix a concurrency issue with the CSI driver; cf. https://github.com/ceph/ceph-csi/issues/3511 and https://github.com/hashicorp/nomad/issues/15197

abhi18av commented 2 months ago

@matthdsm, would this trigger a Nomad-native retry, or is this expecting Nextflow to trigger it?

As of now, we have a model of 1 Nextflow task -> 1 Nomad job -> group -> task, so an error at the Nomad job level reflects an error at the Nextflow level, and the overall behaviour aligns with what Nextflow would expect.

How Nomad-native retries would interact with the overall setup needs to be tested.
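
To make the mapping above concrete, here is a minimal sketch (not taken from the plugin's code) of a Nomad job payload with one group and one task, in the JSON shape used by Nomad's HTTP jobs API. The helper name `build_job`, the Docker driver, and the delay and interval values are assumptions for illustration; only the single restart attempt comes from this PR's title.

```python
import json

# Nomad's JSON API expresses durations in nanoseconds.
NANOS_PER_SECOND = 1_000_000_000


def build_job(task_name, image, command, restart_attempts=1):
    """Build a payload for 1 Nextflow task -> 1 Nomad job -> group -> task.

    Hypothetical helper: names and values here are illustrative only.
    """
    task = {
        "Name": task_name,
        "Driver": "docker",
        "Config": {"image": image, "command": command},
    }
    group = {
        "Name": f"{task_name}-group",
        "Count": 1,
        "Tasks": [task],
        # Restart policy lives at the group level in the JSON job spec.
        "RestartPolicy": {
            "Attempts": restart_attempts,
            "Delay": 15 * NANOS_PER_SECOND,            # assumed value
            "Interval": 24 * 3600 * NANOS_PER_SECOND,  # assumed value
            "Mode": "fail",  # stop restarting once attempts are exhausted
        },
    }
    return {
        "Job": {
            "ID": task_name,
            "Name": task_name,
            "Type": "batch",
            "TaskGroups": [group],
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_job("nf-task-1", "ubuntu:22.04", "true"), indent=2))
```

With `Mode` set to `fail`, restarts happen inside Nomad; Nextflow would only see the failure after the attempts are used up, which is the interaction that still needs testing.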

jagedn commented 2 months ago

If I'm not wrong, before we had these constants we were using the default value (3), and Nextflow waited correctly for the failed job to complete, so maybe it would be a good idea to use 1 as the default

abhi18av commented 2 months ago

If I'm not wrong, before we had these constants we were using the default value (3), and Nextflow waited correctly for the failed job to complete, so maybe it would be a good idea to use 1 as the default

It seems that 3 is the default for these attempts: https://github.com/nextflow-io/nf-nomad/pull/82#issuecomment-2315371597

@jagedn, do you think it's worth exposing the closure and including other fields like delay, interval, etc.?

abhi18av commented 2 months ago

Okay, merging this and tagging this release as 0.2.0-edge3 to release a build

jagedn commented 2 months ago

@jagedn, do you think it's worth exposing the closure and including other fields like delay, interval, etc.?

Maybe we can implement all of them when required
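
If the other fields are exposed later, one possible shape is a small builder that emits only the fields the user set. This is a sketch under assumptions, not the plugin's API: `restart_policy` and `to_nanos` are hypothetical names, and leaving unset fields for Nomad to fill with its defaults is assumed rather than verified here. The field names (`Attempts`, `Delay`, `Interval`, `Mode`) and the `fail`/`delay` modes are those of Nomad's restart policy.

```python
# Nanoseconds per supported duration unit (Nomad's JSON API uses nanoseconds).
_UNIT_NANOS = {"s": 10**9, "m": 60 * 10**9, "h": 3600 * 10**9}


def to_nanos(duration):
    """Convert a simple duration such as '15s', '30m' or '24h' to nanoseconds."""
    value, unit = duration[:-1], duration[-1]
    if unit not in _UNIT_NANOS or not value.isdigit():
        raise ValueError(f"unsupported duration: {duration!r}")
    return int(value) * _UNIT_NANOS[unit]


def restart_policy(attempts=1, delay=None, interval=None, mode=None):
    """Build a RestartPolicy dict containing only the fields that were set."""
    policy = {"Attempts": attempts}
    if delay is not None:
        policy["Delay"] = to_nanos(delay)
    if interval is not None:
        policy["Interval"] = to_nanos(interval)
    if mode is not None:
        if mode not in ("fail", "delay"):
            raise ValueError(f"unsupported mode: {mode!r}")
        policy["Mode"] = mode
    return policy
```

Calling `restart_policy()` yields just the single attempt from this PR, while `restart_policy(attempts=2, delay="15s", interval="24h", mode="fail")` fills in the remaining fields.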