nextflow-io / nf-nomad

Hashicorp Nomad executor plugin for Nextflow
https://nextflow-io.github.io/nf-nomad/
Apache License 2.0

Feature request: Job priority #93

Open matthdsm opened 1 week ago

matthdsm commented 1 week ago

Hi guys,

Would it be feasible to implement an option for job priority? We've got a workload on our cluster that has to take priority over other batch jobs.

I'm thinking an extra option for the process configuration would be nice?

Cheers, Matthias

matthdsm commented 1 week ago

https://developer.hashicorp.com/nomad/docs/job-specification/job#priority

abhi18av commented 1 week ago

Good point @matthdsm, I see that the job priority is tied to a Nomad server-side setting, job_max_priority: https://developer.hashicorp.com/nomad/docs/configuration/server#job_max_priority

What's the limit you guys are using on the cluster?

matthdsm commented 1 week ago

@tomiles? Any idea?

tomiles commented 1 week ago

For now it's just the default of 100. The docs say job_max_priority can be set to at most 32766, so if we implement this it makes sense to support at most that same maximum.

abhi18av commented 1 week ago

Agreed @tomiles, we'd just need to find a way to inspect this value from a Nomad job; otherwise a mismatch between the job submission priority and the server-level setting could lead to undefined behavior.

@matthdsm , could you please do a quick check for this behavior using vanilla nomad job with out-of-bound priority?

tomiles commented 1 week ago

I think the API should give you an exception when you try to run a job with an invalid (out-of-bounds) priority. So we probably don't need to replicate the checks the Nomad server already does for you; we just need to catch API errors properly.
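To make the "let the server validate, just catch the error" idea concrete, here is a minimal Python sketch. It is not the plugin's code. The names `NomadSubmitError`, `build_job_payload` and `submit_job` are hypothetical, and the datacenter `dc1`, the `docker` driver and the `busybox:latest` image are placeholder assumptions. It also assumes an unauthenticated Nomad agent on the default HTTP port 4646 and that a rejected job comes back as an HTTP error with a plain-text body.

```python
import json
import urllib.error
import urllib.request


class NomadSubmitError(RuntimeError):
    """Hypothetical error type: the Nomad server rejected the job."""


def build_job_payload(job_id, priority, image="busybox:latest"):
    # Minimal batch job in Nomad's JSON job format. The datacenter, driver
    # and image are placeholders, not values taken from the thread.
    return {
        "Job": {
            "ID": job_id,
            "Name": job_id,
            "Type": "batch",
            "Priority": priority,
            "Datacenters": ["dc1"],
            "TaskGroups": [
                {
                    "Name": "group",
                    "Tasks": [
                        {
                            "Name": "task",
                            "Driver": "docker",
                            "Config": {"image": image},
                        }
                    ],
                }
            ],
        }
    }


def submit_job(payload, base_url="http://127.0.0.1:4646",
               opener=urllib.request.urlopen):
    # Register the job and let the server decide whether the priority is
    # valid. No client-side range check is performed here.
    request = urllib.request.Request(
        f"{base_url}/v1/jobs",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with opener(request) as response:
            return json.loads(response.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # Surface the server's own message instead of re-implementing its checks.
        detail = err.read().decode("utf-8", errors="replace").strip()
        raise NomadSubmitError(
            f"Nomad rejected the job (HTTP {err.code}): {detail}"
        ) from err
```

The `opener` parameter only exists so the error path can be exercised without a running cluster; against a real server, `submit_job(build_job_payload("demo", 150))` would be the whole call.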

jagedn commented 1 week ago

@abhi18av are you working on this, or do you want me to take a look?

abhi18av commented 1 week ago

Hi dear @jagedn and team 👋

Apologies for the radio silence, but I'm afraid I'll only be able to (consistently) resume my tasks once I'm back home in SA. If anyone can take this forward that'd be great! 🙏

At the moment, I'm in Brazil organizing training events at different sites. Ideally, we will start deploying the plugin and site-specific Nomad clusters at the locations where the trainings are currently being held.

jagedn commented 1 week ago

I found this endpoint interesting: /v1/job/:job_id/plan

We can run a dummy plan at startup for a job with a very high priority and parse the error from the server. For example, using the UI in our local cluster, I can plan a job with priority = 65535 and the server returns "1 error occurred: * job priority must be between [1, 100]".

... but at this point I don't know what you have in mind. I mean, do we need to validate a process's priority before submitting, and abort the pipeline if it is out of range? Or, if the limit is exceeded, should we change the value? ...
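The "parse the error from the server" step could look roughly like the Python sketch below. It assumes the error wording quoted above ("job priority must be between [1, 100]"); other Nomad versions may phrase it differently, in which case the parser returns None. The helper names `parse_priority_bounds` and `resolve_priority` are hypothetical, and the `clamp` flag only illustrates the two policies raised in the question (abort versus adjust), not a decision made in this thread.

```python
import re

# Pattern taken from the error text quoted in the comment above; treat the
# exact wording as an assumption about the server's message format.
_BOUNDS_RE = re.compile(r"job priority must be between \[(\d+),\s*(\d+)\]")


def parse_priority_bounds(error_text):
    """Return (low, high) parsed from a plan error, or None if not found."""
    match = _BOUNDS_RE.search(error_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def resolve_priority(requested, bounds, clamp=False):
    """Apply one of two policies to an out-of-range priority.

    clamp=False: abort by raising ValueError before anything is submitted.
    clamp=True:  silently move the value to the nearest allowed bound.
    """
    low, high = bounds
    if low <= requested <= high:
        return requested
    if clamp:
        return max(low, min(requested, high))
    raise ValueError(
        f"priority {requested} is outside the server range [{low}, {high}]"
    )
```

For instance, feeding the message quoted above into `parse_priority_bounds` yields `(1, 100)`, and `resolve_priority(65535, (1, 100), clamp=True)` yields `100`, while the default policy raises instead.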

jagedn commented 1 week ago

In the meantime, I'll implement the priority annotation.

jagedn commented 1 week ago

@abhi18av You have a great challenge ahead of you:

[image]

jagedn commented 1 week ago

@matthdsm I just uploaded 0.3.2-edge2 with a new optional priority directive, in case you want to try it

If you exceed the max priority configured on the server, the job will fail (I think).

https://github.com/nextflow-io/nf-nomad/releases/tag/0.3.2-edge2