Open caiodelgadonew opened 2 months ago
Hi @caiodelgadonew! I'm pretty sure this is working as intended. If the client has been configured to reserve cores so that it doesn't assign workloads to those cores, it'll need to set the cpuset for all tasks such that they fall outside the reserved cores. Otherwise the cores are not meaningfully "reserved", right?
What's the goal here with having the application set its own affinity for one thread? Is this something you can solve by giving the application a `resources.cores` and then having the thread pick one of those cores?
> Hi @caiodelgadonew! I'm pretty sure this is working as intended. If the client has been configured to reserve cores so that it doesn't assign workloads to those cores, it'll need to set the cpuset for all tasks such that they fall outside the reserved cores. Otherwise the cores are not meaningfully "reserved", right?
You're correct; the naming was misleading to me. In the Nomad agent configuration, `reserved.cores` is documented as "Specifies the cpuset of CPU cores to reserve", and `reserved` itself as "Specifies that Nomad should reserve a portion of the node's resources from receiving tasks."
What I had understood was that I could tell Nomad "please don't allocate anything to core X", and the C++ app itself could then pin a specific thread to that core, which Nomad was not using. What actually happened is that Nomad prevented the app from pinning its thread to that core.
> What's the goal here with having the application set its own affinity for one thread? Is this something you can solve by giving the application a `resources.cores` and then having the thread pick one of those cores?
Just to clarify a bit: I work at a trading company and latency is really important for us, so we sometimes pin a specific thread to a specific core so that the core is busy only with that low-latency thread.
What we did to work around this is script a service that checks `resources.cores` and sets the affinity of the tasks to the first core, and the rest to the remaining ones.
What we would like is for Nomad not to schedule anything on a specific core, while still letting us set the affinity of a thread running in a Nomad task to that core.
I'm not sure Nomad can help with this case since it's so specific, but I hope my message was clear; let me know if anything is still confusing.
As for the issue, I'm also not sure whether it should stay open or be closed.
> What we did to work around this is script a service that checks `resources.cores` and sets the affinity of the tasks to the first core, and the rest to the remaining ones.
Yeah, that sounds like the right move here. We expose `NOMAD_CPU_CORES` in the task's environment for just this kind of thing.
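A minimal sketch of that approach from inside the binary, assuming `NOMAD_CPU_CORES` holds a cpuset-style list such as `0-2,7` and that the task runs on Linux. The names `parse_cpuset`, `CorePlan`, `make_plan`, and `set_affinity` are hypothetical helpers written for this illustration; they are not part of Nomad or of the reporter's code.

```cpp
#include <sched.h>   // sched_setaffinity, cpu_set_t, CPU_* macros (Linux/glibc)
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Expand a cpuset-style list ("0-2,7") into individual core IDs.
// Hypothetical helper: malformed numbers make std::stoi throw.
std::vector<int> parse_cpuset(const std::string& spec) {
    std::vector<int> cores;
    std::stringstream ss(spec);
    std::string part;
    while (std::getline(ss, part, ',')) {
        if (part.empty()) continue;
        const std::string::size_type dash = part.find('-');
        if (dash == std::string::npos) {
            cores.push_back(std::stoi(part));
        } else {
            const int lo = std::stoi(part.substr(0, dash));
            const int hi = std::stoi(part.substr(dash + 1));
            for (int c = lo; c <= hi; ++c) cores.push_back(c);
        }
    }
    return cores;
}

// Assumed policy, mirroring the workaround in this thread: the first
// core goes to the latency-critical thread, the rest to everything else.
struct CorePlan {
    std::vector<int> latency;
    std::vector<int> others;
};

CorePlan make_plan(const std::vector<int>& cores) {
    CorePlan plan;
    if (cores.empty()) return plan;
    plan.latency.push_back(cores.front());
    plan.others.assign(cores.begin() + 1, cores.end());
    return plan;
}

// Restrict the calling thread (pid 0 means "this thread") to `cores`.
// Returns false if the list is empty or the kernel rejects the mask.
bool set_affinity(const std::vector<int>& cores) {
    if (cores.empty()) return false;
    cpu_set_t set;
    CPU_ZERO(&set);
    for (int c : cores) {
        assert(c >= 0 && c < CPU_SETSIZE);  // debug-build precondition for CPU_SET
        CPU_SET(c, &set);
    }
    return sched_setaffinity(0, sizeof(set), &set) == 0;
}
```

Each thread would call `set_affinity` on itself with its share of `make_plan(parse_cpuset(std::getenv("NOMAD_CPU_CORES")))`, after checking that the variable is set. On Linux, `sched_setaffinity` fails with `EINVAL` when the requested mask contains no CPU permitted by the task's cpuset, so cores outside the list Nomad assigned cannot be selected this way.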
> What we would like is for Nomad not to schedule anything on a specific core, while still letting us set the affinity of a thread running in a Nomad task to that core.
> I'm not sure Nomad can help with this case since it's so specific, but I hope my message was clear; let me know if anything is still confusing.
Typically Nomad has avoided getting into managing what's happening inside the task boundary. That is, Nomad provides the "container" (whether a literal Linux container or otherwise) and then it's up to the application what to do inside. Managing individual thread affinities is likely out of scope for us. That being said, we've recently shipped NUMA aware scheduling in Nomad Enterprise, so there's some precedent for giving a little more control here.
I'm going to re-title this issue as a feature request and mark it for further discussion and roadmapping.
Nomad version
Operating system and Environment details
Issue
Nomad shouldn't override the taskset defined inside binaries in `raw_exec`, even when reserved cores are configured in the client stanza.
Reproduction steps
Client configuration
In the `raw_exec` driver a C++ binary is run. This binary spawns multiple threads, and one of them has its affinity defined by the binary itself using `sched_setaffinity`, e.g.:
But if we have this configuration on the Nomad client, when we run the binary Nomad overwrites the taskset, leaving the thread to run only on non-reserved cores.
Expected Result
Nomad does not apply a taskset to threads that have a specific affinity set in the binary code.
Actual Result
Nomad overwrites the affinity of the thread.
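One way to observe this from inside the binary is to read the affinity mask back after setting it. The sketch below is illustrative only and is not the reporter's code; `current_affinity` and `pin_and_verify` are hypothetical names, and it assumes Linux with glibc and no more CPUs than `CPU_SETSIZE`.

```cpp
#include <sched.h>   // sched_getaffinity, sched_setaffinity (Linux/glibc)
#include <cassert>
#include <vector>

// List the cores the calling thread is currently allowed to run on.
// Returns an empty vector if the kernel call fails.
std::vector<int> current_affinity() {
    std::vector<int> cores;
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0) return cores;
    for (int c = 0; c < CPU_SETSIZE; ++c) {
        if (CPU_ISSET(c, &set)) cores.push_back(c);
    }
    return cores;
}

// Pin the calling thread to `core`, then confirm the kernel reports
// exactly that core. Returns false if the request was rejected or the
// resulting mask differs from what was asked for.
bool pin_and_verify(int core) {
    assert(core >= 0 && core < CPU_SETSIZE);  // debug-build precondition for CPU_SET
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    if (sched_setaffinity(0, sizeof(set), &set) != 0) return false;
    const std::vector<int> now = current_affinity();
    return now.size() == 1 && now.front() == core;
}
```

Calling `current_affinity()` again later from the same thread would show whether the mask was changed after the initial pin. The thread does not say when the override happens, so a single check right after pinning may not be enough.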
EXTRA DETAILS
I've experimented with the core isolation feature using the following settings:
Client config cores = "0,2-7" & NOMAD_CPU_CORES="0-7"
![image](https://github.com/hashicorp/nomad/assets/39803009/3a24a43b-88d1-4b6b-85ff-1f4bb397890a)
Client config cores = "1" & Job NOMAD_CPU_CORES="0-7" & resources.cores = 1
![image](https://github.com/hashicorp/nomad/assets/39803009/db3da098-c569-42fc-b94f-e4ab08b68922)
No specific Client config & Job NOMAD_CPU_CORES="0,2-7" (this makes me think NOMAD_CPU_CORES is not working as intended)
![image](https://github.com/hashicorp/nomad/assets/39803009/b19cba09-db47-471e-bf64-8b757f63f9d1)