Provide a way to tie namespaces to certain client nodes

apollo13 commented 3 years ago

It would be great if it were possible to tie namespaces to certain client nodes. A common example for that would be to have /dev/staging/production namespaces which should be served by different client nodes.

A few options come to mind:

Allow admins to specify allowed namespaces in the client stanza (https://www.nomadproject.io/docs/configuration/client#client-parameters). This could look like this:
```
client {
allowed_namespaces = ["default", "dev-*"] # Allow globs for more flexibility?
}
```
with a default of ["*"] essentially allowing all namespaces to be scheduled on the node.
Specify a set of constraints during namespace creation. This would be done via an extra field during namespace creation/update:
```
{
"Name": "api-prod",
"Description": "Production API Servers",
"Quota": "prod-quota",
"Constraints": [
 {attribute: "${attr.kernel.name}", value: "linux"}
 …
]
}
```
The constraints are the same as described in https://www.nomadproject.io/docs/job-specification/constraint . Now when a job is scheduled in a namespace, the namespace constraints are applied to limit scheduling decissions (in addition to the constraints specified in the job file).

the-maldridge commented 3 years ago

I can also see a use case for this where in a physical hardware environment or even a some cloud environments the cluster is "owned" by a central infrastructure team and is "billed" back to other teams, but individual consumers are able to "buy" resources for exclusive use. In that case being able to mark a resource as exclusively belonging to a namespace seems like a good way to make sure that everyone stays happy.

tgross commented 3 years ago

Hi @apollo13 and @the-maldridge!

Some historical context: I suspect the reason this wasn't the case originally was because namespaces were Enterprise-only, so the expected workflow for those kinds of operations would probably be using Sentinel policies (although I'll admit how you would do that is not super well-documented so I'm not 100% sure on that). Now that namespaces are moving into OSS (for 1.0), this sort of thing becomes an interesting idea to discuss.

acornies commented 3 years ago

I'm in definite need of this feature as an enterprise customer 😄 . If there is a way to manage this through Sentinel, it would be great if more docs could be created around that.

tgross commented 3 years ago

@acornies you might be able to adapt the one in this guide: https://github.com/hashicorp/nomad-guides/blob/master/operations/sentinel/sentinel_policies/restrict_namespace_to_dc.sentinel

apollo13 commented 3 years ago

I have added a draft PR where I show how the feature could look like. I did not actually run it yet, but I did write tests. I will manually test over the next days.

@tgross Do you think this PR is viable or would I have to go another direction? If you think this is the right direction I can see about adding docs etc…

@the-maldridge Would appreciate if you could test this :)

henrikjohansen commented 3 years ago

Even as an Enterprise customer I would love to have this available as it indeed would make certain aspects of our cluster management much, much easier :+1:

the-maldridge commented 3 years ago

@tgross just to capture a point that's been discussed in gitter here: unless the docs are wrong/incomplete, sentinel policies are evaluated during job submission for policy decisions, but don't affect the scheduler, and are not reevaluated when nodes join or leave the fleet. If this is the case then I don't think there's even an ugly hack for this with sentinel, but I'd love to hear from a sentinel expert (with example!) of how this might be achieved.

henrikjohansen commented 3 years ago

@the-maldridge We have been using something similar to this sentinel policy but it get's unwieldy rather quickly ...

Essentially the policy enforces that :

every job must have a constraint on ${node.class} or submission/update is denied.
every namespace must exist in the policy and it must define which ${node.class} constraints are allowed for any given namespace.
this needs to be checked against every stanza where constraints are available (job {}, group {} & task {}) .

To make life a bit easier we have recently added that :

all system jobs are in a special namespace which has access to all ${node.class}

This way we can grow / shrink the number or type of nodes per ${node.class} while enforcing that certain namespaces have exclusive access to certain nodes or resources.

the-maldridge commented 3 years ago

Interesting. I can see how this gets unwieldy with a large number of jobs and namespaces.

henrikjohansen commented 3 years ago

@the-maldridge It's bordering the unbearable but it's the only solution for now. It seems that Sentinel is the recommended escape hatch one needs to resort to when features are missing ...

the-maldridge commented 3 years ago

@apollo13 I finally had time to prepare a build and try this in my local cluster. Thanks so much for taking the time to prepare the draft PR as it made it fairly quick to test.

My observations are that it more or less works as one would expect with the crucial point that the error message returned when no nodes are available to service a namespace is not particularly useful:

$ nomad job plan hello.nomad 
+ Job: "hello"
+ Task Group: "app" (1 create)
  + Task: "app" (forces create)

Scheduler dry-run:
- WARNING: Failed to place all allocations.
  Task Group "app" (failed to place 1 allocation):
    * Constraint "missing compatible host volumes": 1 nodes excluded by filter

I think though this happens with other checks, so I'm not sure about the practicality of getting it to explain why the nodes were filtered. In my testing I created two namespaces and added some hosts to each and some hosts to both. Everything worked mostly as I expected, is there anything in particular that I should test to ensure I've hit the points you are interested in? This works cleanly for my use case and appears to be a very elegant solution to an otherwise hairy problem as pointed out above.

apollo13 commented 3 years ago

@the-maldridge That is great to hear. I have nothing specific in mind for testing; mainly wanted a second pair of eyes on it.

The error message is weird, but that part of the code is something I do not understand well enough yet. I'll wait for feedback from the nomad team before diving into this further.

apollo13 commented 3 years ago

Actually the error message was just a typo, should be fixed now.

the-maldridge commented 3 years ago

Any updates here from the fine folks at Hashicorp? I'm debating running this in prod as this is a feature that very neatly resolves a problem I need to work around.

Xopherus commented 3 years ago

@the-maldridge the way my team worked around this was using the concept of Nomad datacenters to separate workloads. We use AWS to deploy Nomad, so what we did was deploy a 2nd AutoScalingGroup and add this file to override the datacenter.

cat /etc/nomad.d/datacenter.json
{
  "datacenter": "${datacenter}"
}

Then in your jobspecs you can target which datacenter you want the job to run in. It's a pretty good workaround if you don't plan on adding too many namespaces.

Oh also a caveat - Nomad datacenters are NOT the same as Consul datacenters. So you can have multiple Nomad datacenters all use the same Consul datacenter. This is important if your applications are using Consul for service discovery - service names should not have to change even if you move apps to a new Nomad datacenter.

the-maldridge commented 3 years ago

@Xopherus I am unfortunately well familiar with the spam-datacenters approach. It becomes unmanageable though after adding a few datacenters, and requires extra thought from teams consuming nomad to remember what datacenters they're allowed to run in or have machines that service them.

I'd call it a hack at best, but not something I'd want to subject a team to long term, and certainly not something I'd want to manage long term with an even moderately sized use case of Nomad. The singular exception of DC sprawl that I could see using in that case is for system tasks which don't have any other means of constraint.

Xopherus commented 3 years ago

Understood, figured I'd reply anyway in case others are like me and maybe only need a handful of namespaces. The datacenter workaround is relatively clean for that use case. Though TBH the difference between remembering what datacenter or namespace to choose seems about the same level of effort regardless of scale imho.

But I definitely hear you about maintaining those datacenters at scale. Ideally I'd love to be able to have a high level overview of the namespaces and what their resource usage is (in terms of raw resources and/or number of clients). I can imagine a UI/API where you could spin up new clients into a "default" namespace and then re-assign them to other namespaces at will. That would make it so much easier on the deployment side of things because redeploying an entire nomad cluster could take hours if not days.

apollo13 commented 3 years ago

@schmichael Any chance of getting that roadmapped and maybe even into 1.2?

schmichael commented 3 years ago

It's not going into 1.2 (which is :soon:), but we're definitely discussing something of this shape. No timeline yet I'm afraid.

apollo13 commented 2 years ago

@schmichael Some time has passed; any chance for a timeline or so? :)

dsmoljanovic commented 2 years ago

Voting for this also. Would help create a more secure environment for sure. I'm not seeing any workaround for this.

apollo13 commented 1 year ago

@schmichael Did the internal discussions get somewhere? If you could lay out a plan the community might be able to provide code :)

schmichael commented 1 year ago

@schmichael Did the internal discussions get somewhere? If you could lay out a plan the community might be able to provide code :)

Sorry for the lack of updates all and thanks to @apollo13 for offering to help!

Since 1.5 recently shipped we have been finalizing the 1.6 roadmap, and node namespacing is on it. "Node Pools" is the name for the feature right now, but it's still early in the design phase. We intend to post more roadmap and design information in the coming weeks!

apollo13 commented 1 year ago

Thanks for the update. Any chance that https://github.com/hashicorp/nomad/issues/6554#issuecomment-1050906269 is on the roadmap as well 😂

On Mon, Mar 13, 2023, at 17:24, Michael Schurter wrote:

@schmichael https://github.com/schmichael Did the internal discussions get somewhere? If you could lay out a plan the community might be able to provide code :)

Sorry for the lack of updates all and thanks to @apollo13 https://github.com/apollo13 for offering to help!

Since 1.5 recently shipped we have been finalizing the 1.6 roadmap, and node namespacing is on it. "Node Pools" is the name for the feature right now, but it's still early in the design phase. We intend to post more roadmap and design information in the coming weeks!

— Reply to this email directly, view it on GitHub https://github.com/hashicorp/nomad/issues/9342#issuecomment-1466470659, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAT5CZ2GP7T6MZE6TOTSLTW35C4NANCNFSM4TTYZKTQ. You are receiving this because you were mentioned.Message ID: @.***>

mikenomitch commented 1 year ago

Thanks for the update. Any chance that #6554 (comment) is on the roadmap as well 😂

Not yet @apollo13, but noted on the request!

mikenomitch commented 1 year ago

Hey everybody, the engineering/product team is currently working on this and we've got some debates about the best way to design this feature.

If this is something you've been waiting for, it would help if you gave us some information about what you're trying to achieve.

Would this be used to split up different teams, and/or different environments, or to separate a select few highly privileged/special client nodes? - Do you have a more complex/hierarchical setup in mind?

Also, If anybody feels like chatting about this with some of the team, feel free to grab a time and we can talk through your use case.

cipriancraciun commented 1 year ago

Would this be used to split up different teams, and/or different environments, or to separate a select few highly privileged/special client nodes? - Do you have a more complex/hierarchical setup in mind?

In my case, which I bet it's a corner-case and far from mainstream setup, I want to be able to run multiple Nomad clients on the same physical machine, each under different OS users, and then allow certain Nomad users access only to certain clients that run as certain OS users. Think of this use-case as a SSH replacement, with queuing and background execution, and with all the other Nomad features.

For more concrete details, I envisage running at least three kinds of Nomad clients on each physical host:

one running with root user, thus being able to do almost anything;
one running with admin user, which normally doesn't run directly as root, but which could run some more sensible commands;
one running with user user, which could be accessed by normal users, but with more restricted permissions; (one could think that user is actually team-a, then team-b, etc.)

All these nodes and users should have access to the raw_exec or other drivers.

Based on the namespaces feature, I would have one namespace matching each of the previously mentioned roles, i.e. root, admin, user (or team-a, team-b), and tie each Nomad client (that runs as a particular OS user) to the matching namespace.

Currently, because I can't restrict which namespace to be handled by a particular client, I would need to have one Nomad cluster per OS user.

apollo13 commented 1 year ago

Would this be used to split up different teams, and/or different environments, or to separate a select few highly privileged/special client nodes? - Do you have a more complex/hierarchical setup in mind?

All of those and no, not really anything more complex than that :) But basically I want to give certain teams access to certain namespaces (doable via ACL) and then have namespaces like dev & prod tied to their individual nodes. The idea here being that a task that runs on dev should have any side-effects on prod. I understand that the preferred way would probably be to use one cluster per environment, but in small scale scenarios this is kind of overkill.

Another example would be some special nodes which would only run ingress services (like traefik) and shouldn't run any normal workloads as those nodes are basically in kind of a "dmz".

acornies commented 1 year ago

I don't know if this is helpful to anyone but I've written a related blog post that touches on this topic. It does require Sentinel however (enterprise feature) but I've tried to make it clear on how it's used in the wild.

apollo13 commented 1 year ago

Hi folks, after some discussion with @schmichael on the fediverse about the node pool stuff in 1.6 Beta (https://hachyderm.io/@schmichael/110628556291664149) I will add some thoughts here.

First and foremost I'd like to say that I fully understand that Hashicorp needs to guard some features behind the enterprise paywall to keep a viable business. Even more so as we see other companies like RedHat apparently struggling with their business model. I also love you folks for not adding SSO behind an enterprise wall which is something sadly many products do. On the other hand "Audit Logging" is a feature which certainly belongs to enterprise -- while everyone would like to have it (even if they don't need it), stronger requirements for audit logging usually come from regulated areas (gov, financial etc) where there is a different kind of money to be made (or so I heard).

So what about node pools? Node pools are more or less a "fix" for the issue at hand, but most of it's features are behind enterprise. The one that hurts me most is the ACL part of it (namely which node pools are usable by which namespaces). How so? Even if you are using nomad only internally and trust everyone accessing it you want to limit fallout in case of token leaks etc. Even without regulatory requirements, it is simply good security hygiene and with all the leaks and hacks out there I think it is fair to argue that even small companies should strive for locked down systems as far as possible.

What also somewhat irks me about it is that as @acornies (if I understood them correctly) has shown that something like node pools ACLs are already possible with sentinel -- so the value-add here seems to be convenience for the most part (I understand that the scheduler stuff is probably not possible with Sentinel).

So assuming that tying jobs to certain nodes was already possible with Sentinel before, then the ACL part of the node_pools is maybe not something that would prompt a user to buy enterprise now.

I don't want to bring out the k8s sledgehammer, but I think ACLs is something companies would want rather early on at a point where they maybe cannot justify paying for nomad yet. When that plays into the decision to initially focus on k8s, it might get harder to get them back.

So where does that leave me: I'd like to ask the fine folks at HC to reconsider the current decision and ask for node pool ACLs to be added to core instead (to clarify: I am talking about allowed = ["default", "autoscaling-nodes"] and maybe the default attribute, not scheduler_config which should imo stay enterprise). I realize this is a big ask, but I am asking it nevertheless. Even if the answer ends up being no, it would be great if you could say so and then we can close this ticket.

CC'ing in @mikenomitch for visibility

mikenomitch commented 1 year ago

Hey @apollo13, first of all I want to say thanks for the thoughtful comment and positive tone even if the decision is disappointing.

Unfortunately, at least in the short-to-medium term, we don’t plan to open source the namespace-node pool connection. As you alluded to in your comment, we do need to hold back certain features to make sure that the Enterprise product is compelling enough to buy. It isn’t an exact science, but features that you only hit at a certain scale or level of organizational complexity are generally the ones we make Enterprise-only. Node pool governance falls into that category, as far as we can tell.

There’s always a short-term tradeoff between making our OSS users happy and growing that user base, and sustaining the business. With namespaces having gone open source in 1.0, SSO being open source, and most of our efforts over the last couple years being OSS-focused, we felt like we needed to bolster the Enterprise offering a little bit.

Additionally, the fact that Sentinel can be/is used to gate node access was one of the contributing factors in keeping this Enterprise only. We didn’t want to undercut a use case for Sentinel and have users who bought Sentinel for this purpose question their Nomad purchase decision.

We ended up making this decision a bit later than we should have, and because of that I didn’t do a great job communicating this before the 1.6 beta release. I’m sorry about that. That is on me. I’ll try to make sure that any future Enterprise-only issues are marked as such as soon as we know.

I’ll close this feature out but add an additional comment at the end asking OSS users if this might impact their decisions to adopt Nomad. I don’t want to get anybody’s hopes up, but we do have precedent for open-sourcing an enterprise-only features when it was impacting OSS adoption (namespaces). If anybody is in this camp, feel free to add a comment to this issue with your use case.

mikenomitch commented 1 year ago

Going to close this out as it is shipped in Nomad 1.6 Enterprise.

As I mentioned above in my response to @apollo13, if this feature being Enterprise-only is affecting anybody's adoption of Nomad OSS, please do let us know in a comment. It is good feedback for us to get.

the-maldridge commented 1 year ago

@mikenomitch I totally get the need to shore up enterprise, but this is a misstep as far as getting people to onboard to Nomad. Folks at $work took one look at the release notes after I mentioned I was excited for this feature, saw a basic security feature behind a paywall and are now even more hype than previously for k8s where the moon is served up on a silver platter. At some point I'm forced to agree that having a complete, robust security system out of the gate without a paywall is enough of a feature for me to stop trying to advocate for what I believe to be the better operational solution (Nomad obviously).

I fundamentally disagree with the idea that this undermines the enterprise value add for sentinel. Most people I work with have commented that Nomad's security model is shockingly primitive, and I'm inclined to agree. Of course I remember deploying a production nomad cluster back in the times of high adventure on 0.7 series and having to remind everyone to think before they typed since the lack of even basic namespaces then meant any developer could globally stop all the load balancers! In my organization we use a number of other Hashicorp products, and we view sentinel as a very different component of the stack than basic auth components such as Nomad's namespaces (which internally we explain to people as similar in concept to Consul and Vault's path elements). Sentinel is viewed as a manifestation of the often times byzantine compliance and regulatory policy as applied to technical systems. Specifically because of the incredibly complexity involved with Sentinel, we have chosen not to deploy it at all. My group cares about the far more simplistic use case of ensuring that the foot-guns are stored safely where relatively few people have access to them. When protecting prod is considered a vendor value-add, that's the point at which we start looking for new tooling.

The value of Nomad over similar solutions in my opinion has always been that its one of the few cluster systems out there, certainly the only general purpose one, that you can have exactly one of. Between the ability to scale, put additional servers across failure domains, and a usable federation system Nomad ticks almost all the boxes I need ticked to roll it out at the global scale my organization operates at. I am, however, reminded often and jovially by my peers that Nomad doesn't really come with the security features required to run more than my team's immediate workflows on it which I'm forced to agree with. The constant replies that "Nomad is built on a trusted operator model" seem to constantly ignore the feedback of the community that trust comes in various levels, and operators are not all uniformly knowledgeable nor skilled.

As to getting FOSS customers to become paying customers, which seems to be the real issue at heart here, I reiterate what I've said for some time now: the companies I work for are not interested in more features, we're interested in the ability to escalate to an engineer for the software that we have the source for when it breaks. We are able to do this with our operating system, our primary application runtimes, and our orchestration tooling, but the inability to do this with any of our Hashicorp tooling makes it harder for me to champion its use given it is the one odd-duck in our environment. We run Open Source tools because we are an engineering company and we have engineers who can and do diagnose problems all the way down to the source to ensure its not something strange we've (I've) done.

As always feel free to follow up with me privately if you'd like a longer non-public discussion.

TL;DR - The choice to put basic security features behind paywalls when the effort to start paying doesn't justify those features hampers adoption of the product. The appearance of paywalling security features at all, not the fancier enterprise compliance features, is not a great look, and is why sites like https://sso.tax/ exist.

hashicorp / nomad

Provide a way to tie namespaces to certain client nodes #9342