the-maldridge closed this 1 year ago
Hi @the-maldridge! This seems like a reasonable idea, and I've marked it for roadmapping.
The CNI configuration we use can be found at networking_bridge_linux.go#L141-L180. Note that it also configures the firewall and portmap plugins.
One approach we could take here is to allow the administrator to override that template with a config file somewhere on the host. The configuration seems fairly straightforward, but then it's a matter of hunting down anywhere in the client code that has specific assumptions about that template and figuring out how to detect what's the right behavior from there.
@tgross I like the template idea; it would provide the most flexibility while removing a dependency on a hard-coded string literal, something I always like to do. What do you think about using go:embed to include the default template rather than the string literal, as a means of simplifying the code that loads the various options? I can't remember off the top of my head which version of Go introduced it, so I don't know whether Nomad already targets that version.
Yup, main targets go1.18, so we're fine on using embed, and we're gradually moving some of our other embedded blobs over to that as well (the big lift still being the UI bundle).
@tgross I think some sort of 'escape hatch' similar to those being used for Envoy may be an option here. If we could pass some additional JSON to some parts of Nomad's bridge conflist file, like adding additional plugins to the list, etc., that would make it easier to extend Nomad's bridge CNI setup.
In my case that would allow for using Cilium along with Nomad's own bridge, and being able to mix and match Consul Connect enabled services with others policed by Cilium. Or even have direct L3 reachability between tasks on different Nomad nodes, tunneled by Cilium under Nomad's bridge.
Another option that came to my mind would be using something like https://github.com/qntfy/kazaam in order to allow the user to specify some 'JSON transformations' to apply to Nomad's bridge CNI config at runtime.
This would work like:
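The original example was not preserved in this thread. As a purely illustrative sketch, a kazaam-style transform spec that appends a chained plugin to the generated conflist might look something like the following (whether kazaam's operations support this exact append, and the `plugins[4]` path, are assumptions that would need validating):

```json
[
  {
    "operation": "default",
    "spec": {
      "plugins[4]": { "type": "cilium-cni" }
    }
  }
]
```

The idea being that the operator supplies such a spec in the host config and Nomad applies it to the bridge conflist before handing it to the CNI plugins.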
While this might not be the most straightforward means to 'edit' the CNI template, it is probably the most flexible option, and it can open a lot of possibilities for sysadmins to integrate Nomad's bridge with many different networking systems.
I don't know what you think, @tgross... if this seems 'acceptable' from HashiCorp's point of view, I could try to hack something up.
Regards Pablo
@pruiz my major worry with that specific approach is that it introduces a new DSL into the Nomad job spec. Combine that with HCL2 interpolation and Levant/nomad-pack interpolation and that could get really messy. If we were going to allow job operator configuration of the bridge at all, I'm pretty sure we'd want it to be an HCL block that generates the resulting JSON CNI config (which isn't all that complex of an object, in any CNI config I've seen at least).
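For illustration only, an HCL rendering of a CNI conflist might look like this (the schema, block names, and attributes here are invented to show the shape of the idea, not something Nomad supports):

```hcl
# Hypothetical HCL schema that could compile down to a CNI conflist JSON.
cni_config "nomad" {
  cni_version = "0.4.0"

  plugin "bridge" {
    bridge     = "nomad"
    ip_masq    = true
    is_gateway = true

    ipam {
      type   = "host-local"
      ranges = ["172.26.64.0/20"]
    }
  }

  plugin "firewall" {}

  plugin "portmap" {
    snat = true
  }
}
```

This keeps the operator in HCL end to end, with the JSON generated the same way jobspecs already are.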
That also introduces a separation of duties concern. Right now the cluster administrator owns the bridge configuration to the limited degree we allow that; expanding that configuration is what's been proposed as the top-level issue here. Extending some of that ownership to the job operator blurs that line.
Can you describe in a bit more detail (ideally with examples) what kind of configurations you couldn't do with the original proposal here (along with the cni mode for the network block)? That might help us get to a workable solution here.
Hi @tgross,
I probably explained myself poorly. I was not proposing to add the new 'bridge_transform_rules' parameter to Nomad's job spec, just adding it to the Nomad client/host config.
IMHO, being able to fine-tune the bridge's CNI config from the job spec would be good, but it opens a lot more issues that are hard to solve, as the bridge instance (and the veths attached to it) should be consistent among jobs for things like Consul Connect to work.
However, being able to customize the bridge's CNI settings at host level (i.e. from /etc/nomad.d/nomad.hcl) opens up (I think) a lot of potential. And keeping it (right now) restricted to cluster admins makes sense (at least to me), as the cluster admin is the one with actual knowledge of the networking and environment where the node lives.
As for the new-DSL issue, I understand your point about adding another sub-DSL to the config, but I just don't see how we can apply 'unlimited' modifications to a JSON document using HCL.
Adding some 'variables' to interpolate into the JSON emitted by networking_bridge_linux.go and replacing them with new values from /etc/nomad.d/nomad.hcl seems workable, but, as happens with other similar approaches, user N+1 is going to find they need a new interpolatable variable somewhere within the JSON that is not yet provided. That's why I was looking into something less restricted.
In my use case, for example, my idea would be to mix Consul Connect & Cilium on top of nomad's bridge.
In order to do so, my nomad's host config (/etc/nomad.d/nomad.hcl) would include something like:
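That configuration block was not preserved in this thread. As a hypothetical sketch of the kind of host-level knob being proposed (the `bridge_transform_rules` attribute and the transform syntax are invented for illustration, not an existing Nomad option):

```hcl
# /etc/nomad.d/nomad.hcl -- hypothetical syntax, not an existing Nomad option
client {
  # Transformations applied to the JSON conflist Nomad generates for its
  # default bridge, e.g. chaining cilium-cni after the built-in plugins.
  bridge_transform_rules = <<-EOF
    [
      { "operation": "default", "spec": { "plugins[4]": { "type": "cilium-cni" } } }
    ]
  EOF
}
```

The key point is that this lives in the client config, owned by the cluster administrator, not in any jobspec.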
With this configuration applied on cluster nodes, I would be able to launch jobs using the native bridge (instead of cni/*) which would be able to make mixed use of Consul Connect and Cilium, enabling:
All at the same time and from within the same Task Group.
Regards Pablo
[1] Currently jobs using Cilium (by means of a network=cni/*) cannot use Consul Connect (and vice versa).
That's a really complete and much better-phrased explanation and feature matrix than I was typing up, @pruiz; it sounds like we have almost identical use cases here. I also think this is something that realistically only a cluster root operator should change, since this is going to involve potentially installing additional packages at the host level to make it work.
As to the HCL/JSON issue, what about writing the transforms in HCL and then converting that to the relevant JSON as is already done for jobspecs? It adds implementation complexity for sure, but it also keeps the operator experience uniform, which it sounds like is a primary goal here.
Ok, I'm glad we're all on the same page then that this belongs to the cluster administrator.
So if I tried to boil down the "transformations" proposal a bit, the primary advantage here over simply pointing to a CNI config file is wanting to avoid handling unique-per-host CNI configuration files, so that you can do things like IP prefixes per host (as opposed to having host configuration management do it). That seems reasonable given we already have Nomad creating the bridge. You'd still need a source for the per-host configuration though. Suppose we had a 90/10 solution here by supporting a cni_bridge_config_template (happy to workshop that name) that also supports interpolation; where would we put the values we're interpolating without having per-host configuration anyways? Take it from the environment somehow?
Hi @tgross, I think the cni_bridge_config_template seems like a good middle point, yes, because:
And I think this is something everybody can cope with.
As for the actual template file to pass to cni_bridge_config_template, I think that could be a plain text file onto which Nomad can perform such variable interpolations, or a consul-template file which Nomad can render (passing the variables to consul-template's engine), as Nomad already uses consul-template for other similar stuff. What do you guys think about this?
Last, with regard to interpolation variables, I think Nomad could pass at a minimum the same values it is already using when generating the bridge's JSON:
And we could also consider exposing as interpolation variables (but I'm not sure):
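To make the template idea concrete, a sketch of what such a file might look like (the variable names `.BridgeName` and `.Subnet`, and the chained cilium-cni entry, are hypothetical, not an existing Nomad interface):

```
{
  "cniVersion": "0.4.0",
  "name": "nomad",
  "plugins": [
    {
      "type": "bridge",
      "bridge": "{{ .BridgeName }}",
      "ipMasq": true,
      "isGateway": true,
      "ipam": {
        "type": "host-local",
        "ranges": [[{ "subnet": "{{ .Subnet }}" }]]
      }
    },
    { "type": "cilium-cni" }
  ]
}
```

Nomad would render this with the same values it currently hard-codes, and the administrator could add or reorder plugins freely.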
Regards
Hi everyone 👋
After further discussion we feel like adding more customization to the default bridge may result in unexpected outcomes that are hard for us to debug. The bridge network mode should be predictable and easily reproducible by the team so we can rely on a common standard configuration.
Users that require more advanced customization are able to create their own bridge network using CNI. The main downside of this is that Consul Service Mesh currently requires network_mode = "bridge", but this is a separate problem that is being tracked in #8953.
Feel free to 👍 and add more comments there.
Thank you everyone for the ideas and feedback!
Hmm, that's a frustrating resolution as it means that to use consul connect in conjunction with CNI I'd now need to edit every network block in every service template in every cluster, whether or not those tasks used a CNI network previously. At that point it seems like the better option to me is to abandon consul connect entirely and use a 3rd party CNI to achieve a similar result.
I'm following the other ticket, but it really doesn't look like any consideration is given there to the default path that nomad comes with out of the box. Any thoughts on how to continue to have working defaults and still enjoy both CNI and Consul Connect?
@lgfa29 While consul connect is a good solution for common use cases, it is clearly lacking when trying to use it to deploy applications requiring more complex network setups (for example applications requiring direct [non-NAT, non-proxied] connections from clients, or clusters requiring flexible connections between nodes on dynamically allocated ports, or solutions requiring maxing out the network I/O performance of the host, etc.).
For such situations the only option available is to use CNI, but even this is somewhat limited on nomad (i.e. CNI has to be set up per host, networking has to be defined on a per-job basis and CNI stuff has to be already present and pre-deployed/running on the nomad node before deploying the job, one cannot mix connect with custom CNIs, etc.). And, at the same time, there is no solution for having more than "one networking" (i.e. CNI plus bridge) for a single task, nor is there a clear solution for mixing jobs using Consul Connect and jobs using CNI.
This is clearly an issue for nomad users, as this limits Consul Connect to simple use cases, forcing us to deploy anything not (let's say) Consul-Connect-compatible outside of nomad, on top of a different solution (for deployment, traffic policing, etc.) and rely on an outbound gateway for providing access from nomad's jobs to such 'outside' elements.
I understand hashicorp needs a product that can be supported with some clear use cases and limits. But at the same time we as a community need some extensibility for use cases not needing to be covered by commercial hashicorp support options. That's why the idea of this being a setting for extending the standard nomad feature made sense to me. HashiCorp could simply label this as 'community supported-only' or something like that and focus on enhancing consul connect, but at the same time let the community work around it until something better arrives.
As stated, I was willing to provide a PR for this new feature, but right now I feel a bit stranded, as I don't really understand why a use case is not supported when, in nomad's code base, it only implies being able to extend the CNI config, and when it could be declared 'community supported' if that's a problem for hashicorp's business. I just hope you guys can reconsider this issue.
Regards Pablo
I, too, support @pruiz's use case. I had to abandon the Hashistack altogether because of Nomad's opinions on CNI. Consul Connect is a good generic solution, but it leaves much to be desired in the flexibility department. I tried to plumb in Cilium using their (deprecated) Consul integration and after a few months I had to bag it. It doesn't seem impossible, but it's beyond my current capabilities. So, yes: what Pablo is proposing doesn't seem unreasonable and I ask HashiCorp to reconsider.
Hi everyone 👋
Thanks for the feedback. I think I either didn't do a good job explaining myself or completely misunderstood the proposal. I will go over the details and check with the rest of the team again to make sure I have things right.
Apologies for the confusion.
Hi everyone :wave:
After a more thorough look into this I want to share what I have observed so far and expand on the direction we're planning to take for Nomad's networking story.
The main question I'm trying to answer is:
Does this proposal provide new functionality or is it a way to workaround shortcomings of the Nomad CNI implementation?
From my investigation so far I have not been able to find examples where a custom CNI configuration would not be able to accomplish the same results as the proposed cni_bridge_config_template. That being said, I have probably missed several scenarios, so I am very curious to hear more examples and use cases that I may have missed.
My first test attempted to validate the following:
Can I create a custom bridge network based on Nomad's default bridge?
For this I copied Nomad's bridge configuration from the docs and changed the IP range.
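A conflist along these lines would match that description (the network name and the substituted IP range below are placeholders, and the details may differ from the template Nomad actually ships):

```json
{
  "cniVersion": "0.4.0",
  "name": "mybridge",
  "plugins": [
    { "type": "loopback" },
    {
      "type": "bridge",
      "bridge": "mybridge",
      "ipMasq": true,
      "isGateway": true,
      "forceAddress": true,
      "ipam": {
        "type": "host-local",
        "ranges": [[{ "subnet": "10.37.105.0/24" }]],
        "routes": [{ "dst": "0.0.0.0/0" }]
      }
    },
    { "type": "firewall", "backend": "iptables" },
    { "type": "portmap", "capabilities": { "portMappings": true }, "snat": true }
  ]
}
```

Dropped into the client's cni_config_dir, this becomes addressable from jobs as cni/mybridge.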
I then used the following job to test each network.
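The job itself was not preserved here; a minimal jobspec of the sort described (image, port, and names are placeholders) might be:

```hcl
job "net-test" {
  datacenters = ["dc1"]

  group "web" {
    network {
      # Swap between "bridge" (Nomad's default) and "cni/mybridge"
      # (the custom conflist installed on the host) to test each network.
      mode = "bridge"
      port "http" {
        to = 5678
      }
    }

    task "server" {
      driver = "docker"

      config {
        image = "hashicorp/http-echo"
        args  = ["-listen", ":5678", "-text", "hello"]
        ports = ["http"]
      }
    }
  }
}
```

Running the same group once per network mode makes the behaviors directly comparable.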
I was able to access the allocations from the host via the port mapping, as expected from the default bridge network.
So it seems to be possible to have a custom bridge network based off Nomad's default that behaves the same way, with the exception of some items that I will address below.
Next I wanted to test something different:
Can I create networks with other CNI plugins based on the Nomad bridge?
For the first test I used the macvlan plugin since it's a simple one.
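A minimal macvlan conflist for this kind of test could look like the following (the master interface and the subnet are host-specific assumptions):

```json
{
  "cniVersion": "0.4.0",
  "name": "mymacvlan",
  "plugins": [
    {
      "type": "macvlan",
      "master": "eth0",
      "ipam": {
        "type": "host-local",
        "subnet": "192.168.10.0/24",
        "routes": [{ "dst": "0.0.0.0/0" }]
      }
    }
  ]
}
```

Note that macvlan interfaces cannot talk to their parent interface by design, which may explain some of the host communication issues described below.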
I wasn't able to get cross-network and host port mapping communication working, but allocations in the same network were able to communicate. I think this is where my lack of more advanced networking configuration is a problem and I wonder if I'm just missing a route configuration somewhere.
Next I tried a Cilium network setup since @pruiz and @brotherdust mentioned it. It is indeed quite challenging to get working, but I think I was able to get enough running for what I needed. First I tried to run it as an external configuration using the generic Veth Chaining approach, because I think this is what is being suggested here: the ability to chain additional plugins to Nomad's bridge.
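Cilium's generic veth chaining works by appending cilium-cni after an existing veth-creating plugin, so the external conflist would look roughly like this (the bridge parameters are placeholders, and the Cilium agent must also run with cni-chaining-mode=generic-veth for this to take effect):

```json
{
  "cniVersion": "0.4.0",
  "name": "cilium-bridge",
  "plugins": [
    {
      "type": "bridge",
      "bridge": "cilium0",
      "ipMasq": true,
      "isGateway": true,
      "ipam": {
        "type": "host-local",
        "ranges": [[{ "subnet": "172.30.0.0/16" }]]
      }
    },
    { "type": "cilium-cni" }
  ]
}
```

The first plugin creates the veth pair and assigns addresses; cilium-cni then attaches its datapath to the resulting interface.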
Although far from a production deployment, I think this does show that it's possible to set up custom CNI networks without modifying Nomad's default bridge.
There are the points I mentioned earlier, though, so I will try to list them all here and open follow-up issues for us to address them.

- The iptables rules Nomad creates for the default bridge are tagged with `nomad` as comment so they can be cleaned up. This is not true for custom CNI networks so they may leak.
- CNI configurations are not reloaded on `SIGHUP`, so they require the agent to restart. CNI plugins are sometimes deployed as fully bundled artifacts, like Helm charts, that are able to apply CNI configs to a live cluster.
- Consul Connect requires the `bridge` network mode at job validation.
- Third-party CNI plugins are generally built and documented with Kubernetes in mind rather than Nomad (and its `bridge`).

These are all limitations of our current CNI implementation that we need to address, and are planning to do so. The last item is more complicated since it requires more partnership and engagement with third-party providers, but we will also be looking into how to improve that.
What's left to analyze is the main question:
Does this proposal provide new functionality or is it a way to workaround shortcomings of the Nomad CNI implementation?
For this I applied the same Cilium configuration directly to the code that generates the Nomad bridge. If I understood the proposal correctly, chaining CNI plugins to the Nomad bridge would be the main use case for this feature, but please correct me if I'm wrong.
But things were not much better, and most of the items above were still an issue.
And so, looking at the list of issues above, the proposal here would only incidentally fix the first two items because of the way things are named and currently implemented, and both items are things we need to fix for CNI anyway.
Now, to address some of the comments since the issue was closed.
From @the-maldridge.
it means that to use consul connect in conjunction with CNI I'd now need to edit every network block in every service template in every cluster, whether or not those tasks used a CNI network previously.
Having to update jobspecs is indeed an unfortunate consequence, but this is often true for new features in general and, hopefully, it's a one-time process. Modifying Nomad's bridge would also likely require all allocations to be recreated, so a migration of workloads is also expected in both scenarios. The upgrade path also seems risky? How would you go from the default bridge to a customized bridge?
At that point it seems like the better option to me is to abandon consul connect entirely and use a 3rd party CNI to achieve a similar result.
Nomad networking features and improvements have been lagging and we're planning to address them. CNI, Consul Connect, IPv6 (which was the original use case you mentioned) are all things we are looking into improving, but unfortunately I don't have any dates to provide at this point to help you make a decision on which tool to use.
I'm following the other ticket, but it really doesn't look like any consideration is given there to the default path that nomad comes with out of the box.
You are right, the issue I linked was about enabling Consul Connect on CNI networks. https://github.com/hashicorp/nomad/issues/14101 and https://github.com/hashicorp/nomad/issues/7905 are about IPv6 support in Consul Connect and Nomad's bridge.
Any thoughts on how to continue to have working defaults and still enjoy both CNI and Consul Connect?
Right now the only way I can think of to solve your issue is to run a patched version of Nomad to customize the hardcoded bridge config. But even that I'm not sure if it will be enough to fully enable Connect with IPv6.
From @pruiz.
While consul connect is a good solution for common use cases, it is clearly lacking when trying to use it to deploy applications requiring more complex network setups
Agreed. We (the Nomad team) need to find a way to address this and better integrate with other networking solutions. We don't have any specifics at this point, but community support is always a good start and much appreciated!
For such situations the only option available is to use CNI, but even this is somewhat limited on nomad
:100: we need to improve our CNI integration.
CNI has to be setup per host, networking has to be defined on a job-basis and CNI stuff has to be already present and pre-deployed/running on nomad-server before deploying the job
That's correct, but wouldn't the same be true of the proposal here, if I understood it correctly?
one cannot mix connect with custom CNIs
Right, and the plan is to address this in https://github.com/hashicorp/nomad/issues/8953. It may be that removing the validation is enough. Having more people test the custom binary I provided there would be very helpful.
And, at the same time, there is no solution for having more than "one networking" (ie. CNI plus bridge) for a single Task
That's also true, but also not covered by this proposal? As far as I know, Kubernetes also suffers from the same issue and there are meta-plugins to multiplex different networks, like Multus. I have this in my list above to be created as a follow-up issue.
nor is there a clear solution for mixing jobs using Consul Connect and jobs using CNI.
Yup, that's covered in https://github.com/hashicorp/nomad/issues/8953. One thing to clarify is what you mean by "mixing jobs". Do you envision an alloc that uses Consul Connect being able to reach an alloc on Cilium, for example? If that's the case I'm not sure it would work without a gateway :thinking:
This is clearly an issue for nomad users, as this limits Consul Connect to simple use cases, forcing us to deploy anything not (let's say) Consul-Connect-compatible outside of nomad, on top of a different solution (for deployment, traffic policing, etc.) and rely on an outbound gateway for providing access from nomad's jobs to such 'outside' elements.
I'm sorry, I didn't quite follow this part. Are you talking about, for example, having to deploy the Cilium infrastructure to use something beyond Connect?
I understand hashicorp needs a product that can be supported with some clear use case and limits. But at the same time we as community need some extensibility for use cases not needing the be covered by commercial hashicorp support options. That's why the idea of this being a setting for extending the standard nomad feature made sense to me. HashiCorp could simply label this as 'community supported-only' or something like that and focus on enhancing consul connect, but at the same time let the community work around until something better arrives.
This is the void we expect CNI to fill by allowing users to create their own custom networks that fit their specific needs. This specific item is not about commercial support but feature support in general. We try to be careful about backwards compatibility and this would introduce a feature we expect to deprecate. I understand the frustration but, historically, we treat code shipped as code being used. For experimentation, a temporary fork may be the best approach.
As stated I was willing to provide a PR for this new feature, but right now, I feel a bit stranded, as I don't really understand why not supporting a use case which on nomad code-base only implies being able to extend the CNI config, and which can be declared 'community supported' if that's a problem for hashicorp's business.
This is not a business decision, and I apologize if I made it sound like one. This was a technical decision: we found that arbitrary modifications to the default bridge network could be dangerous, as they can break things in very subtle ways, and the Nomad bridge has a predictable behaviour that we often rely on to debug issues.
We are always happy to receive contributions, and I hope this doesn't discourage you from future contributions (we have lots to do!). But sometimes we need to close feature requests to make sure we are moving towards a direction we feel confident in maintaining.
I just hope you guys can reconsider this issue.
Always! As I mentioned, the main point that I may be missing is understanding what you would be able to do with this feature that would not be possible with a well functioning CNI integration. Could you provide an example of what you would like to add to Nomad's bridge config? That can help us understand the use case better and yes, we are always willing to reconsider.
From @brotherdust.
I had to abandon Hashistack altogether because of Nomad's opinions on CNI. Consul Connect is a good generic solution, but it leaves much to be desired in the flexibility department. I tried to plumb in Cilium using their (deprecated) Consul integration and after a few months I had to bag it.
That's unfortunate but definitely understandable given where we are right now. Anything specific you could share to help us improve?
To finish this (already very) long comment, I want to make it clear that closing this issue is just an indication that we find a stronger and better CNI integration to be a better approach for customized networking. What "stronger and better" means depends a lot on your input, so I appreciate all the discussion and feedback so far, please keep it coming :slightly_smiling_face:
@lgfa29 , thank you for your thoughtful and detailed response. I'm sure it took some time out of your regular activities and I can appreciate it!
I agree with you 100% that Nomad needs better CNI integration and much better IPv6 support.
That's unfortunate but definitely understandable given where we are right now. Anything specific you could share to help us improve?
I need some time to gather my thoughts into something more cogent. I'll get back to you soon.
Wow, kudos for such an in-depth survey of the available options. I'm truly impressed that you got Cilium working and were able to use it even in a demo environment.
I think perhaps the deeper issue I encounter while looking at this is that there is a constant upgrade treadmill to operate an effective cluster. A treadmill that oftentimes involves tracking down users in remote teams who do not have dedicated operations resources but still expect the things they want to do in the hosted cluster environment to work. The Kubernetes world solved this long ago with mutating admission controllers able to monkey-patch jobspecs on the way in, and while I recognize the good arguments the Nomad team has made in the past against user-hosted admission controllers, I can't deny that this converts operations teams into those very same mutating controller resources.
As to having to update jobspecs to make use of new features, I remember the 0.12 upgrade cycle far too well: I spent about a week trying to figure out why none of my network config worked as I understood it to at the time. I'm really starting to wonder if the answer here is to just not use any of the built-in networking at all, to always stand up a CNI network that I own, and then put everything there. That seems to be the supported mechanism for managing a stable experience for downstream Nomad consumers; would you agree?
Edit: added mention of Fermyon-authored Cilium integration with Nomad.
Ok. Thoughts gathered! First, I want to qualify what I'm describing with the fact that I am, first and foremost, a network engineer. This isn't to say that I have expert opinions in this context, but to indicate that I might have a different set of tools in my bag than a software engineer or developer; therefore, there's a danger that I'm approaching this problem from the wrong perspective and I'm more than willing to hear advice on how to think about this differently.
The goals below are numbered for a reason: we'll be using them for reference later on.
nomad.job.type = service
I set off to find the pieces that would fit. It eventually came down to k8s and the Hashistack. I selected the Hashistack because it's basically the opposite of k8s. I'll skip my usual extended diatribe about k8s and just say that k8s is very... opinionated... and is the ideal solution for boiling the ocean, should one so desire.
In a general sense, the most difficult part of the evaluation comes down to one thing: where the Hashistack doesn't cover the use case, a third-party component must be integrated; or, if it does cover the use case, the docs are confusing or incomplete.
To the detriment of all, all the cool kids build service-mesh CNIs for k8s. They use k8s APIs, CRDs and such; things that Nomad (and Consul, indirectly) do not understand; and, frankly, shouldn't. Nomad has CNI support, but it's very basic in the sense that it cannot be programmatically or natively configured via Nomad jobspec. It seems there is some template functionality I wasn't aware of, as indicated by some of the content of this thread, so I'll have to revisit that.
I very much agree with @lgfa29 that probably the best outcome is just to integrate Cilium as part of Nomad. That creates its own burden on HashiCorp, so I'm not sure if they're going to be willing to do that. In this instance, I am happy to volunteer some time to maintain the integration once it is completed.
Which brings me to a related note: I saw a HashiConf talk by Taylor Thomas from Fermyon. In it he describes a full-featured Cilium integration with Nomad they are planning on open sourcing. It hasn't happened yet due to time constraints, so I reached out to them to see what the timeline is and if they would like some help. Hopefully I or someone more qualified (which is pretty much anyone) can get the ball rolling on that. If anyone wants me to keep them up to date on this item, let me know.
I realize this seems somewhat off-subject, but it is somewhat related.
This article covers some of the issues I experience, which I'll quote from here:
What does it REALLY takes to operate a whole hashistack in order to support the tiny strawberry atop the cake, namely nomad?
First of all, vault, which manages the secrets. To run vault in a highly available fashion, you would either need to provide it with a distributed database (which is another layer of complexity), or use the so called: integrated storage, which, needless to say is based on raft1. Then, you have to prepare an self signed CA1 in order to establish the root of trust, not to mention the complexity of unsealing the cluster on every restart manually (without the help of cloud KMS).
The next is consul, that provides service discovery. Consul models the connectivity between nodes into two categories, lan and wan, and each lan is a consul datacenter. Consul datacenters federate over the wan to form a logical cluster. However, data is not replicated across datacenters, it's only stores in respective datacenters (with raft2) and requests destined for other datacenters are simply forwarded (requiring full connecitity across all consul servers). For the clustering part, a gossip protocol is used, formaing a lan gossip ring1 per datacenter, and a wan gossip ring2 per cluster. In order to encrypt connections between consul servers, we need a PSK1 for the gossip protocol, and another CA2 for rpc and http api. Although the PSK and the CA can be managed by vault, there is no integration provided, you have to template files out of the secrets, and manage all rotations by yourself. And, if you wanna use the consul connect feature (a.k.a. service mesh), another CA3 is required.
Finally, we get to nomad. Luckily, nomad claims to HAVE consul integration, and can automatically bootstrap itself given a consul cluster beneath it. You would expect (as I did) that nomad could rely on consul for interconnection and cluster membership, but the reality is a bloody NO. The so-called integration provides nothing more than saving you from typing a seed node at cluster bootstrap, and serves no purpose beyond that. Which means you still have to run a gossip ring3 per nomad region (which is like a consul datacenter) and another gossip ring4 for cross-region federation. And nomad also stores its state in per-region raft3 clusters. To secure nomad clusters, another PSK2 and CA4 are needed.
Let's recap what we have now, given that we run a single vault cluster and 2 nomad regions, each containing 2 consul datacenters: 2 PSKs, 4 CAs, 7 raft clusters, 8 gossip rings. And all the cluster states are scattered across dozens of services, making the backup and recovery process a pain in the ass.
So, besides experiencing exactly what the author mentioned, I can add: if you want to integrate any of these components with an existing enterprise CA, beware that, for example:
I think what happened is that the developers assumed that we'd want to use the self-signed CA that came with each component and nothing else. So, they weren't expecting a particular kind of error, or didn't see the need to comprehensively document what a certificate should look like. For lab purposes, this is acceptable. When one is trying to set up a production cluster, it's pretty rough.
On a final note, I seriously appreciate that this is open source software and that I am more than welcome to provide a PR. I even thought about justifying an enterprise license. But, in this particular case, a PR wouldn't be enough to address the architectural decisions that led to where we are now; and, based on my experience with enterprise support contracts, they would probably never be addressed unless there were some serious money on the table. I get it, I do. My expectations are low; but I thought it was at least worth the time to write all this out so that you would benefit from my experience.
Thanks again! Seriously great software!
Hi @lgfa29,
First, thanks for the thoughtful response; I'll try to answer the points I think are relevant below. ;)
Hi everyone 👋
After a more thorough look into this I want to share what I have observed so far and expand on the direction we're planning to take for Nomad's networking story.
The main question I'm trying to answer is:
Does this proposal provide new functionality or is it a way to workaround shortcomings of the Nomad CNI implementation?
From my investigation so far, I have not been able to find examples where a custom CNI configuration would not be able to accomplish the same results as the proposed cni_bridge_config_template. That being said, I have probably missed several scenarios, so I am very curious to hear more examples and use cases that I may have missed.
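For readers less familiar with what "a custom CNI configuration" means here: a standalone network is just a conflist file on the client host. The sketch below is loosely modeled on Nomad's built-in bridge template linked above (a chain of the bridge, firewall, and portmap plugins); the network name, device name, and subnet are illustrative, not Nomad's actual defaults.

```json
{
  "cniVersion": "0.4.0",
  "name": "mybridge",
  "plugins": [
    {
      "type": "bridge",
      "bridge": "mybridge0",
      "isGateway": true,
      "ipMasq": true,
      "ipam": {
        "type": "host-local",
        "ranges": [
          [{ "subnet": "172.26.64.0/20" }]
        ],
        "routes": [
          { "dst": "0.0.0.0/0" }
        ]
      }
    },
    {
      "type": "firewall"
    },
    {
      "type": "portmap",
      "capabilities": { "portMappings": true }
    }
  ]
}
```

Anything the proposed template could render, an administrator could in principle express directly in such a file, which is the comparison being drawn here.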
I think the main deviation between your tested scenarios and the one I have in mind is that I want a single task (within a given allocation) to be able to use both Consul Connect and Cilium's networking. So the job would declare a single network stanza inherited by all tasks in it (which can be just one), and then that would work like:
This is the kind of integration between Consul Connect & Cilium I want to achieve.
[...]
From @pruiz.
[...]
one can not mix connect with custom-CNIs
Right, and the plan is to address this in #8953. It may be that removing the validation is enough. Having more people test the custom binary I provided there would be very helpful.
That would be an option for me, provided that we can use Connect on a custom CNI network, hopefully delegating to nomad's deployment/management of the envoy proxy.
And, at the same time, there is no solution for having more than "one networking" (ie. CNI plus bridge) for a single Task
That's also true, but also not covered by this proposal? As far as I know, Kubernetes also suffers from the same issue and there are meta-plugins to multiplex different networks, like Multus. I have this in my list above to be created as a follow-up issue.
Yeah, I know kubernetes is similar here, but my point was that support for more than one network could be another way around this: just provide my tasks with one network 'connecting' to nomad's bridge, and another one connecting to cilium. :)
nor there is a clear solution for mixing jobs using Consul Connect and jobs using CNI.
Yup, that's covered in #8953. One thing to clarify is what you mean by "mixing jobs". Do you envision an alloc that uses Consul Connect being able to reach an alloc on Cilium, for example? If that's the case, I'm not sure it would work without a gateway 🤔
This is what I explained at the top: I think we could make Connect and Cilium work on top of the same bridge, and have both working together side by side.
I understand hashicorp needs a product that can be supported with clear use cases and limits. But at the same time, we as a community need some extensibility for use cases that don't need to be covered by commercial hashicorp support options. That's why the idea of this being a setting for extending the standard nomad feature made sense to me. HashiCorp could simply label it 'community supported-only' or something like that and focus on enhancing consul connect, while letting the community work around the limits until something better arrives.
This is the void we expect CNI to fill by allowing users to create their own custom networks that fit their specific needs. This specific item is not about commercial support but about feature support in general. We try to be careful about backwards compatibility, and this would introduce a feature we expect to deprecate. I understand the frustration but, historically, we treat shipped code as code being used. For experimentation, a temporary fork may be the best approach.
As stated, I was willing to provide a PR for this new feature, but right now I feel a bit stranded, as I don't really understand why a use case is not supported when, in nomad's code base, it only implies being able to extend the CNI config, and when it could be declared 'community supported' if that's a problem for hashicorp's business.
This is not a business decision, and I apologize if I made it sound like one. This was a technical decision: we found that arbitrary modifications to the default bridge network could be dangerous, as they can break things in very subtle ways, and the Nomad bridge has a predictable behaviour that we often rely on to debug issues.
No bad feelings ;), I understood your point. I just wish we could find an interim solution for the current limitations of Connect.
Regards Pablo
@the-maldridge
The kubernetes world solved this long ago with mutating ingress controllers to be able to monkey-patch jobspecs on the way in, and while I recognize the good arguments the Nomad team has made in the past against user-hosted ingress controllers, I can't deny that that converts operations teams into the very same mutating controller resources.
I've heard some people mentioning an approach like this before (for example, here is Seatgeek speaking at HashiConf 2022), but I'm not sure if there's been any final decision on this by the team.
I'm really starting to wonder if the answer here is to just not use any of the builtin networking at all, to always stand up a CNI network that I own, and then put everything there. That seems to be the supported mechanism for managing a stable experience for downstream Nomad consumers, would you agree?
That's the direction we're going. The built-in networks should be enough for most users and a custom CNI should be used by those that need more customization. The problem right now (in addition to the CNI issues mentioned previously) is that there's a big gap between the two. We need to figure out a way to make CNI adoption more seamless.
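As a concrete sketch of what that path looks like today: the client is pointed at a directory of conflist files, and the jobspec opts into the custom network by name instead of the built-in bridge. The paths below are Nomad's documented defaults; the network name "mybridge" is illustrative.

```hcl
# Client configuration: where Nomad finds CNI plugin binaries and
# *.conflist network definitions (these are the documented defaults).
client {
  cni_path       = "/opt/cni/bin"
  cni_config_dir = "/opt/cni/config"
}
```

```hcl
# Jobspec: reference the custom network defined in
# /opt/cni/config/mybridge.conflist instead of the built-in bridge.
group "app" {
  network {
    mode = "cni/mybridge"
  }
}
```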
@brotherdust thanks for the detailed report of your experience!
To the detriment of all, all the cool kids build service-mesh CNIs for k8s. They use k8s APIs, CRDs and such; things that Nomad (and Consul, indirectly) do not understand; and, frankly, shouldn't.
Yup, that's the part about partnerships I mentioned in my previous comment. But those can take some time to be established. The work that @pruiz has done in Cilium is huge for this!
Nomad has CNI support, but it's very basic in the sense that it cannot be programmatically or natively configured via Nomad jobspec. It seems there is some template functionality I wasn't aware of, as indicated by some of the content of this thread, so I'll have to revisit that.
Could you expand a little on this? What kind of dynamic values would you like to set, and where?
I very much agree with @lgfa29 that probably the best outcome is just to integrate Cilium as part of Nomad.
Maybe I misspoke, but I don't expect any vendor specific code in Nomad at this point. The problem I mentioned is that, in theory, the CNI spec is orchestrator agnostic but in practice a lot of plugins have components that rely on Kubernetes APIs and, unfortunately, there is not much we can do about it.
I am happy to volunteer some time to maintain the integration once it is completed.
And that's another important avenue as well. These types of integration are usually better maintained by people that actually use them, which is not our case. Everything I know about Cilium at this point was what I learned from community in #12120 🙂
I think what happened is that the developers assumed that we'd want to use the self-signed CA that came with each component and nothing else. So, they weren't expecting a particular kind of error, or didn't see the need to comprehensively document what a certificate should look like. For lab purposes, this is acceptable. When one is trying to set up a production cluster, it's pretty rough.
I would suggest opening a separate issue for this (if one doesn't exist yet).
But, in this particular case, a PR wouldn't be enough to address the architectural decisions that lead to where we are now
You're right, this will be a big effort that will require multiple PRs, but my plan is to break it down into smaller issues (some of them already listed in my previous comment), so maybe there will be something smaller that you can contribute 🙂
Things like documentation, blog posts, demos etc. are also extremely valuable to contribute.
Thanks again! Seriously great software! ❤️
@pruiz
I think the main deviation from your tested scenarios and the one I have in mind is that I want a single task (within a given allocation) should be able to use both Consul Connect and Cilium's networking.
Yup, I got that. But I want to make sure we're on the same page as to why I closed this issue. So imagine the feature requested here were implemented: which cni_bridge_config_template would you write to accomplish what you're looking for? And what is preventing you from using a separate CNI network for this? From what I gathered so far, the only things preventing you from doing what you want are shortcomings in our CNI implementation. If that's not the case, I would like to hear what cni_bridge_config_template can do that a custom CNI would not be able to.
That would be an option for me, but given that we can use Connect on a custom CNI network, hopefully delegating to nomad's deployment/management of envoy proxy stuff.
Yes, the sidecar deployment is conditional on service.connect, not the network type.
I would appreciate if you could test the binary I have linked in https://github.com/hashicorp/nomad/issues/8953#issuecomment-1411344922 to see if it works for you.
Yeah, I know, kubernetes is similar here, but my point was that support for more than one networking, could be another way around for this.. just provide my tasks with one network 'connecting' to nomad's bridge, and another one connecting to cilium. :)
Yup, I have this on my list and I will open a new issue about multiple network interfaces per alloc 👍
Hi all 👋
I just wanted to note that, as mentioned previously, I've created follow-up issues on specific areas that must be improved. You can find them linked above. Feel free to 👍, add more comments there, or create new issues if I missed anything.
Thanks!
@lgfa29 , thanks much!
Proposal
Right now the configuration for the nomad0 bridge device is hard-coded. Among other things, this makes it impossible to use Consul Connect with nomad and IPv6.
Use-cases
This would enable IPv6 with the bridge; it would also allow the use of more advanced or configurable CNI topologies.
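To make the IPv6 use case concrete: the host-local IPAM plugin already supports dual-stack allocation through multiple address ranges, so a configurable bridge config could carry an ipam fragment like the one below inside the bridge plugin's entry (subnets are illustrative, and this is a fragment, not a complete conflist).

```json
"ipam": {
  "type": "host-local",
  "ranges": [
    [{ "subnet": "172.26.64.0/20" }],
    [{ "subnet": "fd00:a110:c8::/64" }]
  ]
}
```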
Attempted Solutions
To the best of my knowledge, there is no current solution to make consul connect and nomad both play nice with IPv6, or other similarly advanced dual-stack network configurations.