Closed ccravens closed 7 months ago
xref #2475
Thanks for the detailed request @ccravens!
I'm not opposed in principle to adding Lua filters, I can see that it's a great way for advanced users to make sure that Contour can meet their use case quickly.
However, I've got a few worries about it:
Again, I totally understand that, sometimes, Lua is the quickest and easiest way to resolve problems. We use it inside Contour for a workaround at the moment.
But once we start accepting user-provided extensions like this, security and validation become orders of magnitude harder. I'd like to find ways to ensure that we're not giving ourselves a future supportability problem before going ahead, and at the very least be making a conscious decision about risk if and when we do.
I guess what I'm looking for here is some thoughts from you, @ccravens, or other interested parties on how we could address these concerns. Maybe I'm wrong and invalid Lua scripts don't share the same properties that other config does, and the invalid config concern is not valid here. And maybe there are other things we can learn from Istio and other Envoy-based projects to help with this?
I agree with @youngnick that users can get into a whole bunch of trouble with this, but at the same time we need to strike a balance with features. If support is an issue, one of the troubleshooting steps can be "disable your lua filters" and then test from there.
Maybe we could add some validation commands to Contour (or a kubectl plugin) to help out and implement the config load check tool (https://www.envoyproxy.io/docs/envoy/latest/start/install/tools/config_load_check_tool.html).
I think we could also make the lua bits an op-in for users so there's a flag that needs set before allowing them to be processed. The other concern is who is allowed to add filters to what bits, so maybe a new CRD needs thought up to restrict who can create the scripts and subsequently, where they can be applied to HTTPProxies (via Conditions in a ref?).
Hello @youngnick thanks for the response and feedback! I was hoping to get feedback to really think this through with the community and go back and forth on ideas.
I first want to address the last point about the denial of service. I can imagine if you have several engineers that are working in your k8s environment, you wouldn't want someone applying a config that could bring down the entire cluster. My thought on this is we could maybe create a separate CRD for the HTTP filters (much like Istio) and allow RBAC policies to take care of that. So if your other engineers are working with different service account (SA) credentials, you can add the RBAC policies for their SA to create a Contour HTTPFilter custom resource only if you have validated the config that they are publishing. Let me know thoughts there.
In regards to the first point of invalid configs, I was talking to @stevesloka about this and I think we have 2 issues: 1) Compile-time issue (envoy crashes due to incorrect config) and 2) Run-time issue where envoy doesn't crash because it is a valid config but then the filter doesn't do what the developer intended because of bad code. So this is tough, the question I think is "how much rope should we give our users to hang themselves with". So my honest thoughts and opinions on that matter is I would rather have the option to do what I need to do than not, and as a kubernetes engineer it's my responsibility to ensure I'm doing things correctly in my cluster at the end of the day. I feel, for example, Microsoft tried to do this with their Visual Studio platform for decades, trying to hide the details of building and deploying code. Then after 20 years developers organically developed and migrated to Visual Studio Code because it is light-weight and gives the developers the capabilities they need, even though developers may have issues they have to deal with because they are responsible for their own build processes now. If we abstract too much as a project and prevent developers from doing things out of fear they may apply incorrect configs, then we're forcing them to find hacks around the problem which is why I created the issue in the first place. For me it wasn't an option of "Lua filters seem nice, so I guess I'll try them", it was really a 2-3 month process of looking through envoy docs, envoy code, contour docs, contour code, attending meetings and identifying a solution to my problem as quickly as possible. I initially started working on hacks like injecting a bootstrap service and updating envoy configs alongside contour (which is way worse lol). So what we've had to do instead is fork Contour and implement those Lua filters in the contour code and build and deploy our own contour image for now because these feature are necessary really to the success of what we're doing even as a company (if we can't get these features, we can't survive as a company). So given that, we will definitely work to find anyway we can to accomplish this. So I think having these features available as a standard managed feature, even if it can break envoy, is better than forcing the community to find hacks and/or forking and managing their own forks of the project.
So I think maybe the conversation is a bit more philisophical about the design philosophies of the Contour project rather than this specific issue. My concern is if we barr developers too much from doing what they need to do and don't at least offer these avenues, people will eventually either create (worse) hacks to solve their problems or fork (like we unfortunately have had to do for now). I don't know if this particular github issue is a good venue for this conversation but would love to hear from the community on this as a philosphy of the contour project. Thanks!
@stevesloka
I think we could also make the lua bits an op-in for users
I think that's a great idea as well! Create a gauge of "how much risk are you willing to accept for your particular contour deployment" might be a great way to strike that balance!
+1 for opt-in
@ccravens Thanks for the detailed response!
You're right that this comes to the heart of the philosophy for Contour. How much configurability do we allow - or more bluntly - how much do we trust our users to not make mistakes? The thing I've always tried to keep in mind is that everyone makes mistakes, and we need to do our best to ensure that the inevitable mistakes will produce the least worst outcome possible. How flexible I've been willing to be on this has been something that I've needed to change as I came to understand more about the ways in which Contour was actually being used. I used to be much more conservative about this than I am now, but have come to understand that because the proxy is used as a fixup point for many other behaviors, we have to allow more that we used to.
I still think there's value in not allowing things that will almost always be bad - but I think that there's definitely value in having this greater flexibility available, and it may well be worth the cost, especially as an escape valve you can open if you really need it.
The converse is that with greater flexibility comes greater risk. There are three risks I am worried about here:
I don't want you to have to maintain a fork to meet your usecases, but I also want you to be able to help us build something that solves the problems you have for everyone. I'm worried that once Lua is available, people will go off and build their own Lua solutions, and not bring feature requests here.
It seems like if we were to have an opt-in, it needs to use this escape valve idea. It's intended for you to open it to keep you going while you come back and get your usecase added into Contour proper, at which time you retire your Lua and move back to vanilla. The main way to encourage this is in the documentation and probably a log message to remind people to bring the usecase back. WDYT?
I'm interested in wider thoughts on this and other topics from other operators here.
Resurrecting this to understand it a bit more
the risk that Lua scripts passed to filters are broken and break your Contour install, with no way to know that except for seeing things fail.
That’s a risk with anything that plugs in and runs inline. If we support this at some point, I also don’t think it’s contour’s job to do code validation on lua scripts. I think our position would need to be you run it at your own risk and we’ll do our best to surface the errors to the users through logs. Not sure if there are static code analysis tools of lua scripts but probably overkill as part of contour. That said, I do think being able to manipulate headers in incoming requests very useful. How about we table this for just a bit now
lastly, and most importantly for the future of Contour, once we have this, there will be much less incentive for users to log feature requests. Both of the use cases you mentioned in your opening comment could be (and in my opinion, really should be) features added to the main HTTPProxy document. I'd like to have a way to say to people - "Hey, you're using Lua to do something cool, have you thought about asking for that in HTTPProxy?"
I don’t grasp the point here about less incentive to log feature requests if we deliver LUA support. Can you expand?
That’s a risk with anything that plugs in and runs inline. If we support this at some point, I also don’t think it’s contour’s job to do code validation on lua scripts. I think our position would need to be you run it at your own risk and we’ll do our best to surface the errors to the users through logs. Not sure if there are static code analysis tools of lua scripts but probably overkill as part of contour. That said, I do think being able to manipulate headers in incoming requests very useful. How about we table this for just a bit now
Agreed, but until we solve #1176, if we send a bad Lua script to Envoy, it could NACK the update, and then there will be no more updates. Once we have NACK support, then I'd be willing to proceed in the way you're talking about.
lastly, and most importantly for the future of Contour, once we have this, there will be much less incentive for users to log feature requests. Both of the use cases you mentioned in your opening comment could be (and in my opinion, really should be) features added to the main HTTPProxy document. I'd like to have a way to say to people - "Hey, you're using Lua to do something cool, have you thought about asking for that in HTTPProxy?"
I don’t grasp the point here about less incentive to log feature requests if we deliver LUA support. Can you expand?
What I'm worried about is that people will solve their own problems with Lua, and never tell us they were problems, and we will miss out on implementing features in Contour that would be better suited in the mainline project, rather than someone's custom Lua.
@youngnick
What I'm worried about is that people will solve their own problems with Lua, and never tell us they were problems, and we will miss out on implementing features in Contour that would be better suited in the mainline project, rather than someone's custom Lua.
Then it would be best to remove all custom Lua code support. If you allow users to run custom code, then they will run custom code.
Agreed, but until we solve #1176, if we send a bad Lua script to Envoy, it could NACK the update, and then there will be no more updates. Once we have NACK support, then I'd be willing to proceed in the way you're talking about.
In my personal opinion, this should be documented, but permitted. It can even be gated behind a feature flag / environment variable like CUSTOM_LUA_MAY_BREAK_ENVOY
.
Personally, Contour's behavior here comes across as nanny-like to me. Why can I not break my own envoy? Given the choice of forking Contour with our own changes (which we had to do) or using a good alternative, I would prefer to not use Contour at all due to this sort of attitude towards downstream users.
edit1: I would also like to mention that foreseeing all possible use cases is impossible, and shouldn't really be the scope of contour. This is precisely what things like custom Lua code are for.
Thanks for the comment, @andrewzah. There's quite a bit to respond to, so I'll quote-and-reply.
Then it would be best to remove all custom Lua code support. If you allow users to run custom code, then they will run custom code.
Look, you're not wrong, and I guess I'm being too precious here. We should be encouraging people to tell us what they're fixing, not making it so they can't do their fixes. But I would really like a way to encourage people to bring us the things they're doing with Lua once we implement it. I'll think more about this, but I withdraw this objection.
In my personal opinion, this should be documented, but permitted. It can even be gated behind a feature flag / environment variable like CUSTOM_LUA_MAY_BREAK_ENVOY.
The problem with this break of Envoy is that it's silent. You'll have no way of knowing if you've triggered it without checking to see that your Envoy is taking changes. I'm firmly of the opinion that #1176 is one of Contour's most serious issues, but fixing it is non-trivial (which is why it's not fixed already).
As the lead maintainer, my main concern is about support. What would be your expectation about support in the event you're using the feature flag? I'm certainly not Lua-fluent, and I don't think we have heaps of Lua fluency in the maintainer team, so we wouldn't be able to offer good support to help people troubleshoot their Lua scripts.
It doesn't seem like it would be good to have to tell people "Oh, you're using Lua? Sorry I can't help you" either.
Personally, Contour's behavior here comes across as nanny-like to me. Why can I not break my own envoy? Given the choice of forking Contour with our own changes (which we had to do) or using a good alternative, I would prefer to not use Contour at all due to this sort of attitude towards downstream users.
I've been working pretty hard to make Contour less nanny-like, and more configurable, but we do have to support whatever we add into the product. I understand it's frustrating to not get the feature you need, but as maintainers we have to keep up with the future of the project, not just any one feature. Looks like here I haven't done a good job though, and for that I apologise.
Next, you and I may be comfortable with breaking our Envoy, but how do we ensure that we're protecting people that aren't? I don't know if a feature flag is enough, maybe we need a full explanation of what behavior you may see if something goes wrong, and link to that from the docs about the feature flag.
I'm also really sorry that you found it necessary to fork Contour, I'd like to hear more from you about the reasons why you did, and if what we could do to bring you back inline with upstream. Please feel free to contact me one-on-one either on the email listed in my profile, or on Kubernetes slack @youngnick.
The tl;dr here is not "We're never going to do this", but "I really want to understand what people's expectations are before we do it". Like always in open source, no is temporary, but yes is forever.
I think this is super important to figure out, however, I don't feel like we're moving this issue in any direction after looking through the responses.
It seems like there are a few options to solve this problem:
Steps to move forward:
I think that we can take the presented problems and add them as features, yes. @andrewzah has logged #3657 to cover the location header rewrite, and we already have #2965 covering adding Samesite
to cookie attributes.
That doesn't solve the larger "Can we have custom Lua" problem, but does avoid all of my objections. :) It feels like the pattern here could be "We need this feature, seems like Lua would do it, here's what we recommend" as the way to add features with Lua. If Lua is an implementation detail, not the whole feature, then the concerns about supportability for the wider community don't arise.
In terms of the xDS work, #1176 covers the wider work here. I did some investigations a little while ago, and the problem is now not as bad as it was - if Envoy rejects config updates, it will recover once you fix the config that's being rejected. We still have quite poor support for actually telling what config failed however, which @wrowe will be working on soon. I'm hoping that we can get a better feedback loop for xDS errors, which will make me much more comfortable about being more risky with sending config that might be invalid to Envoy.
I think one way to achieve the balance is that we allow LUA but it would be unsupported by Contour team. These are treated as 3rd party plugins. In the future, they can be submitted for official support but this mechanism is non-existent right now. But there's still a lot of risk with this and we are evaluating other approaches as well. WASM is another alternative.
I'll add another example here, this is not strictly speaking lua code, a collection of multiple conditions. Hope there is some help.
(AND (IN HOST www.google.com google.com) (EQ URL /search) (EQ SRC_IP 114.114.114.114) (OR (RANGE HEADER uid 100 999) (RANGE HEADER uid 10000 11000)))
We can have Lua support, and have it be not supported by the Contour team, but we would need to validate that sending a broken Lua to Envoy doesn't produce an xDS NACK, because that has bad consequences.
Remember, those consequences look like this from a user perspective:
I'll be honest, it could only be some updates not getting to Envoy (probably updates to the HCM in question), but that's almost worse, as it will be even harder to find.
Until there is some way for users to know that the Lua they've provided isn't working, I think that providing this is just too risky.
This is not just a matter of me not wanting to have to support Lua, it's that I don't want advanced users who turn on this functionality to have the ability to silently break their Contour installation by having an invalid Lua config.
I can definitely see the utility in having an extensible way for knowledgeable users to add onto Contour, but until we can ensure that we're not producing an unsolvable problem for them in the event they make a mistake, I'm pretty worried about it.
@youngnick I'm not sure about the reasoning here. If the users are experienced enough to enable this one, they should be able to debug it. Right now, AFAIU, there is no way to supply a custom Envoy config at all?
So, essentially envoy features like WASM filters (let's be frank, Lua just doesn't cut it) are barred?
ExtensionService
could be a good alternative, but because it cannot alter responses or even react to them (unless you go the "copy the whole body into memory" route, which is a no go) its usability is pretty limited...
Seems like forking Contour is the only way to proceed here (ex. #399 ), pretty sad.
@pkit I'll be honest, I'm much more hopeful for WASM filters in this regard, as they are a bit less open-ended than Lua.
You are correct that there's no way to supply custom Envoy config, because of the problem listed above.
We (Contour) still have tidying up to do before we can do WASM either, because of the problem I listed above. Forking Contour to add the ability to configure Lua won't help you troubleshoot a broken Lua, because Contour isn't capable of telling you that it's broken at the moment. Yes, this is a terrible state to be in, and we are trying to fix it, but I can't say "sure we'll just give you this" right now, because then your users can silently break your entire Contour install through no fault of their own. This isn't just "the Lua you pass won't work", it's no changes in the cluster will go to Contour's Envoys.
If you do decide to fork, I understand. But please make sure you're careful about the Lua that's supplied to Contour.
There was a couple of unfortunate vulnerabilities in NGINX couple of months ago, that could be interesting for us too, for learning purposes.
CVE-2023-5043 and CVE-2023-5044 were related to user injecting malicious Lua and config snippets https://www.armosec.io/blog/cve-2023-5043-nginx-ingress/. The last one CVE-2022-4886 was related to path sanitization that accidentally allowed injecting config snippets, leading into remote code execution https://hackerone.com/reports/1620702.
The Contour project currently lacks enough contributors to adequately respond to all Issues.
This bot triages Issues according to the following rules:
You can:
Please send feedback to the #contour channel in the Kubernetes Slack
The Contour project currently lacks enough contributors to adequately respond to all Issues.
This bot triages Issues according to the following rules:
You can:
Please send feedback to the #contour channel in the Kubernetes Slack
I'm requesting the support of Envoy Lua filters with Contour. Lua filters provide developers the ability to apply sometimes complex and necessary modifications to headers for security or functional purposes that may not be supported via Contour YAML-based declarative manifests.
Use Cases
I have 2 use cases in which Lua filters would solve my problem that is currently not possible with the current Contour manifests:
Location Header Re-Write
While Contour offers the ability to set or remove headers, there is currently no functionality to modify headers. In my case, I need to guarantee that any
Location
header redirects tohttps
rather thanhttp
. This problem is well described and documented in this issue: https://github.com/projectcontour/contour/issues/2776The following inline Lua code applied as a filter would solve this problem:
SameSite Cookie Attribute
Even more complex than header modification, updated security constraints in the Chrome browser do not allow setting iframe cookies without the proper SameSite attribute. While a lot of applications are working to apply this, it would be best to enforce this requirement at the proxy level to guarantee that this works across all applications (https://github.com/envoyproxy/envoy/issues/12970).
The following inline Lua code applied as a filter would solve this problem:
Workaround
Initially, I was working to inject a separate XDS configuration service that would apply Lua filters to the configuration objects that Contour creates in envoy. However, due to Contour regenerating the DAG everytime a configuration update is made and re-applying the entire DAG to envoy, it would over-write any manual configurations that were applied. Therefore, the only way that we can truly get Lua filters properly supported is if they are managed by Contour.
Sample Solutions
Istio supports an
EnvoyFilter
CRD which supports Lua inline code as well (https://istio.io/latest/docs/reference/config/networking/envoy-filter/). The following is a sample of an Istio CRD with inline Lua code: