kubernetes-sigs / aws-load-balancer-controller

A Kubernetes controller for Elastic Load Balancers
https://kubernetes-sigs.github.io/aws-load-balancer-controller/
Apache License 2.0
3.88k stars 1.44k forks source link

[Feature Request] AWS WAF WebACL should not be disassociated from ALB if there is no 'alb.ingress.kubernetes.io/wafv2-acl-arn' annotation #3780

Closed joknoxy closed 4 weeks ago

joknoxy commented 1 month ago

Customers are not always aware of the 'wafv2-acl-arn' annotation for an AWS Load Balancer Controller and so they manually associate an AWS WAF WebACL with their ingress ALB, however by default, if there is no 'alb.ingress.kubernetes.io/wafv2-acl-arn' annotation, the controller will disassociate any existing (manually) associated WebACL from the ALB, leaving it unprotected by AWS WAF. This should not be the default behaviour as:

Refer documentation - https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.7/guide/ingress/annotations/#wafv2-acl-arn

Solution: if there is no 'alb.ingress.kubernetes.io/wafv2-acl-arn' annotation, leave any existing WebACL associated with the ALB in place

oliviassss commented 1 month ago

@joknoxy, hi, thanks for reaching out, this is a behavior by designed, since the controller should reconcile based on manifest. Would you be able to share your use-case in order for us to understand further? Update: I think the initial purpose of the behavior is to let the controller know which wafACL to disassociate/deprecate. If it's not specified via the annotation, the controller can disassociate it. Otherwise, the webACL may will just hang there.

M00nF1sh commented 1 month ago

TLDR, i think it's a good feature and we should do the change in a minor version. The impact shall be minimal(existing ingresses won't be impacted), thus we may/may not do the change under a feature flag.

Personally i'd prefer all new features to be behave like this way: if an annotation/field is not specified, keep the setting to whatever it is currently. This semantic is more scalable when we add new features(i.e. we don't need to make unnecessary aws api calls when a feature is not used).

BTW, iirc, i added this wafv2-acl-arn annotation support in the past and have chose the current behavior due to all other existing annotations have the semantic of removing annotation = reset behaviors to default and decided to remain consistent. However, overtime, we do recognize the benefits(in both performance&iam policy) by using the semantic of removing annotation = keep existing settings unchanged and thus have added a few new annotations following this new semantic.

joknoxy commented 1 month ago

Hi @oliviassss the use-case is that if there is a WebACL in place it will contain rules to protect the customers online property from vulnerability exploitation, bots and DDoS. If there is no annotation, it doesn't mean they don't want the WebACL associated, it means they didn't know about the LBC annotation behaviour. Any security protection the customer has configured needs to be left in place unless they make an explicit choice to remove it.

The current behaviour is a serious security concern.

oliviassss commented 1 month ago

@joknoxy, I'm fine with not disassociating the WebACL if there's no annotation. But it's just a bit confusing that if they need the WebACL, why would they remove the annotation that they specified earlier.

joknoxy commented 1 month ago

@oliviassss, @M00nF1sh - Customers are not aware of the AWS Load Balancer Controller (LBC) and that they can control aspects of the ALB such as the WebACL arn, using annotations. Customers create a Kubernetes ingress, see the ALB and then manually associate a WebACL with it, not being aware that they were supposed to use annotations. Then LBC then goes and removes the protection they added.

The Controller does give them a way to OptOut, If users do not want the controller to manage the waf-acl on the ALBs, they can disable the feature by setting controller command line flags --enable-waf=false or --enable-wafv2=false, however given that they are not aware of the annotations for WAF in the first place, they never find the Opt out method

In some cases, customers are not aware for weeks that the security protections have been removed, leaving them exposed.

I have edited the original issue above for clarity on this.

oliviassss commented 1 month ago

@joknoxy, thanks for the details, so they are manually attaching the WebACLs, that makes more sense for the use case now. For short-term workaround, just disable the wafv2/waf via the controller flag while we work on the enhancement here. I'm also wondering usually would the users manually manage waf addons, or they want the controller to manage them. Maybe we can turn the default setting of these flags to false. And if they want the controller to manage, they can enable explicitly.

joknoxy commented 1 month ago

I have given customer workaround. I think having the setting 'false' by default would also be valid.

jalaziz commented 1 month ago

We're running into this and we're using Firewall Manager to set a global WAF policy.

Seeing the --enable-wafv2=false is not ideal because it would prevent management in cases where we want to override the "global" WAF with a specific one.

It would be great if the controller could be smart enough not to interfere with Firewall Manager.

joknoxy commented 1 month ago

@jalaziz --enable-wafv2=false would not prevent FMS from managing the WebACL, it would only prevent the controller from managing it @oliviassss please confirm this. If you wanted the controller to manage it you could set --enable-wafv2=true

@oliviassss I've just noticed another annotation that is potentially concerning - alb.ingress.kubernetes.io/shield-advanced-protection - if the annotation is missing does that mean that the controller will not change the status of the resources Shield Advanced protection status? Or is it set to 'false' by default?

jalaziz commented 1 month ago

@jalaziz --enable-wafv2=false would not prevent FMS from managing the WebACL, it would only prevent the controller from managing it @oliviassss please confirm this. If you wanted the controller to manage it you could set --enable-wafv2=true

Yes I know. I'm saying that's not ideal because you may want to override the FMS managed ACL with a more custom one via the annotation. It's a bit of a heavy hammer when what we really want is: "If a more specific Web ACL is not specified, fallback to the FMS ACL".

joknoxy commented 1 month ago

Let's make this purely about LBC never leaving a resource unprotected unless customer has explicitly requested it, and that means making the default value --enable-wafv2=false.

Let's not muddy the waters here talking about FMS - raise a separate FR for that (and what you seem to be asking for @jalaziz is for both FMS and LBC to be aware of each other which I cannot see ever happening).

jalaziz commented 1 month ago

Let's make this purely about LBC never leaving a resource unprotected unless customer has explicitly requested it, and that means making the default value --enable-wafv2=false.

Let's not muddy the waters here talking about FMS - raise a separate FR for that (and what you seem to be asking for @jalaziz is for both FMS and LBC to be aware of each other which I cannot see ever happening).

That's not what I'm asking for at all.

The original ask was to simply not reconcile the web ACL if the annotation is not specified. That's all I'm asking for too.

If the annotation is not present, simply ignore any web ACLs. If it is, honor it. No need to be aware of FMS. Just allow the Web ACL to be managed externally for some resources and via the controller for others.

Making --enable-wafv2=false the default is in direct opposition to never leaving a resource unprotected because someone may expect the annotation to work without realizing it's not doing anything.

joknoxy commented 1 month ago

Just allow the Web ACL to be managed externally for some resources and via the controller for others. You can already do that (with the caveat that FMS may overwrite the LBC WebACL if FMS policy set to auto-remediate and resource is in scope of policy). Both enable-wafv2 and wafv2-acl-arn annotations are on a per-resource basis. You have full control.

Making --enable-wafv2=false the default is in direct opposition to never leaving a resource unprotected because someone may expect the annotation to work without realizing it's not doing anything OK I understand what you are saying- - yes it's probably a 'lower touch' change to change the behaviour when wafv2-acl-arn is empty rather than change default behaviour.

jalaziz commented 1 month ago

You can already do that (with the caveat that FMS may overwrite the LBC WebACL if FMS policy set to auto-remediate and resource is in scope of policy). Both enable-wafv2 and wafv2-acl-arn annotations are on a per-resource basis. You have full control.

Maybe I'm missing something here, but enable-wafv2 doesn't appear to be an annotation as far as I can tell looking at the code (it only appears to be a controller command-line flag). With it set to false, I believe the controller simply ignores the annotation altogether. With it set to true, if waf2-acl-arn is not set but the WebACL is set manually, the controller ends up removing the association. In the FMS case, you end up flip-flopping between the controller and FMS remediation.

In all fairness, this is my understand based on the documentation and a very cursory glance of the code base, so I may be entirely mistaken.

OK I understand what you are saying- - yes it's probably a 'lower touch' change to change the behaviour when wafv2-acl-arn is empty rather than change default behaviour.

Indeed and it should (hopefully) allow FMS, externally managed WebACLs, and LBC managed WebACLs to coexist relatively peacefully (assuming resource targeting is setup correctly).

M00nF1sh commented 1 month ago

Thanks for all the valuable discussions. I think an acceptable change will be the follows:

  1. we'll change the behavior for waf2-acl-arn annotation, thus that when it's not specified, existing wafv2 on lbs won't be examed/touched
  2. we'll add a sentinel value for the waf2-acl-arn annotation, thus if waf2-acl-arn: none is explicitly specified, the controller will ensure there is no wafv2 on the LB. (which gives user a way to clean up existing wafv2 that were setup via annotations)
  3. we'll add a feature gate to cover this behavior change(which will be default enabled, but cx can revert to old behavior by disable the feature gate). And after a few versions we'll remove this feature gate.
M00nF1sh commented 1 month ago

some discussion when we add this feature: https://github.com/kubernetes-sigs/aws-load-balancer-controller/pull/1211#issuecomment-611636675