nginxinc / kubernetes-ingress

NGINX and NGINX Plus Ingress Controllers for Kubernetes
https://docs.nginx.com/nginx-ingress-controller
Apache License 2.0

NIC should be able to run with the "restricted" POD security level #3544

Open · hafe opened 1 year ago

hafe commented 1 year ago

WIP

Summary

NIC is currently (3.x) required to run as a privileged Pod with added capabilities. This is not ideal from a security perspective and is not aligned with best-practice container security guidelines and standards.

To improve the security posture, NIC should be able to run with the restricted Pod security level. See Pod Security Standards for more information.

Motivation

NIC is usually exposed to the Internet and thus a target for all kinds of attacks. The project should always strive to improve the security of NIC.

Goals

Non-goals

Proposal

TBD

github-actions[bot] commented 1 year ago

Hi @hafe thanks for reporting!

Be sure to check out the docs while you wait for a human to take a look at this 🙂

Cheers!

sigv commented 1 year ago

Please leave the default HTTP/S ports as the defaults.

Most users of the Ingress Controller will want ports 80 and 443 used. This would particularly impact those who expose NIC to the Internet, as a port change would result in users having to type, e.g., https://nginx-ingress.com:8443/


Docker v20.10.0 (released 2020-12-08) can be seen as supporting the binding of privileged ports with no added capabilities, as it automatically sets the sysctl net.ipv4.ip_unprivileged_port_start=0 via https://github.com/moby/moby/pull/41030, merged as https://github.com/moby/moby/commit/888da28d42f7d0f9fa250dd8a75d51c2a6cf3098.

Similarly, the K8s docs state that net.ipv4.ip_unprivileged_port_start is considered a safe sysctl since Kubernetes v1.22. A PodSpec could contain securityContext.sysctls with { "name": "net.ipv4.ip_unprivileged_port_start", "value": "0" }.

Based on the documentation, the safe sysctl set is namespaced and does not interfere with other Pods or the node. To me this implies that with Kubernetes, even if the host network is used, the sysctl should be safe to specify.

I have not yet experimented with this, but a general solution, setting the sysctl, sounds more elegant:
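
Something along these lines at the Pod level (an untested sketch, so treat the exact placement as an assumption):

spec:
  securityContext:
    sysctls:
    - name: net.ipv4.ip_unprivileged_port_start
      value: "0"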


This would, however, leave containers running in the host network in a peculiar situation. The mentioned Docker runtime change avoids setting the sysctl for host networking ("We do not set network sysctls if network namespace is host"), as that would end up changing the native host sysctl outside of a namespace.

Such a scenario sounds like an issue for people who run Docker natively on the host and want to use the NGINX Ingress Controller. Is this a supported use case?

sigv commented 1 year ago

@hafe, is there any aspect in which NIC does not comply with the Restricted security level, other than allowPrivilegeEscalation?

hafe commented 1 year ago

@hafe, is there any aspect in which NIC does not comply with the Restricted security level, other than allowPrivilegeEscalation?

I guess not. It is hard to tell when you can't test. CAP_NET_BIND_SERVICE seems to be allowed: https://kubernetes.io/docs/concepts/security/pod-security-standards/#restricted

hafe commented 1 year ago

So what is the reason allowPrivilegeEscalation is set in the first place?

sigv commented 1 year ago

Allowing privilege escalation is done to support NET_BIND_SERVICE in the NGINX process when the entrypoint (the Ingress Controller) doesn't have it as Permitted/Effective. (Previous discussion: https://github.com/nginxinc/kubernetes-ingress/issues/1677#issuecomment-866499090)
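
For context, the container securityContext today looks roughly like this (a paraphrased sketch, not the exact shipped manifest):

securityContext:
  allowPrivilegeEscalation: true
  runAsUser: 101 # nginx
  capabilities:
    drop:
    - ALL
    add:
    - NET_BIND_SERVICE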

hafe commented 1 year ago

Yes, that matches my testing. When I changed to high ports I could remove allowPrivilegeEscalation.

sigv commented 1 year ago

@hafe, how would the proposal in #3573 look to you?

sigv commented 1 year ago

An alternative approach could be to set cap_net_bind_service=+ep on /nginx-ingress itself, so there is no "escalation" (the IC process gets NET_BIND_SERVICE and then NGINX gets it too). This may be considered preferable from a security standpoint.
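
As a sketch, that would be a build-stage step in the image (assuming libcap's setcap tool is available during the build):

# Grant the file capability so the binary can bind low ports without escalation
RUN setcap 'cap_net_bind_service=+ep' /nginx-ingress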

hafe commented 1 year ago

An alternative approach could be to set cap_net_bind_service=+ep on /nginx-ingress itself, so there is no "escalation" (the IC process gets NET_BIND_SERVICE and then NGINX gets it too). This may be considered preferable from a security standpoint.

Sounds like a good approach!

sigv commented 1 year ago

As part of the linked PR, it was identified that the underlying process does not drop NET_BIND_SERVICE once that capability is effective.

Therefore, with the existing approach, even if privilege escalation is restricted, code execution in an attack chain could result in binding the low ports.

Instead, nginxinc/kubernetes-ingress@8be01446762dcaae9a2916b3d59ca78c6ca5670f applies the sysctl change to remove the privilege requirement (and, in effect, the escalation use case).

For the scope of Kubernetes policy, this complies (as it is a safe sysctl that individual Pods can set since K8s v1.22). In the future, someone may want to pick up the task of re-implementing the capability approach with a proper bind-then-drop.

You should be able to experiment on the main branch to collect information on what other policy changes are needed. Keep in mind that you will have to build the image locally; you cannot rely on a previous release's image.

blurpy commented 1 year ago

@hafe, is there any aspect in which NIC does not comply with the Restricted security level, other than allowPrivilegeEscalation?

This must also be set in the security context to comply:

seccompProfile:
  type: RuntimeDefault

Warning: existing pods in namespace "nginx-ingress" violate the new PodSecurity enforce level "restricted:latest"
Warning: nginx-nginx-ingress-7f55b6c8d4-zdhv8: allowPrivilegeEscalation != false, seccompProfile
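
For reference, a container securityContext that passes the restricted checks would look roughly like this (a sketch, not the chart's exact values):

securityContext:
  runAsNonRoot: true
  allowPrivilegeEscalation: false
  capabilities:
    drop:
    - ALL
  seccompProfile:
    type: RuntimeDefault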

hafe commented 1 year ago

I like this! I write something like a requirement and others implement it 😁

sigv commented 1 year ago

This must also be set in the security context to comply:

seccompProfile:
  type: RuntimeDefault

@blurpy, would you mind opening a Pull Request for this with you as author?

blurpy commented 1 year ago

@blurpy, would you mind opening a Pull Request for this with you as author?

I would have liked to, but I can't prioritize it right now. I would be happy if anyone else has the time to fix it in the meantime.

sigv commented 1 year ago

I would have liked to, but I can't prioritize it right now. I would be happy if anyone else has the time to fix it in the meantime.

Opened #3629 on your behalf.

jasonwilliams14 commented 1 year ago

@sigv thank you for the PR. We are going to review it on our side. If we need anything or have any questions, we will update your PR thread.

sigv commented 1 year ago

@blurpy, the edge version (latest main) of the chart restricts syscalls based on runtime defaults. Could you please check if there is any other low-hanging fruit that the scan picks up?

blurpy commented 1 year ago

@blurpy, the edge version (latest main) of the chart restricts syscalls based on runtime defaults. Could you please check if there is any other low-hanging fruit that the scan picks up?

Excellent, thank you! I will have to get back to you on that.

hafe commented 1 year ago

I finally got some time to play with the latest main. Nice work with all the security improvements lately! What I found is that running with UID 101 is now what prevents using the restricted policy.

sigv commented 1 year ago

What I found is that running with UID 101 is now what prevents using the restricted policy.

@hafe, where are you seeing UID 101 being an issue? Is there some reference document/source you could link to? My understanding is that any non-zero UID should be okay, based on the current Pod Security Standards wording:

Running as Non-root user (v1.23+)
Containers must not set runAsUser to 0

Restricted Fields:

  • spec.securityContext.runAsUser
  • spec.containers[*].securityContext.runAsUser
  • spec.initContainers[*].securityContext.runAsUser
  • spec.ephemeralContainers[*].securityContext.runAsUser

Allowed Values

  • any non-zero value
  • undefined/null

hafe commented 1 year ago

OKD/OpenShift gives each namespace a UID range and allocates a random UID from that range to a Pod. If you need a fixed UID, you need to use the anyuid policy or a custom one. But I need to play with this more.

sigv commented 1 year ago

Understood, it's about OpenShift's restricted-v2 security context constraint (restricted for OpenShift v4.10 and older). The restricted SCC "[..] requires that a pod is run as a user in a pre-allocated range of UIDs".

This needs further investigation. I have not worked hands-on with OpenShift, so I am not 100% familiar with their approach. The Red Hat Blog has A Guide to OpenShift and UIDs, which seems like a decent entry point into the topic.

@hafe, if you instead apply the anyuid SCC, how does it look? The documentation linked above says it provides all features of the restricted SCC, but allows users to run with any UID and any GID.
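
For example (the ServiceAccount and namespace names here are assumptions):

# Allow the controller's ServiceAccount to use the anyuid SCC
oc adm policy add-scc-to-user anyuid -z nginx-ingress -n nginx-ingress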

hafe commented 1 year ago

I will do some more checks but otherwise I think this particular issue could be rephrased and closed

sigv commented 1 year ago

I am taking a closer look, and PodSecurityContext (v1) says runAsUser "defaults to user specified in image metadata if unspecified". OpenShift's Example security context constraints section discusses the case when no explicit user ID is provided as well.

@hafe, could you check restricted-v2 by setting runAsUser: null, if you have a moment? A proposed diff is available in #3665. The OpenShift admission plugin should check openshift.io/sa.scc.uid-range and assign the first UID, if I am reading this right.
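
Conceptually, the change amounts to this (a sketch; the actual diff is in the linked PR):

securityContext:
  runAsNonRoot: true
  # runAsUser intentionally unset so the OpenShift admission plugin can
  # assign a UID from the namespace's openshift.io/sa.scc.uid-range annotation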

blurpy commented 1 year ago

@sigv Initial testing looks good! We have the controller running with the restricted profile now.

sigv commented 1 year ago

@blurpy, just to double-check, as there are competing request scopes: Kubernetes restricted with the latest release, or OpenShift with a modified 'run as user'?

blurpy commented 1 year ago

@sigv using the restricted pod security standard: https://kubernetes.io/docs/concepts/security/pod-security-standards/#restricted

The namespace is configured as follows:

kubectl describe ns nginx-ingress
Name:         nginx-ingress
Labels:       kubernetes.io/metadata.name=nginx-ingress
              pod-security.kubernetes.io/audit=restricted
              pod-security.kubernetes.io/audit-version=latest
              pod-security.kubernetes.io/enforce=restricted
              pod-security.kubernetes.io/enforce-version=latest
              pod-security.kubernetes.io/warn=restricted
              pod-security.kubernetes.io/warn-version=latest
Annotations:  <none>

Still at the PoC stage though, so perhaps we will discover something more later.

hafe commented 1 year ago

I originally thought about changing the templates so that high (>1024) ports could optionally be configured. This could then be used with normal (non-host) networking, removing the need for privileged mode. I never got around to testing it. Is this a feasible path?
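
The idea in plain Kubernetes terms (a generic sketch with example port numbers, not the chart's actual template keys):

# Container binds only unprivileged ports
containers:
- name: nginx-ingress
  ports:
  - name: http
    containerPort: 8080
  - name: https
    containerPort: 8443
---
# The Service still exposes 80/443 externally and maps to the high ports
apiVersion: v1
kind: Service
metadata:
  name: nginx-ingress
spec:
  type: LoadBalancer
  ports:
  - name: http
    port: 80
    targetPort: 8080
  - name: https
    port: 443
    targetPort: 8443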

sigv commented 1 year ago

@hafe, are you on OpenShift 4.11+? Or are you running on the 4.10 Maintenance Support?

I am asking as I want to hear if you have the restricted-v2 SCC available.

hafe commented 1 year ago

I can/will test with OpenShift 4.11/4.12, but I just haven't had time for that yet.

hafe commented 1 year ago

@sigv I cannot get it to work with restricted and OpenShift 4.12. The reason seems to be the runAsUser directive. With runAsUser it is stopped by the admission controller. Without runAsUser it can't start; the USER directive in the image does seem to take effect.

sigv commented 1 year ago

When running restricted you get a generated UID/GID for the Controller Pods. GID 0 is assigned as supplemental to enable file operations. To support this in the generated Ingress Controller images, #3962 is pending review.

Another discussion about log files is pending. After these concerns are resolved, it should be possible to remove the hard-coded UID and switch to restricted-v2 SCC.

@hafe, I will let you know when the relevant changes are in main for you to test them out. Thanks for the patience!

hafe commented 1 year ago

Sounds promising! However, NGINX with App Protect also remains to be solved.

brianehlert commented 2 weeks ago

Is this still relevant? Should this have been closed?

hafe commented 2 weeks ago

I think binding to low ports like 80 & 443 is a problem with the restricted SCC. It would be nice if the ports were >1024 by default and configurable.

vepatel commented 1 week ago

Hi @hafe, would you mind creating a new issue for this so we can close this one? Also, we now provide users with the ability to completely modify the security context and ports, so I am not sure this still needs to be implemented. Let us know what you think!