hafe opened this issue 1 year ago
Hi @hafe thanks for reporting!
Be sure to check out the docs while you wait for a human to take a look at this :slightly_smiling_face:
Cheers!
Please leave the default HTTP/S ports as defaults.
Most users of the Ingress Controller will want ports 80 and 443 used. This would particularly impact those who expose NIC to the Internet, as a port change would result in users having to type https://nginx-ingress.com:8443/
Docker v20.10.0 (released 2020-12-08) supports binding privileged ports without extra capabilities, as it automatically sets the sysctl net.ipv4.ip_unprivileged_port_start=0 via https://github.com/moby/moby/pull/41030, merged as https://github.com/moby/moby/commit/888da28d42f7d0f9fa250dd8a75d51c2a6cf3098.
Similarly, the K8s docs state that net.ipv4.ip_unprivileged_port_start is considered a safe sysctl since Kubernetes v1.22. The PodSpec could contain securityContext.sysctls with { "name": "net.ipv4.ip_unprivileged_port_start", "value": "0" }.
The safe sysctl set, based on documentation, should be namespaced and not interfere with other Pods or the node. To me this implies that with Kubernetes, even if host network is used, the sysctl should be safe to specify.
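As a sketch of what that PodSpec addition could look like (my own illustration, not from the project; resource name and image are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-ingress             # placeholder name
spec:
  securityContext:
    sysctls:
      # Safe, namespaced sysctl since Kubernetes v1.22: allow binding
      # any port without CAP_NET_BIND_SERVICE.
      - name: net.ipv4.ip_unprivileged_port_start
        value: "0"
  containers:
    - name: nginx-ingress
      image: nginx/nginx-ingress  # placeholder image reference
```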
I have not yet experimented with this, but a general solution, setting the sysctl, sounds more elegant than:
- the NET_BIND_SERVICE capability for the Pod,
- allowPrivilegeEscalation set to false for the Pod,
- setcap on the binary.
This would however leave containers running in host network in a peculiar situation. The mentioned Docker runtime change skips the sysctl for host networking ("We do not set network sysctls if network namespace is host"), as setting it outside of a namespace ends up changing the native host sysctl.
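To illustrate the mechanism (my own sketch, Linux-specific): the current threshold can be read from procfs, on the host or inside a container, to see which ports are bindable without the capability.

```shell
# Ports at or above this value can be bound without CAP_NET_BIND_SERVICE.
# The default is 1024; the Docker change above sets it to 0 in containers.
cat /proc/sys/net/ipv4/ip_unprivileged_port_start
```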
Such a scenario sounds like an issue for people who run Docker natively on the host and want to use the Nginx Ingress Controller. Is this a supported use case?
@hafe, is there any aspect in which NIC does not comply with the Restricted security level, other than allowPrivilegeEscalation?
> @hafe, is there any aspect in which NIC does not comply with the Restricted security level, other than allowPrivilegeEscalation?
I guess not. It is hard to tell when you can't test. CAP_NET_BIND_SERVICE seems to be allowed: https://kubernetes.io/docs/concepts/security/pod-security-standards/#restricted
So what is the reason allowPrivilegeEscalation is set to start with?
Allowing privilege escalation is done to support NET_BIND_SERVICE in the Nginx process when the entrypoint (Ingress Controller) doesn't have it as Permitted/Effective. (Previous discussion: https://github.com/nginxinc/kubernetes-ingress/issues/1677#issuecomment-866499090)
Yes, that makes sense from my testing. When I changed to high ports, I could remove allowPrivilegeEscalation.
@hafe, how would the proposal in #3573 look to you?
An alternative approach could be to set cap_net_bind_service=+ep on /nginx-ingress itself, so there's no "escalation" (the IC process gets NET_BIND_SERVICE and then Nginx gets it too). This may be considered preferable from a security standpoint.
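A hedged sketch of how that could look at image build time (the path and build tooling are assumptions, not the project's actual Dockerfile):

```dockerfile
# Grant the binary the capability at build time, so no runtime privilege
# escalation is needed (requires libcap's setcap in the build stage).
RUN setcap 'cap_net_bind_service=+ep' /nginx-ingress
```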
> Alternative approach could be that we set cap_net_bind_service=+ep on /nginx-ingress itself, so there's no "escalation" (IC process gets NET_BIND_SERVICE and then Nginx gets it too). This may be considered more preferable from a security standpoint.
Sounds like a good approach!
As part of the linked PR, it was identified that the underlying process does not drop NET_BIND_SERVICE once that capability is effective.
Therefore, with the existing approach, even if privilege escalation is restricted, code execution could, as part of an attack chain, result in binding the low ports.
Instead, nginxinc/kubernetes-ingress@8be01446762dcaae9a2916b3d59ca78c6ca5670f applies the sysctl change to remove the privilege requirement (and in effect remove the escalation use case).
For the scope of Kubernetes policy, this complies (it is a safe sysctl that individual Pods can set since K8s v1.22). In the future, someone may want to pick up the task of re-implementing the capability and adding a proper bind+drop.
You should be able to experiment on the main branch to collect information on what other policy changes are needed. Keep in mind you will have to build the image locally, and cannot rely on the previous release's image.
> @hafe, is there any aspect in which NIC does not comply with Restricted security level, other than allowPrivilegeEscalation?
This must also be set in the security context to comply:
seccompProfile:
  type: RuntimeDefault
Warning: existing pods in namespace "nginx-ingress" violate the new PodSecurity enforce level "restricted:latest"
Warning: nginx-nginx-ingress-7f55b6c8d4-zdhv8: allowPrivilegeEscalation != false, seccompProfile
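Putting the pieces from this thread together, a container securityContext that passes the restricted checks could look roughly like this (a sketch; the field set is inferred from the warnings and messages above, not taken from the chart):

```yaml
securityContext:
  runAsNonRoot: true
  allowPrivilegeEscalation: false
  capabilities:
    drop:
      - ALL
    add:
      - NET_BIND_SERVICE   # allowed even under the restricted profile
  seccompProfile:
    type: RuntimeDefault
```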
I like this! I write something like a requirement and others implement 😁
> This must also be set in the security context to comply:
> seccompProfile: type: RuntimeDefault
@blurpy, would you mind opening a Pull Request for this with you as author?
> @blurpy, would you mind opening a Pull Request for this with you as author?
I would have liked to, but I can't prioritize it right now. I would be happy if anyone else has the time to fix it in the meantime.
> I would have liked to, but I can't prioritize it right now. I would be happy if anyone else has the time to fix it in the mean time.
Opened #3629 on your behalf.
@sigv thank you for the PR. We are going to review it on our side. If we need anything or have any questions, we will update your PR thread.
@blurpy, the edge version (latest main) of the chart restricts syscalls based on runtime defaults. Could you please check if there is any other low-hanging fruit that the scan picks up?
> @blurpy, the edge version (latest main) of the chart restricts syscalls based on runtime defaults. Could you please check if there are any other low hanging fruit that the scan picks up?
Excellent, thank you! I will have to get back to you on that.
I finally got some time to play with the latest main. Nice work with all the security improvements lately! What I found out is that running with UID 101 is now what is stopping us from using the restricted policy.
> What I found out is that running with UID 101 is now what is stopping using the restricted policy
@hafe, where are you seeing UID 101 being an issue? Is there some reference document/source you could link to? My understanding is that any non-zero UID should be okay, based on the current Pod Security Standards wording:
> Running as Non-root user (v1.23+): Containers must not set runAsUser to 0.
> Restricted Fields:
> - spec.securityContext.runAsUser
> - spec.containers[*].securityContext.runAsUser
> - spec.initContainers[*].securityContext.runAsUser
> - spec.ephemeralContainers[*].securityContext.runAsUser
> Allowed Values:
> - any non-zero value
> - undefined/null
OKD/OpenShift gives each namespace a UID range and allocates a random UID from that range to a Pod. If you need a fixed UID, you need to use the anyuid policy or a custom one. But I need to play more with this.
Understood, it's about OpenShift's restricted-v2 security context constraint (restricted for OpenShift v4.10 and older).
> The restricted SCC: [..] Requires that a pod is run as a user in a pre-allocated range of UIDs
This needs further investigation. I have not worked hands-on with OpenShift, so I am not 100% familiar with their approach. The RedHat Blog has "A Guide to OpenShift and UIDs", which seems like a decent entry point into the topic.
@hafe, if you instead apply the anyuid SCC, how does it look? The documentation linked above says it provides all features of the restricted SCC, but allows users to run with any UID and any GID.
I will do some more checks, but otherwise I think this particular issue could be rephrased and closed.
I am taking a closer look, and PodSecurityContext (v1) says runAsUser defaults to the user specified in image metadata if unspecified. OpenShift's "Example security context constraints" section discusses when no explicit user ID is provided as well.
@hafe, could you check restricted-v2 by setting runAsUser: null, if you have a moment? A proposed diff is available in #3665.
The OpenShift admission plugin should check openshift.io/sa.scc.uid-range and assign the first UID, if I am reading this right.
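As a minimal sketch, that override would leave the UID unset so OpenShift's admission plugin can assign one from the namespace range:

```yaml
securityContext:
  # null (rather than a fixed 101) lets OpenShift's restricted-v2 SCC
  # assign a UID from the namespace's openshift.io/sa.scc.uid-range.
  runAsUser: null
```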
@sigv Initial testing looks good! We have the controller running with the restricted profile now.
@blurpy, just to double check, as there are competing request scopes: Kubernetes restricted with latest release, or OpenShift modifying 'run as user'?
@sigv using the restricted pod security standard: https://kubernetes.io/docs/concepts/security/pod-security-standards/#restricted
The namespace is configured as follows:
kubectl describe ns nginx-ingress
Name: nginx-ingress
Labels: kubernetes.io/metadata.name=nginx-ingress
pod-security.kubernetes.io/audit=restricted
pod-security.kubernetes.io/audit-version=latest
pod-security.kubernetes.io/enforce=restricted
pod-security.kubernetes.io/enforce-version=latest
pod-security.kubernetes.io/warn=restricted
pod-security.kubernetes.io/warn-version=latest
Annotations: <none>
Still at the poc stage though, so perhaps we discover something more later.
I originally thought about changing the templates so that high (>1024) ports could optionally be configured. This could then be used with normal (non-host) networking, removing the need for privileged mode. I never got around to testing it. Is this a feasible path?
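For the non-host-networking case, a plain Service can keep 80/443 on the outside while the container listens on high ports; a sketch of the idea (names and port numbers are assumptions):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx-ingress   # placeholder name
spec:
  selector:
    app: nginx-ingress  # placeholder selector
  ports:
    - name: http
      port: 80          # external port stays 80
      targetPort: 8080  # container binds an unprivileged port
    - name: https
      port: 443
      targetPort: 8443
```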
@hafe, are you on OpenShift 4.11+? Or are you running on the 4.10 Maintenance Support?
I am asking as I want to hear if you have the restricted-v2 SCC available.
I can/will test with OpenShift 4.11/4.12, but I just haven't had time for that yet.
@sigv I cannot get it to work with restricted and OpenShift 4.12. The reason seems to be the runAsUser directive. With runAsUser it is stopped by the admission controller. Without runAsUser it can't start; the USER directive in the image does not seem to take effect.
When running restricted, you get a generated UID/GID for the Controller Pods. GID 0 is assigned as a supplemental group to enable file operations. To support this in the generated Ingress Controller images, #3962 is pending review.
Another discussion, about log files, is pending. After these concerns are resolved, it should be possible to remove the hard-coded UID and switch to the restricted-v2 SCC.
@hafe, I will let you know when the relevant changes are in main for you to test them out. Thanks for the patience!
Sounds promising! However, nginx with app-protect also remains to be solved.
Is this still relevant? Should this have been closed?
I think binding to low ports like 80 and 443 is a problem with the restricted SCC. It would be nice if the ports were >1024 by default and configurable.
WIP
Summary
NIC is currently (3.x) required to run as a privileged Pod with added capabilities. This is not ideal from a security perspective and not aligned with best-practice container security guidelines and standards such as:
To improve the security posture, NIC should be able to run with the restricted Pod security level. See Pod Security Standards for more information.
Motivation
NIC is usually exposed to the Internet and thus a target for all kinds of attacks. The project should always strive to improve the security of NIC.
Goals
Non-goals
Proposal
TBD