gusfcarvalho opened this issue 5 months ago
Hey @gusfcarvalho
I want to try reproducing this issue. Could you share the scale of the cluster (number of nodes and pods per node) so that we can replicate the scenario? It is not reproducible in our normal test clusters.
If possible, could you also redact sensitive information and share the logs from the KubeArmor pods?
Thanks!
We see this issue on a cluster with 8-9 nodes running about 60 pods across 10 namespaces, and on bigger clusters as well.
The main pain point is that a 'fresh' pod works fine; it takes a few days or so for the issue to kick in (so I would not expect it to show up in any e2e tests).
> If possible, could you also redact sensitive information and share the logs from the KubeArmor pods?
Sure! This issue is so recurrent that it's actually easy to fetch them. Do you need logs from kubearmor-bpf-containerd only?
From what I can see, the pod logs only contain several lines like:
2024-06-11 05:24:33.556758 INFO Detected a Pod (deleted/<>)
2024-06-11 05:24:42.843611 INFO Successfully deleted visibility map with key={PidNS:40265*** MntNS:40265***} from the kernel
2024-06-11 05:24:42.850623 INFO Detected a container (<>)
2024-06-11 05:25:46.128971 INFO Updating container rules for aeb***
2024-06-11 05:25:47.422789 INFO Detected a Pod (modified/<>)
From what I can see, the only thing reporting wrong information is karmor. My guess is that somehow the keys are getting mismatched.
This issue still persists. Any updates? Feel free to DM me on the Kubernetes Slack at @gusfcarvalho.
Hello! 😄 any updates on this?
I'm having a similar issue. I have a policy with a selector matching specific labels within a specific namespace. Two days later, I began to see relay alert logs pointing to policy violations in a different namespace, on containers with no labels matching the selector. I'm running kubearmor v1.4.0 on 17 nodes at the moment, on AWS EKS.
I deleted all the pods in the KubeArmor namespace, and some hours later it was already misapplying policies again.
A very simple policy:
```yaml
apiVersion: security.kubearmor.com/v1
kind: KubeArmorPolicy
metadata:
  name: generic-maint-tools-access
  namespace: myapplication
spec:
  action: Audit
  message: Restricted maintenance tool access attempt detected
  process:
    matchDirectories:
      - dir: /sbin/
        recursive: true
      - dir: /usr/sbin/
        recursive: true
  selector:
    matchLabels:
      security: generic
  severity: 1
  tags:
    - PCI_DSS
    - MITRE
    - MITRE_T1553_Subvert_Trust_Controls
```
After recreating the pods, all bpf-containerd pods log this:
Detected a Security Policy (added/myapplication/generic-maint-tools-access)
However, a few hours later, I get these logs:
{
...
"NamespaceName": "anothernamespace",
"Owner":
{ "Ref": "StatefulSet", "Name": "somests", "Namespace": "anothernamespace" },
"PodName": "somepod-2",
"Labels": "...",
"ContainerID": "...",
"ContainerName": "somecontainername",
"ContainerImage": "...",
"HostPPID": ...,
"HostPID": ...,
"PPID": ...,
"PID": ...,
"UID": ...,
"ParentProcessName": "...",
"ProcessName": "...",
"PolicyName": "generic-maint-tools-access",
"Severity": "1",
"Tags": "PCI_DSS,MITRE,MITRE_T1553_Subvert_Trust_Controls",
"ATags": ["PCI_DSS", "MITRE", "MITRE_T1553_Subvert_Trust_Controls"],
"Message": "Restricted maintenance tool access attempt detected",
"Type": "MatchedPolicy",
"Source": "....",
"Operation": "Process",
"Resource": "...",
"Data": "syscall=SYS_EXECVE",
"Enforcer": "eBPF Monitor",
"Action": "Audit",
"Result": "Passed",
"Cwd": "/",
...
}
I have no other policy named generic-maint-tools-access in any other namespace, nor a cluster policy with that name.
It looks like the bug is right here: https://github.com/kubearmor/KubeArmor/blob/7e7b1c390316970b42ddfc13b725187ef0c4b9f1/KubeArmor/core/kubeUpdate.go#L1016
It adds the policy to the returned list when this returns true: || matchClusterSecurityPolicyRule(policy)
...but that function doesn't check whether the passed policy is a cluster policy, and when matchExpressions is empty, it ends up adding one namespace (whatever comes back first in the k8s client response that hasn't been added yet) to the NamespaceList of every existing policy (cluster or not). It then returns true and the policy is added to the GetSecurityPolicies(..) response.
It appears that over time, as matchClusterSecurityPolicyRule(..) is called, the NamespaceList of each policy keeps growing, one added namespace at a time, which can explain why it takes @gusfcarvalho a while to see this behavior.
Should matchClusterSecurityPolicyRule(..) then check the policy type, also receive the namespaceName to match against the cluster policy, and fix the updating of NamespaceList?
I may be missing something.
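To make the suspected failure mode concrete, here is a small, self-contained paraphrase of the behavior described above. The types and the helper only mimic what the thread describes; they are not the real KubeArmor structures, and the actual matchClusterSecurityPolicyRule has a different signature.
```go
package main

import "fmt"

// Paraphrased types: NOT the real KubeArmor structures.
type Selector struct {
	MatchExpressions []string // empty for a plain namespaced KubeArmorPolicy
	NamespaceList    []string // namespaces the policy is considered to apply to
}

type SecurityPolicy struct {
	Name      string
	Namespace string
	Selector  Selector
}

func contains(list []string, s string) bool {
	for _, v := range list {
		if v == s {
			return true
		}
	}
	return false
}

// Paraphrase of the suspected bug: the helper never checks whether the policy
// is actually a cluster policy, and with empty matchExpressions it appends the
// first not-yet-listed namespace and reports a match.
func matchClusterSecurityPolicyRule(p *SecurityPolicy, clusterNamespaces []string) bool {
	if len(p.Selector.MatchExpressions) == 0 {
		for _, ns := range clusterNamespaces {
			if !contains(p.Selector.NamespaceList, ns) {
				p.Selector.NamespaceList = append(p.Selector.NamespaceList, ns)
				return true // a namespaced policy now "matches" an unrelated namespace
			}
		}
	}
	return false
}

func main() {
	p := &SecurityPolicy{
		Name:      "generic-maint-tools-access",
		Namespace: "myapplication",
		Selector:  Selector{NamespaceList: []string{"myapplication"}},
	}
	namespaces := []string{"myapplication", "anothernamespace", "kube-system"}

	// Each call grows NamespaceList by one namespace, so the policy slowly
	// leaks into namespaces it was never meant to cover.
	for i := 0; i < 2; i++ {
		matchClusterSecurityPolicyRule(p, namespaces)
		fmt.Println(p.Selector.NamespaceList)
	}
}
```
Running this prints [myapplication anothernamespace] after the first call and [myapplication anothernamespace kube-system] after the second, which matches the "one added namespace at a time" growth described above.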
Actually, from what I can understand of CreateSecurityPolicy(..) and how it is used, cluster policies are already built with NamespaceList properly initialized, so matchClusterSecurityPolicyRule(policy) could simply be removed, letting the || kl.ContainsElement(policy.Spec.Selector.NamespaceList, namespaceName) check that is already present do the matching (see the sketch below).
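As a rough illustration of that simplification, with hypothetical helper names rather than the real KubeArmor signatures, the matching would rely only on the namespace membership that CreateSecurityPolicy already set up:
```go
package main

import "fmt"

// containsElement mirrors what kl.ContainsElement is used for in the thread:
// a simple membership check over the policy's NamespaceList.
func containsElement(list []string, s string) bool {
	for _, v := range list {
		if v == s {
			return true
		}
	}
	return false
}

// policyAppliesTo is a hypothetical stand-in for the remaining check once
// matchClusterSecurityPolicyRule is dropped: the policy applies only to
// namespaces already present in its pre-initialized NamespaceList.
func policyAppliesTo(namespaceList []string, namespaceName string) bool {
	return containsElement(namespaceList, namespaceName)
}

func main() {
	nsList := []string{"myapplication"}
	fmt.Println(policyAppliesTo(nsList, "myapplication"))    // true
	fmt.Println(policyAppliesTo(nsList, "anothernamespace")) // false
}
```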
I'm trying to test that change, but I'm running into a build issue with the Dockerfile: bpf.c:28:10: fatal error: asm/unistd.h: No such file or directory ... Hoping the maintainers can chime in soon.
I got past the build issues, created an image with the proposed fix, and deployed it internally. I'll report back on whether it solves my scenario.
Hey @carlosrodfern, thanks for the detailed explanation. You are correct: there is a bug in the matchClusterSecurityPolicyRule check, because it is executed for KSPs (KubeArmorSecurityPolicy) as well, and it will indeed return true in the case you mentioned. However, we cannot remove the check, because there is no namespace watcher that would update the NamespaceList when a new namespace is created after the cluster policy has been applied. In the case of the NotIn operator it is important to update the NamespaceList when a new namespace is added/created.
So rather than removing the check, we recommend adding a condition in matchClusterSecurityPolicyRule that checks whether the policy is of KSP type and, if it is, returns false early.
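A minimal sketch of that guard, assuming some way to tell cluster policies apart from namespaced ones (the IsClusterPolicy marker below is hypothetical; the real code may track the policy kind differently, and the cluster-policy branch is heavily simplified):
```go
package main

import "fmt"

// Paraphrased policy type: NOT the real KubeArmor structure.
type SecurityPolicy struct {
	Name            string
	IsClusterPolicy bool     // hypothetical marker distinguishing cluster policies from KSPs
	NamespaceList   []string // namespaces the policy applies to
}

func matchClusterSecurityPolicyRule(p *SecurityPolicy, clusterNamespaces []string) bool {
	// The recommended fix: return false early for plain KSPs so that their
	// NamespaceList is never mutated by cluster-policy matching.
	if !p.IsClusterPolicy {
		return false
	}

	// For cluster policies the existing behavior is kept: refresh NamespaceList
	// against the namespaces currently in the cluster, which matters for the
	// NotIn operator because there is no namespace watcher. (The real matching
	// logic is more involved and is elided here.)
	for _, ns := range clusterNamespaces {
		found := false
		for _, existing := range p.NamespaceList {
			if existing == ns {
				found = true
				break
			}
		}
		if !found {
			p.NamespaceList = append(p.NamespaceList, ns)
		}
	}
	return len(p.NamespaceList) > 0
}

func main() {
	ksp := &SecurityPolicy{Name: "generic-maint-tools-access", NamespaceList: []string{"myapplication"}}
	fmt.Println(matchClusterSecurityPolicyRule(ksp, []string{"anothernamespace"})) // false: the KSP is skipped
	fmt.Println(ksp.NamespaceList)                                                 // stays ["myapplication"]
}
```
With the guard in place, a namespaced policy is rejected immediately and its NamespaceList is left untouched, while cluster policies keep picking up newly created namespaces.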
On a side note, I don't think this is the bug that was causing the problem for @gusfcarvalho, because he is using v1.3.4, which does not have these changes; they were added later and are only available in v1.4.0.
Thank you @Prateeknandle for looking into this. I'll be creating a separate issue and correcting the PR.
Bug Report
General Information
Issue
This is a consistent issue when running KubeArmor on any long-lived cluster. We have a set of policies in protected-namespace where we whitelist only a few pods based on labels:
Config
Symptom
A pod from unprotected-namespace cannot run because it is not allowed to run correct-binary: permission-denied
Extra information
With this configuration, after KubeArmor has been running on the cluster for some time, it starts to deny applications in unprotected-namespaces as well, even though there are no KubearmorPolicy objects in those namespaces.
From karmor logs, I could see:
This is recurring on any cluster where we have the KubeArmor operator running. As a workaround, if we kubectl delete pods --all -n <kubearmor-namespace>, the system goes back to running as expected (until a few days/weeks later, when the issue starts again).
Versions
Kubearmor: v1.3.4
Cluster: GKE 1.27.13-gke.1000000
Expected behavior
I would expect KubeArmor to consistently get the correct pods' names and namespaces :).