Open Martin-Weiss opened 5 days ago
Hi, there are several problems here. The reproducer's `spec.settings` and `.spec.rules` are incorrect and ambiguous (how does k8s know that apiGroups `batch` applies only to `cronjobs` and not to `pods`, for example?). At first view, one could think that the reproducer was crafted from the README.md example, but the rules don't match it.
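For context, Kubernetes evaluates each rule as a cross-product of apiGroups × apiVersions × resources. A hand-written sketch of an ambiguous rule (illustrative only, not the exact reproducer):

```yaml
# Hypothetical ambiguous rule: because Kubernetes expands the rule as a
# cross-product, this also matches batch/v1 "pods" and core/v1 "cronjobs",
# combinations the author almost certainly did not intend.
rules:
  - apiGroups: ["", "batch"]
    apiVersions: ["v1"]
    resources: ["pods", "cronjobs"]
    operations: ["CREATE", "UPDATE"]
```

The unambiguous form uses one rule per apiGroup/resource pair.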
The easiest solution is to use `kwctl`, which takes the policy's metadata into account to produce a correct ClusterAdmissionPolicy, and to craft from there. This is normally shown by the Artifact Hub "install" button (but not for this policy, as its latest release happened a while ago):
```console
kwctl pull ghcr.io/kubewarden/policies/verify-image-signatures:v0.2.8
kwctl scaffold manifest -t ClusterAdmissionPolicy registry://ghcr.io/kubewarden/policies/verify-image-signatures:v0.2.8
```
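A scaffolded manifest looks roughly like the sketch below. This is illustrative: the exact `rules` and `settings` come from the policy's own metadata, so trust `kwctl`'s actual output over this example.

```yaml
# Hypothetical scaffold output for verify-image-signatures (values are
# assumptions; kwctl derives the real ones from the policy metadata).
apiVersion: policies.kubewarden.io/v1
kind: ClusterAdmissionPolicy
metadata:
  name: verify-image-signatures
spec:
  module: registry://ghcr.io/kubewarden/policies/verify-image-signatures:v0.2.8
  mutating: true  # assumed: the policy can mutate images to pinned digests
  rules:
    - apiGroups: [""]
      apiVersions: ["v1"]
      resources: ["pods"]
      operations: ["CREATE", "UPDATE"]
  settings: {}  # placeholder; fill in the policy's signature settings
```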
We should also fix the README.md example `spec.rules` and make a new release: https://github.com/kubewarden/verify-image-signatures/pull/113

The reproducer policy has wrong and ambiguous `spec.rules`.
The policy would apply, for example, to apiGroups `apps`, apiVersion `v1`, resource `pods`, which doesn't exist (the apiGroup for Pods is the empty string).
A request for this specific GroupVersionKind would never come from the K8s API Server, but it can come from the audit-scanner, as we don't use the `spec.rules` until policy execution.
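For example, targeting both Pods and Deployments unambiguously takes two rules, since Pods live in the core (empty) API group. A hand-written sketch, not `kwctl` output:

```yaml
rules:
  - apiGroups: [""]       # core group: Pods have no apiGroup
    apiVersions: ["v1"]
    resources: ["pods"]
    operations: ["CREATE", "UPDATE"]
  - apiGroups: ["apps"]   # Deployments live under apps/v1
    apiVersions: ["v1"]
    resources: ["deployments"]
    operations: ["CREATE", "UPDATE"]
```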
It would be good to validate the `spec.rules` ahead of time. Yet the `spec.rules` can't be validated at policy instantiation, as the GroupVersionKind resources available in the cluster can vary over time (with K8s upgrades or CRDs, for example). I will see if anything can be done in the audit-scanner.
If the audit-scanner sends a request to a policy that errors on execution (e.g. the reproducer, but with tag `v0.2.8`), we currently abort the audit-scan run. We do this for security reasons; this way, misconfigured policies are not silently ignored. We could improve this UX somehow.
Thanks a lot for the analysis and the feedback. Basically, I believe that an error in the audit scanner leading to a full stop of the complete cluster scan is not good practice. Admins might not even realize that the audit-scanner pod fails its scheduled runs after they adjust any of the policies.
To address this, I believe we need some sort of policy validation in the policy server itself, and the policy server should fail to start on invalid policies. Also keep in mind that the audit scanner is optional functionality that might not even be enabled.
In addition, the audit scanner's logs give no good indication of which policy is failing and why. So, from a usability point of view, this should be changed: the audit scan should continue to scan everything else, and the reports it creates should show where something is not OK.
Indeed.
> some sort of policy validation already in the policy server and we should see the policy server to fail starting
This is already the case. It happens for the reproducer provided.
On this issue, there are two causes of errors during the audit-scan run. Nevertheless, the run should continue gracefully. I will work on both:

1. Policies with wrong `spec.rules`. Here the k8s client used in the audit-scanner fails, and we bubble the error up as a fatal error. We should instead log the error and continue the audit scan.
2. Policies that error on execution (e.g. the reproducer with tag `v0.2.8`, as described above), which currently abort the whole run.
Is there an existing issue for this?
Current Behavior
Using this policy (example from https://artifacthub.io/packages/kubewarden/verify-image-signatures/verify-image-signatures) with kubewarden-controller-2.1.0, kubewarden-crds-1.5.1, and kubewarden-defaults-2.0.3 ends up with the audit-scanner failing fatally with the following:
Expected Behavior
The audit scanner must not fail completely if a policy is not perfectly OK.
Steps To Reproduce
See above; more details here: https://github.com/Martin-Weiss/rancher-fleet/tree/bb68ca7288b42dcc7ddea27d5ffabde7fe241d22/kubewarden
Environment
Anything else?
No response