kubewarden / policy-server

Webhook server that evaluates WebAssembly policies to validate Kubernetes requests
https://kubewarden.io
Apache License 2.0

policy-server seems to pull cached OCI artifacts regardless of hot cache #883

Open viccuad opened 1 month ago

viccuad commented 1 month ago

Setup: Kubewarden deployed with Audit Scanner enabled and configured to run every 2 minutes, plus the verify-image-signatures policy configured to verify Application Collection images, following the instructions in https://github.com/kubewarden/docs/pull/443.

It seems that the PolicyServer still queries the OCI registry instead of reading from its cache when calling: https://github.com/kubewarden/policy-evaluator/blob/3cd66b932b199037e677e3e204d4d9742e23edc8/src/callback_handler/sigstore_verification.rs#L251-L266

Acceptance criteria

Verify that policy-server cache for context-aware calls is correctly configured. Configure the cache in policy-evaluator as needed. Add tests as needed.

flavio commented 3 weeks ago

I've tested the code. Everything is working as described:

If a container image is not signed, fetching its signature fails. Hence, whenever a workload uses an unsigned image, we keep reaching out to the remote registry until a signature blob is found.

In the setup described above, the audit-scanner runs an assessment every 2 minutes, while cached entries expire after 1 minute. That means the cache is always empty when a scan starts. Even so, if multiple workloads use the same image, the remote registry is queried only once. Keep in mind that the cache is specific to each policy-server instance: when running multiple policy-server instances, each one reaches out to the registry for the same image, but only once each.
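To make the behaviour above easier to reason about, here is a minimal sketch of a per-instance cache with a time-to-live. It is not the policy-evaluator code: `SignatureCache`, `get_or_fetch`, the `Signature` alias and the `fetch` closure (standing in for the registry) are all hypothetical names, and the current time is passed in explicitly so the expiry logic can be checked deterministically.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a verified signature; not a policy-evaluator type.
type Signature = String;

/// Per-instance cache: successful lookups are kept for `ttl`,
/// failed lookups (e.g. unsigned images) are never stored.
struct SignatureCache {
    ttl: Duration,
    entries: HashMap<String, (Signature, Instant)>,
    /// Number of times the (simulated) registry was contacted.
    registry_calls: usize,
}

impl SignatureCache {
    fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new(), registry_calls: 0 }
    }

    /// Returns the cached signature if it is still fresh at `now`.
    /// Otherwise calls `fetch` and caches the result only when it is `Ok`.
    fn get_or_fetch<F>(&mut self, image: &str, now: Instant, fetch: F) -> Result<Signature, String>
    where
        F: FnOnce(&str) -> Result<Signature, String>,
    {
        if let Some((sig, stored_at)) = self.entries.get(image) {
            if now.saturating_duration_since(*stored_at) < self.ttl {
                return Ok(sig.clone());
            }
        }
        self.registry_calls += 1;
        let sig = fetch(image)?; // errors return early and are not cached
        self.entries.insert(image.to_string(), (sig.clone(), now));
        Ok(sig)
    }
}
```

With a 60-second `ttl`, two lookups of the same signed image 30 seconds apart cause one registry call, a lookup 2 minutes later causes a second one, and every lookup of an unsigned image reaches the registry because the error is never stored.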

We could provide a configuration knob that sets the cache expiration time.

@recena: do you have any opinion? I know the potential bug was reported by you.

flavio commented 3 weeks ago

Moving to blocked, waiting for feedback

recena commented 3 weeks ago

I'm not sure if I understand the scenario, but:

  1. We should cache valid images → signed
  2. 1 minute for TTL is too short

flavio commented 2 weeks ago

We're caching the valid images, but we expire the cache after 1 minute. That's because someone in the meantime might overwrite a tag.

For example:

Right now we're conservative, being a security project, and we let the cache expire after 1 minute.

I think we should allow the user to configure the cache expiration time. That way the user can pick a value that is the right tradeoff between the two cases (talking too much to the registry vs. serving stale data).
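As a rough illustration of such a knob, the sketch below parses a TTL given in seconds and falls back to the current 1-minute behaviour when nothing is set. The function name `parse_cache_ttl` and the "seconds as a plain integer" format are assumptions made for this example; no such setting exists in policy-server as far as this thread shows.

```rust
use std::time::Duration;

/// Default matching the 1-minute expiry mentioned in this thread.
const DEFAULT_TTL: Duration = Duration::from_secs(60);

/// Parses a hypothetical "cache TTL in seconds" setting.
/// `None` (setting absent) falls back to the conservative default.
fn parse_cache_ttl(raw: Option<&str>) -> Result<Duration, String> {
    match raw {
        None => Ok(DEFAULT_TTL),
        Some(s) => s
            .trim()
            .parse::<u64>()
            .map(Duration::from_secs)
            .map_err(|e| format!("invalid cache TTL {s:?}: {e}")),
    }
}
```

A larger value means fewer registry round-trips but a longer window in which an overwritten tag goes unnoticed. Rejecting malformed input rather than silently using the default keeps a typo from weakening or tightening the policy unexpectedly.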

flavio commented 1 week ago

We're going to refine this card as part of 1.18, and work on this improvement during 1.19.

I would like to come up with a solution that allows the policy to configure the caching interval, so that the k8s admin can set a value they are comfortable with.