istio / istio

Connect, secure, control, and observe services.
https://istio.io
Apache License 2.0
35.94k stars 7.76k forks source link

Adopt CEL validation #46151

Open howardjohn opened 1 year ago

howardjohn commented 1 year ago

This issue tracks a design to adopt CEL in CRD validation: https://kubernetes.io/docs/reference/using-api/cel/.

This is in need of a design; please do not submit a PR without a design.

Scope:


Current summary:

Blocker to any adoption

Open questions

Risks

Blockers to complete replacement

These will be needed before we can fully remove the webhook with 100% feature parity. I don't expect this list to ever be fully complete. 100% feature parity is not necessarily a strict requirement; at some point we will have enough parity that the gaps are not worth the complexity of the webhook.

howardjohn commented 1 year ago

Exploring existing checks. https://playcel.undistro.io/ https://github.com/google/cel-spec/blob/master/doc/langdef.md

VirtualService

delegate check - fine

appliesToMesh vs appliesToGateway - I think fine, may be expensive though

validateGatewayNames => has warnings but otherwise fine

validateJWTClaimRoute => complex but probably ok

duplicate hosts = ok but N^2

validateHTTPRouteConflict => ok validateHTTPRouteMatchRequest => ok ValidateHTTPHeaderName => ok validateStringMatchRegexp => "is regexp", see below ValidateHTTPHeaderWithAuthorityOperationName => ok ValidateHTTPHeaderValue => partial regex, ok. Then we have some parser... might be fine, definitely complex though validateCORSPolicy => ok validateHTTPFaultInjectionAbort => ok validateHTTPFaultInjectionDelay => ok validateSniHost => ok ValidateIPSubnet => already discussed validateDestination => ok validateExportTo => complex and N^2 but probably ok

Gateway

validateNamespaceSlashWildcardHostname => complex but seems fine

validateServerBind => ok validateTLSOptions => ok

WasmPlugin

validateWorkloadSelector => seems fine validateWasmPluginURL => fine validateWasmPluginSHA => fine validateWasmPluginVMConfig => fine validateWasmPluginMatch => fine

ALL COMPATIBLE

ValidateAuthorizationPolicy

validateWorkloadSelector => ok CUSTOM check => ok Various From checks => ok Various To checks => ok, I think. We need to parse a number to an int though Various When checks => super complex... but likely fine

RequestAuthentication

ParseJwksURI => URL parsing jwk.Parse => unlikely to work ValidateHTTPHeaderValue => complex, see other mention

PeerAuthentication

No problem

Telemetry

validateWorkloadSelector => fine validateTelemetryProviders => fine validateTelemetryMetrics => fine validateTelemetryTracing => fine validateTelemetryFilter => CEL parsing... I dont think we can do CEL parsing in CEL itself (?)

SOME GAPS

ProxyConfig

validateWorkloadSelector => fine validateConcurrency => fine

ALL COMPATIBLE

WorkloadEntry

Some warnings

ValidateUnixAddress => fine Validate IP addr => see below ValidateFQDN => can be done with regex validate label selector => fine ValidatePortName => fine ValidatePort => fine

MOSTLY COMPATIBLE

ServiceEntry

validateAlphaWorkloadSelector => fine Has a Warning for backwards compat ValidateWildcardDomain => fine validatePartialWildCard => fine, is a warning ValidateIPSubnet => see below ports => check duplicate, fine. ValidatePortName => fine ValidateProtocol => fine endpoints => see WorkloadEntry, mostly fine validateExportTo => not sure, seems likely but complex

MOSTLY COMPATIBLE

EnvoyFilter

validateAlphaWorkloadSelector => fine rexegp compile => no, see below validateListenerMatchName => fine BuildXDSObjectFromStruct => NO validateDeprecatedFilterTypes => no chance validateMissingTypedConfigFilterTypes => no chance

NOT COMPATIBLE, unlikely to ever become so. This requires unmarshalling protobuf, complex protobuf reflection, and compiling regex

DestinationRule

ValidateWildcardDomain => fine validateTrafficPolicy => validateOutlierDetection => ok ValidateDuration => ok ValidatePercent => ok some warnings validateConnectionPool => ok validateLoadBalancer => ok validateLocalityLbSetting => NOT SURE validateTLS => ok validateTunnelSettings => ok

Overall

jpbetz commented 1 year ago

cc @TristonianJones for visibility-- the need for a "isRegex" and a "isCEL" checks are both interesting

jpbetz commented 1 year ago

Missing ability to validate a CEL expression in CEL

@howardjohn Can you point me to where I could learn more about this use case? I'd like to try to understand it better.

howardjohn commented 1 year ago

@howardjohn Can you point me to where I could learn more about this use case? I'd like to try to understand it better

https://istio.io/latest/docs/reference/config/telemetry/#AccessLogging-Filter is the API, https://github.com/istio/istio/blob/2cc72adf87261df6294ad25c915246a92ea8d96e/pkg/config/validation/validation_istiod.go#L28 is the current webhook validation

TristonianJones commented 1 year ago

@howardjohn @jpbetz I have toyed around with the idea of validating that a field is a CEL expression. You can absolutely do it, but the check itself is difficult to constrain from a compute / memory standpoint since it will lift ANTLR up into the runtime which is generally something we've avoided doing. As a custom function, it's possible.

The other thought would be to make it possible for schema's to identify the type of the field as a CEL expression and perform validation on the syntax (to start) under the covers. I think that would likely be a more powerful model since it allows the system to better control the resources spent validating such fields.

For regex validation, that should be possible to expose as a custom function in one of the CEL canonical extensions.

liggitt commented 1 year ago

yeah, runtime CEL evaluation makes me very nervous.

For regex validation, that should be possible to expose as a custom function in one of the CEL canonical extensions.

one thing to watch out for there is regex flavors... CEL's view of whether a regex is valid or not would be tied to go's regex implementation, which may or may not be the same as the one in the consumer of the type (for istio, that would be fine, but for arbitrary consumers, it may not be)

jpbetz commented 1 year ago

Re: "is regex" check: I tested a suggestion @liggitt gave me and it worked out reasonably well: https://gist.github.com/jpbetz/f01f0d21055407f7a9701ca51198c395

The error message produced when the regex string is invalid looks reasonable to me. The use of a regex "findAll" to test if the regex is valid is a bit clunky/hacky, but it does seems to work.

jpbetz commented 1 year ago

yeah, runtime CEL evaluation makes me very nervous.

It could be s slippery slope. If we support "is this CEL?", do we eventually need to support "Is this valid CEL with only legal variables functions used?" or "Is this valid CEL with proper type checking of all declared variables?"

@howardjohn Do you have a sense what you'll do in the short term? Would you validate in a webhook or defer checking till a controller evaluates the CEL expression?

howardjohn commented 1 year ago

@jpbetz we will probably incrementally add CEL in addition to the webhook for a while, rather than completely replace, especially to not require us to move everything in one shot.

Long term it would be good to get rid of the webhook though; some checks we will never be able to (the protobuf stuff) - but we can make those part of optional webhook or defer the checking maybe.

The "Is CEL" check is something I wouldn't mind deferring the checking of; its a 'nice to have' not required

jpbetz commented 1 year ago

Long term it would be good to get rid of the webhook though; some checks we will never be able to (the protobuf stuff) - but we can make those part of optional webhook or defer the checking maybe.

The "Is CEL" check is something I wouldn't mind deferring the checking of; its a 'nice to have' not required

For CEL I agree, deferring checking is probably OK. There is always the possibility of runtime errors, so there is a need to communicate that back somehow (logs, metrics, status fields). A validation check reduces the classes of errors that are deferred to runtime but can't eliminate them entirely.

The protobuf cases are probably out of our hands.

Looks like we should be able to support most everything else?

~Warnings => OK (ValidatingAdmissionPolicy) How do we check duplicates? => N^2 check (you may need to set maxItems on some lists) Missing ability to validate a CEL expression in CEL => No support. Missing ability to validate a regexp expression in CEL => https://gist.github.com/jpbetz/f01f0d21055407f7a9701ca51198c395 sufficient? Validate IP address => Can you do something like https://github.com/kubernetes-sigs/gateway-api/pull/2246#discussion_r1278225456 - Gateway API was able to achieve conditional IP validation this way..

howardjohn commented 1 year ago

@jpbetz I analyzed the last of our resources (before had left a few off) and summarized my view of the current state in the original comment: https://github.com/istio/istio/issues/46151.

Based on this I think we can get >90% of our rules in CEL easily (actually, probably 50% of them are just checking things are not empty which is simple OpenAPI...). It looks like even among the 10% we can probably get creative to handle things.

We will incremental adopt, so the last X% will not block us from moving towards CEL, just may delay us removing the webhook entirely. We will likely never get to 100% parity which is fine -- at some point we can make the decision to remove the webhook and do validation at runtime.

The only real blocker we have from starting adoption is purely an Istio-side issue so we will work around that.

Appreciate all the help here and the great work to enable this!

howardjohn commented 1 year ago

A major blocker came up: https://github.com/kubernetes/kubernetes/issues/120973

zirain commented 10 months ago

not stale