Add enhancement proposal for generic constraints feature

dinhxuanvu commented 3 years ago

This enhancement outlines the design and specification for the generic constraints feature.

Signed-off-by: Vu Dinh vudinh@outlook.com

perdasilva commented 3 years ago

I'm still a bit apprehensive of the need for an expression language. If I understand correctly, the bundle properties are arbitrary key/value pairs. A constraint can then be expressed as . E.g. type==certified. With a list of these of these constraints, we can already express AND compound constraints. E.g.

constraints:
  - type==certified
  - stable==true
  - vcpus>5

This is easy to understand, doesn't require learning a new language and probably covers most (if not all?) use-cases?

Do we need OR compound constraints? Do we have some examples coming from users that say they need this? And do we want to support it? The more complex the constraint relationships, the harder it might be for a user to understand why something went wrong and how to fix it.

I'd like to see in the list of alternatives an option for no CEL and why we shouldn't do that.

dinhxuanvu commented 3 years ago

I'm still a bit apprehensive of the need for an expression language. If I understand correctly, the bundle properties are arbitrary key/value pairs. A constraint can then be expressed as . E.g. type==certified. With a list of these of these constraints, we can already express AND compound constraints. E.g.
constraints:
  - type==certified
  - stable==true
  - vcpus>5
This is easy to understand, doesn't require learning a new language and probably covers most (if not all?) use-cases?

Do we need OR compound constraints? Do we have some examples coming from users that say they need this? And do we want to support it? The more complex the constraint relationships, the harder it might be for a user to understand why something went wrong and how to fix it.

I'd like to see in the list of alternatives an option for no CEL and why we shouldn't do that.

Using your example, there are 3 lines of strings. What do you use to parse/translate those strings to be something understandable by the resolver? Does the resolver need to understand what type, ==, certified, stable, true, vcpus, > and 5 mean? In this context, all of these values are string (as inputs) but what are they actually meant when you do comparison and evaluation in term of value and type? You are literally writing expressions (type==certified is an expression) but neglect what the tool is used to evaluate these expressions. That's what CEL is for.

perdasilva commented 3 years ago

I'm still a bit apprehensive of the need for an expression language. If I understand correctly, the bundle properties are arbitrary key/value pairs. A constraint can then be expressed as . E.g. type==certified. With a list of these of these constraints, we can already express AND compound constraints. E.g.
constraints:
  - type==certified
  - stable==true
  - vcpus>5
This is easy to understand, doesn't require learning a new language and probably covers most (if not all?) use-cases? Do we need OR compound constraints? Do we have some examples coming from users that say they need this? And do we want to support it? The more complex the constraint relationships, the harder it might be for a user to understand why something went wrong and how to fix it. I'd like to see in the list of alternatives an option for no CEL and why we shouldn't do that.
Using your example, there are 3 lines of strings. What do you use to parse/translate those strings to be something understandable by the resolver? Does the resolver need to understand what type, ==, certified, stable, true, vcpus, > and 5 mean? In this context, all of these values are string (as inputs) but what are they actually meant when you do comparison and evaluation in term of value and type? You are literally writing expressions (type==certified is an expression) but neglect what the tool is used to evaluate these expressions. That's what CEL is for.

Sorry I wasn't clear. I was going to add a parenthesis after "no CEL" to explain that, obviously, we need to have some kind of expression parsing going on. The crux of the question is why not this one. Maybe to be more clear, my question is, does the language need to be so complicated that I need to read a manual on how to use it, or could we get away with simple expressions such as above? It's simple, I'm familiar with it, I don't have to parse json in my head to understand what it means, I can type less, etc.

I don't understand why the resolver needs to know what type, certified, stable, true, and vcpus are. They are just keys and (string) values, only how to compare them: why is string comparison not enough? Do we have any use-cases where it wouldn't be?

I'm just trying to understand what having such a high input complexity buys us and why we don't leverage mechanisms that are more familiar to the user - and I believe that this issue ought to be addressed in the alternatives.

dinhxuanvu commented 3 years ago

I'm still a bit apprehensive of the need for an expression language. If I understand correctly, the bundle properties are arbitrary key/value pairs. A constraint can then be expressed as . E.g. type==certified. With a list of these of these constraints, we can already express AND compound constraints. E.g.
constraints:
  - type==certified
  - stable==true
  - vcpus>5
This is easy to understand, doesn't require learning a new language and probably covers most (if not all?) use-cases? Do we need OR compound constraints? Do we have some examples coming from users that say they need this? And do we want to support it? The more complex the constraint relationships, the harder it might be for a user to understand why something went wrong and how to fix it. I'd like to see in the list of alternatives an option for no CEL and why we shouldn't do that.
Using your example, there are 3 lines of strings. What do you use to parse/translate those strings to be something understandable by the resolver? Does the resolver need to understand what type, ==, certified, stable, true, vcpus, > and 5 mean? In this context, all of these values are string (as inputs) but what are they actually meant when you do comparison and evaluation in term of value and type? You are literally writing expressions (type==certified is an expression) but neglect what the tool is used to evaluate these expressions. That's what CEL is for.
Sorry I wasn't clear. I was going to add a parenthesis after "no CEL" to explain that, obviously, we need to have some kind of expression parsing going on. The crux of the question is why not this one. Maybe to be more clear, my question is, does the language need to be so complicated that I need to read a manual on how to use it, or could we get away with simple expressions such as above? It's simple, I'm familiar with it, I don't have to parse json in my head to understand what it means, I can type less, etc.

I don't understand why the resolver needs to know what type, certified, stable, true, and vcpus are. They are just keys and (string) values, only how to compare them: why is string comparison not enough? Do we have any use-cases where it wouldn't be?

I'm just trying to understand what having such a high input complexity buys us and why we don't leverage mechanisms that are more familiar to the user - and I believe that this issue ought to be addressed in the alternatives.

You say the language CEL is complicated to understand but at the same time you are suggesting we design a custom parser/compiler to parse the string expressions and then write a documentation explaining what how it works. You merely suggest shifting the complexity from constraint syntax to the parser that we need to implement if that's the case.

More than anything, I'm suggesting starting with the user and working backwards. Tbh, I don't think we should even care about what libraries are currently available for this until we understand what we want the user interaction to be. If there are libraries that can be used to realise that, great, if not, then we DIY. I'm even happy with using cel-go under-the-hood. More than anything questioning the easy-of-use of the interface.

I'm 100% on-board with hiding as much complexity from the user as possible - and, generally, I think that's a great tenet to have.

Just imagine all of operator syntax that you need to account for >, <, >=, != and etc. Plus, you say why would we care about the type. Sure, stable is just a string but 5 is not especially with > operator. semver comparison is pretty popular. How do you propose we support that with your syntax without knowing the type? Your example doesn't contain any whitespaces. It could be stable == true and that's something we need to account for. The value in your example is just a string but imagine cases where they are struct (like your GVK property type). That's pretty much impossible to parse without in advance its type.

All great callouts - agreed.

If the intention is to have a bare minimum string and integer comparison then that's hardly generic. It is just another hardcoded type comparison that we will say "here is the list of things we can compare and operator syntax that you can use". Even with the minimum effort, the parser still needs to deal with many corner cases that may arise with a random string expression.

Again, the intention is to have usability in mind. I'm not entirely convinced we need something that's completely generic. To that point: we have identified a 5 types: gvk, semver, bool, number, string and a 6 operators: >, <, >=, <=, ==, !=. Do we have any use-case that we want to support goes beyond these? The trigger for completely generic expressions isn't exactly clear to me. From what I can gather, bundles have dependencies (packages or gvk) and can declare properties. We want to be able to tell the resolver further restrict the solution space to packages that contain particular property values.

The dependency-related features are advanced features that certainly requires some learning. The intention is to give users the freedom to express their complex dependency model using a language that supports those kinds of evaluations without us having to reinvent the wheel or having some sort of OLM specific syntax.

I don't know that I agree that dependency-related features are advanced in a package management system. I'm just saying we should as much as possible reduce the cognitive load on the user. Maybe another suggestion here would be to try to find a stakeholder that would use that today and ask them what they think, or how they would like to specify these things. Both operator authors and cluster admins (and any other that might be relevant here)

I did include another propose from k8s apiserver team that is gonna use CEL for similar purpose. I recommend you to give it a read if you haven't done so.

I had a quick look and it was very interesting. But notice how the level of complexity doesn't go up much at all when this is framed in an OpenAPI schema. It's relatively easy to grok:

...
x-kubernetes-validator: 
- rule: "minReplicas <= maxReplicas"
   message: "minReplicas cannot be larger than maxReplicas"
type: object
properties:
  minReplicas:
    type: integer
  maxReplicas:
    type: integer
 ...

perdasilva commented 3 years ago

I tend to agree with Joe here. I think this EP would definitely benefit from more and more varied examples. It may also be valuable to add in the motivation section (along with the two types of constraints) where/how are the package/bundle properties defined (maybe also an example yaml) - this could help ground the examples. I'm happy to back off from my UX concerns (rethinking now, I probably didn't have enough background then to comment usefully), but it would be nice to see something closer to the simplified version imo, e.g. either having good defaults for the evaluator and action or keep these folded in the constraint type name. I think reducing the amount of stuff people need to fill out and have them concentrate as much as possible on the what they want to express would go a long way...

joelanford commented 3 years ago

Repeating from our conversation:

I want to make sure: a) using cel is easy b) adding new constraint types later is easy (new expression-based constraints AND non-expression-based constraints) c) configuration of various constraint types is not intermingled d) OLM can identify that a property is intended to be a constraint even if it doesn't understand how to interpret the constraint

Example of what I envision: https://hackmd.io/WRRYzrqsRACdIlH7ab_oJA?view

estroz commented 3 years ago

@joelanford I like that schema you suggest. A description helps with debugging.

In your example, would any constraint under an olm.constraint that is not known via the type system result in an error? Or do you see olm.constraint as home to all constraint properties? If the former is true, given that all OLM constraints would be under their own property, it follows that any non-OLM constraints would be under a other-resolver.constraint, for example. This would make OLM resistant to incomplete resolution and therefore explicitly non-backwards-compatible with newer OLM constraint types (which I think is a benefit not a drawback). In other words, having OLM ignore unknown constraints could break installation; requiring all olm.constraint's children be known types would fix that.

joelanford commented 3 years ago

In your example, would any constraint under an olm.constraint that is not known via the type system result in an error?

Yes, because we'd unmarshal into the Constraint type and all of the pre-defined contraint structs would be nil. And our validation would look for exactly one non-nil struct.

it follows that any non-OLM constraints would be under a other-resolver.constraint, for example.

I hadn't considered the possibility of non-OLM constraints. That would require a non-OLM resolver to be running or at least for OLM to have a constraint evaluator extension point, both of which IMO are out of scope for our "defined by OLM" constraints.

In other words, having OLM ignore unknown constraints could break installation; requiring all olm.constraint's children be known types would fix that.

Exactly. And this would open the door for OLM to fail on "required" unknown constraints but ignore "optional" unknown constraints.

Not only that, with our current design of: a) arbitrary properties are allowed b) constraints are properties

OLM would necessary have to ignore unknown constraint types if they were defined as separate property types, because there would be no way for OLM to disambiguate and understand operator author intent.

estroz commented 3 years ago

both of which IMO are out of scope for our "defined by OLM" constraints.

Yep agreed, just figured I'd call it out.

Defining olm.constraints should be a prereq for this EP and #97 (as I think you said somewhere). Do you think defining olm.constraints would warrant a separate EP? I would guess yes so @dinhxuanvu doesn't have to do more work :wink:.

joelanford commented 3 years ago

Defining olm.constraints should be a prereq for this EP and #97 (as I think you said somewhere).

I think that's true if we can't think of any other mechanism to disambiguate arbitrary non-constraint properties from constraints.

Do you think defining olm.constraints would warrant a separate EP? I would guess yes so @dinhxuanvu doesn't have to do more work 😉.

The title of this EP seems to align with the need to define olm.constraint, so I'm fine with that living here. If we think it would be helpful to do another spinoff of this EP so we can make progress faster to avoid contentious topics, I'm not opposed.

joelanford commented 3 years ago

I'd go further: why not omit the evaluator entirely and identify via the type?

@ecordell (I think my comment in response to this question was eaten by one of Vu's new commits), so here goes again:

My concern is that different root types will cause version skew concerns when we add a new constraint type.

If we:

add support for olm.constraint.cel.required in 0.20
add support for olm.constraint.foobar.required in 0.21

And then a bundle contains olm.constraint.foobar.required and is attempted to be installed in <=0.20, what does OLM do with that property?

In 0.20, support for that as a known constraint is not present, so OLM would treat it like an opaque property and ignore it entirely. There's no opportunity to fail resolution (if its actually required) or ignore it on purpose (because its optional) because OLM can't tell user intent since we're co-mingling constraints and properties.

fgiloux commented 3 years ago

I like the idea of having a generic mechanism for defining constraints.

I was however quite confused with the proposal till I read again: "This enhancement proposes the adoption of Common Expression Language (CEL) to present the specification and usage of generic constraint that OLM can evaluate for dependency resolution at runtime." By generic constraints you mean ONLY in the context of dependency resolution, don't you? Why not defining a general constraint mechanism for operators that can ALSO be leveraged for dependency resolution at runtime? With the later it would be very valuable to have a clear API for surfacing properties that constraints can be based on.

Today we have these constraints:

Presence of a specific GVK in the target cluster
Presence of a specific package in the target cluster
Kubernetes version of the target cluster

These properties are recorded in multiple places:

CRDs
packages
version endpoint

Also mentioned

number of nodes
number of CPU cores
etc

There is a need for a mechanism to feed these properties to the installation decision algorithm or to the SAT solver. I am missing this part in the enhancement proposal. Here are the approaches I see:

Having OLM surfacing a finite set of properties into one or multiple custom resources that get ingested
Having OLM executing custom queries, whose result can be seen as property. The extreme extension of this approach being running any code (the Python proposal made earlier) and having constraints based on the result of the custom code How do you see that? Edited: I think that it is the intent of this enhancement proposal to address these questions. I will add my comments there. It would be good to add a link to it in the top of this enhancement proposal.

This may not have been the original intent but there may be value in separating the concepts of dependencies (the triggering of installation of additional APIs/operators) and constraints (primary or dependent operator being fit for the target cluster). Having this separation an operator may still get installed even if the dependency (API or operator) is not available as long as no constraint on them as been specified. The "optional dependency" mechanism that has been being discussed for a while.

dinhxuanvu commented 2 years ago

@joelanford Regarding your comment on failureMessage, I think we did talk about the fact we are not changing the current Property struct to support additional root-level field. That will require changes in registry side and grpc API.

kevinrizza commented 2 years ago

/approve

joelanford commented 2 years ago

/lgtm

operator-framework / enhancements

Add enhancement proposal for generic constraints feature #91