Reconciling OAM traits in v1alpha2

negz commented 4 years ago

Background

My understanding is that currently in the Rudr implementation of the v1alpha1 OAM spec all knowledge of how to render an ApplicationConfiguration into into Kubernetes specific kinds (e.g. Deployment and Service) is centralised in one controller (or rather instigator). I don't speak Rust, but it appears that this controller:

Watches for ApplicationConfiguration
Gets each ComponentSchematic referenced by the ApplicationConfiguration
Creates, updates, or deletes Kubernetes resources as appropriate to satisfy the workload described by the ComponentSchematic. For example by rendering and creating a Deployment, Service, etc.
Creates, updates, or supplements the Kubernetes resources produced in step 3 per the traits associated with the component that produced them.

This allows a lot of flexibility because one controller is authoritative for everything to do with an ApplicationConfiguration, but could impact the maintainability and extensibility of the code. The exec method of the instigator is ~300 lines long, and support for new kinds of workloads and traits must be compiled into the instigator.

In the nascent Crossplane implementation of the v1alpha2 OAM spec an ApplicationConfiguration is reconciled by a series of controllers. The ApplicationConfiguration controller:

Watches for ApplicationConfiguration.
Gets each Component referenced by the `ApplicationConfiguration
Renders the Component into a Kubernetes CR representing its workload kind, e.g. ContainerizedWorkload.

At this point the job of reconciling the OAM workload is passed onto a workload-specific controller. For example a ContainerizedWorkload controller might:

Watch for ContainerizedWorkload
Create a Deployment and a Service for each ContainerizedWorkload

Note that the ApplicationConfiguration controller is, in a way, completely workload-unaware. It simply renders out a templated workload and defers the task of figuring out how to reconcile that workload into another controller.

Currently there is not a lot of clarity around how traits will be supported under this new approach. It's not possible (or, in my opinion, desirable) to have the ApplicationConfiguration controller parse and act on traits as the current Rudr implementation does. The ApplicationConfiguration controller doesn't know that a ContainerizedWorkload produces a Deployment, so it cannot know that it should modify the spec.replicas of a Deployment when it encounters a ManualScalerTrait in the spec.components[0].traits array of an ApplicationConfiguration.

Proposal

I recommend that the ApplicationConfiguration controller instantiate each trait it encounters and create a reference from that trait CR to the workload CR that was produced by its component. For example given the below Component and ApplicationConfiguration:

---
apiVersion: core.oam.dev/v1alpha2
kind: Component
metadata:
  name: example-component
spec:
  workload:
    apiVersion: core.oam.dev/v1alpha2
    kind: ContainerizedWorkload
    spec:
      containers:
      - name: example
        image: example:latest
  parameters:
  - name: instance-name
    required: true
    fieldPaths:
    - metadata.name
---
apiVersion: core.oam.dev/v1alpha2
kind: ApplicationConfiguration
metadata:
  name: example-appconfig
spec:
  components:
  - componentName: example-component
    parameterValues:
    - name: instance-name
      value: example-workload
    traits:
    - trait:
        apiVersion: core.oam.dev/v1alpha2
        kind: ManualScalerTrait
        metadata:
          name:  example-trait
        spec:
          replicaCount: 3

The ApplicationConfiguration controller would:

Render and create a ContainerizedWorkload CR named example-workload.
Render and create a ManualScalerTrait CR named example-trait.

The rendered ManualScalerTrait CR would contain a reference to the workload it applies to, for example:

apiVersion: core.oam.dev/v1alpha2
kind: ManualScalerTrait
metadata:
  name:  example-trait
spec:
  replicaCount: 3
  workloadRef:
    apiVersion: core.oam.dev/v1alpha2
    kind: ContainerizedWorkload
    name: example-workload

This would allow us to encapsulate the logic of how to reconcile each kind of trait into one or more trait controllers. For example it would be possible to use a watch predicate to write a small controller that watched for instances of ManualScalerTrait whose spec.workloadRef.kind was ContainerizedWorkload. I believe this is a reasonable approach because it:

Decentralises the logic required to process each kind of trait.
Allows the ApplicationConfiguration and workload controllers (like ContainerizedWorkload) to be trait-unaware. For example a trait could apply to a Crossplane RedisCluster, or a KNative Service even if those CRs and controllers were completely OAM unaware.

negz commented 4 years ago

Note that I don't necessarily think that we need spec.workloadRef to be part of the OAM spec - it could be a Crossplane/Kubernetes specific implementation detail, per https://github.com/oam-dev/spec/issues/324.

hongchaodeng commented 4 years ago

I totally agree with this. Ideally there should be a way in Kubernetes to "traverse up" the object ownerRef tree. I was trying to do that long time ago in upstream and didn't get it through :(

zhxu2 commented 4 years ago

Will CRUD operations over Trait CRs & workload CRs result in components/app reconciliations? It seems to me the Rudr implementations only watches for Appcfg changes.

resouer commented 4 years ago

Will CRUD operations over Trait CRs & workload CRs result in components/app reconciliations? It seems to me the Rudr implementations only watches for Appcfg changes.

@zhxu2 No, the AppCfg is "source of truth" for OAM app. Also, it's not encouraged for end users to manipulate CRs (the second layer objects) in practice, OAM objects are the user facing interface. This will be enforced by OAM admission in my mind.

zhxu2 commented 4 years ago

Will CRUD operations over Trait CRs & workload CRs result in components/app reconciliations? It seems to me the Rudr implementations only watches for Appcfg changes.

@zhxu2 No, the AppCfg is "source of truth" for OAM app. Also, it's not encouraged for end users to manipulate CRs (the second layer objects) in practice, OAM objects are the user facing interface. This will be enforced by OAM admission in my mind.

For traits, I agree. For components (and thus its workload CR), would it make more sense that an update reconciles all its containing apps? Otherwise we probably need to introduce the concepts of component versions/histories for tracking in between a lot of applications

negz commented 4 years ago

For components (and thus its workload CR), would it make more sense that an update reconciles all its containing apps?

We could enqueue a reconcile for all ApplicationConfigurations that referenced a Component when the component changed if there were a reference from the Component to the various ApplicationConfiguration CRs that referenced it. Currently we don't have any such reference. That said, we do still enqueue a reconcile for the ApplicationConfiguration every 30-60 seconds, and thus process every Component it references every 30-60 seconds.

resouer commented 4 years ago

Exactly, (I recalled I brought up this once in Seattle meeting?) either reverse ref or revision ref is what we need to consider in Component workload model as follow up proposal. It's currently missing and also a blocker for a better OAM app upgrade story.

ryanzhang-oss commented 4 years ago

I wonder why is this necessary in a spec repro? It seems to be an implementation detail.

negz commented 4 years ago

I wonder why is this necessary in a spec repro? It seems to be an implementation detail.

@ryanzhang-oss I raised the issue here because the topic has come up in 2-3 different places and I wanted a single place to capture the discussion in more detail. I think this touches on the spec a little (e.g. it's an instance of #324), but I'm happy to move the issue to https://github.com/crossplane/crossplane instead.

let's move to #325

@zhxu2 I don't think this is related to #325. Were you thinking of #324?

negz commented 4 years ago

Will CRUD operations over Trait CRs & workload CRs result in components/app reconciliations? It seems to me the Rudr implementations only watches for Appcfg changes.

@zhxu2 Yes, it would be possible for a user to CRUD (for example) a ContainerizedWorkload and a ManualScalerTrait without ever writing an ApplicationConfiguration. This is similar to how a user could write a Kubernetes Pod or ReplicaSet rather than a Deployment. While it is possible it's not something we'd be recommending or documenting.

This will be enforced by OAM admission in my mind.

I don't think we should enforce this, especially considering that some OAM workload kinds will be common existing CRs that might be used in other contexts without OAM (e.g. Crossplane RedisCluster, KNative Service).

resouer commented 4 years ago

I don't think we should enforce this, especially considering that some OAM workload kinds will be common existing CRs that might be used in other contexts without OAM (e.g. Crossplane RedisCluster, KNative Service).

@negz I don't get it. I think every object in component.workload section could be used in any context without OAM, any difference in your mind?

wonderflow commented 4 years ago

Note that I don't necessarily think that we need spec.workloadRef to be part of the OAM spec - it could be a Crossplane/Kubernetes specific implementation detail, per #324.

Yeah, I agree. And I think we some other ways to do the same thing, for example ownerReference. The information of workloadRef needed by trait already contain in AppConfig, so it's not necessarily needed in Spec. And I totally agree that runtime could support fields that are not defined by the spec, this is the freedom of runtime, we have no way to restrict.

I don't think we should enforce this, especially considering that some OAM workload kinds will be common existing CRs that might be used in other contexts without OAM (e.g. Crossplane RedisCluster, KNative Service).

In my view, if some workloads(CRs) are created by OAM, then it should only be controlled by AppConfig. If CRs are created by other controller which has no relationship with OAM, then they could control. If some CRs are created by OAM while users changed them directly, we should reconcile them back.

negz commented 4 years ago

@negz I don't get it. I think every object in component.workload section could be used in any context without OAM, any difference in your mind?

@resouer I think we're on the same page here. I agree that it's possible that an OAM workload could also be a valid Kubernetes resource outside of OAM. For example an infrastructure operator may choose to make a Knative Service a valid OAM workload by creating a WorkloadDefinition. If they did this they may still want some folks to be able to author a Knative Service CR directly, without ever writing an ApplicationConfiguration or generally using OAM in any way.

Also, it's not encouraged for end users to manipulate CRs (the second layer objects) in practice, OAM objects are the user facing interface. This will be enforced by OAM admission in my mind.

I might have misunderstood what you were saying here. I thought you were suggesting we use an admission control webhook to prevent folks from directly CRUDing any Kubernetes resource that was an OAM workload, which would for example prevent a user from creating a Knative Service (except via an ApplicationConfiguration) if the infrastructure operator had created a WorkloadDefinition for Knative Services.

Note that under the current Crossplane ApplicationConfiguration controller implementation the controller will eventually undo any manual edits to the workloads or traits it creates. On each reconcile it patches all workloads and traits to match its desired configuration.

resouer commented 4 years ago

For example an infrastructure operator may choose to make a Knative Service a valid OAM workload by creating a WorkloadDefinition. If they did this they may still want some folks to be able to author a Knative Service CR directly, without ever writing an ApplicationConfiguration or generally using OAM in any way.

@negz this is the point I didn't get, why Infra Ops claim knative service as OAM Workload but choose to skip AppConfig in the platform?

Note that in this case, the Dev will use knative service to describe application (no change for his user experience). While what is "invisible" to him is the platform (let's say Crossplane) will generate AppConfig for this Workload (optional, generate Component) and apply Traits according to annotations on the CR (e.g. autoscaling).

That is said, the platform will still use OAM AppConfig as source of truth. And in order to make it work, we'll need admission hook to intercept operations of knative service CR so to avoid it triggers reaction of knative controller immediately,

This is what I named "Traits Only Mode of OAM platform" 😄 .

negz commented 4 years ago

While what is "invisible" to him is the platform (let's say Crossplane) will generate AppConfig for this Workload (optional, generate Component) and apply Traits according to annotations on the CR (e.g. autoscaling).

It sounds like you're proposing that if a kind of Kubernetes custom resource has a corresponding WorkloadDefinition stating that it's an OAM workload we'll let folks author instances of that custom resource, but then magically wrap them in an OAM ApplicationConfiguration? It sounds like this would require the person who authors the Kubernetes custom resource (e.g. the Knative Service) to be aware of and interested in using OAM, because they'd need to annotate it in order to use traits. If I'm understanding correctly, I'm not sure why we'd support this. If the person authoring the Service wants to use OAM enough to annotate their CR with OAM traits, couldn't they just write an ApplicationConfiguration instead?

I think there are cases in which some teams within a particular Kubernetes control plane are going to want to use OAM and others are going to be completely ignorant of it. There's also a case in which teams will want to slowly migrate from whatever their current patterns are to using OAM. So I think we need to support the case in which a particular kind of Kubernetes resource is a valid OAM workload, but can also be used without any interference from OAM.

resouer commented 4 years ago

It sounds like this would require the person who authors the Kubernetes custom resource (e.g. the Knative Service) to be aware of and interested in using OAM, because they'd need to annotate it in order to use traits.

@negz No, they are not using annotations to replace Traits, they still claim raw Knative annotations and the platform will translate them into Traits. I'd assume Knative service author is 100% Dev, not Ops (i.e. Serverless Architecture), and OAM platform (e.g. Crossplane) will fill the gap of Ops by Traits and AppConfig.

So the experience of this user is 100% no change. But this bring us (platform builders) many benefits, for example, the platform could now extend Knative annotations to support way more advanced operational capbilities besides scaling, also, the platform could generate cloud services Components like RDS in the AppConfig for Knative service Component to consume. This is how the OAM platform can change Knative into a Cloud Run in my mind.

This "Traits Only" mode will be helpful when certain capability in K8s already provided Workload/Component abstraction (serveless projects like Knative, FaaS are best examples).

But, I just figured out it's different from what you are discussing (I'd blame the knative service example is misleading me) 😂.

So back to the topic, I tend to agree with you theoretically, but I think RedisCluster or Knative service may not be the best example since they could be modeled well in many ways (it's also our intention, right?).

In practical, what I am proposing is as long as there's WorkloadDefiniton, there's no reason to skip AppConfig. This will ensure AppConfig is the source of truth and makes the platform simpler and less confusion. Of course, we could do this as recommendation in the doc but I just personally think Crossplane could follow this principle.

negz commented 4 years ago

what I am proposing is as long as there's WorkloadDefiniton, there's no reason to skip AppConfig

Maybe it would be helpful put together a tiny design sketch to illustrate what you're thinking of? Or to discuss this during a community meeting at some point? I'm not sure if we're talking about the same thing, but I'm pretty sure a WorkloadDefinition should mean "some people using this API server want to use this as an OAM workload", rather than "everyone using this API server must use this as an OAM workload", even if the fact that it's an OAM workload is hidden. Ideally Kubernetes CRs could be used as an OAM workload, and also be used without any interference from or interaction with OAM.

resouer commented 4 years ago

but I'm pretty sure a WorkloadDefinition should mean "some people using this API server want to use this as an OAM workload", rather than "everyone using this API server must use this as an OAM workload"

I agree on this point. I think I mis-read your idea as we should allow WorkloadDefinition/TraitDefinition be used without AppConfig which seems so wired. It's totally fine if the user want to initiate a CR whose CRD was claimed as workload/trait def'n as long as the platform ensure OwnerRef is properly set in OAM path.

And yes, this should not encouraged as you mentioned before.

negz commented 4 years ago

I think I mis-read your idea as we should allow WorkloadDefinition/TraitDefinition be used without AppConfig which seems so wired.

I see! I did not mean to suggest that, and agree it doesn't make sense. When I say "a workload" I mean (for example) a instance of CR kind: ContainerizedWorkload, not an instance of CR kind: WorkloadDefinition.

as long as the platform ensure OwnerRef is properly set in OAM path.

Definitely. Currently the ApplicationConfiguration controller sets the ApplicationConfiguration as the controller reference of any workload (e.g. ContainerizedWorkload) it creates, and will refuse to operate on any workload that is either missing a controller reference, or controlled by a different UID.

zhxu2 commented 4 years ago

I totally agree with this. Ideally there should be a way in Kubernetes to "traverse up" the object ownerRef tree. I was trying to do that long time ago in upstream and didn't get it through :(

I think what @negz proposing here makes sense although it's an implementation detail rather than a SPEC thing. Having owner references will help with the creation, reconciliation as well as cleanups of k8s resources

resouer commented 4 years ago

@negz I'm revisiting the current implementation for this. So our current solution is build a ref from Trait to Workload.

An alternative: isn't a Component actually another owner of its Trait?

So maybe we can build a reference from Component to Workload (e.g. workload.name in Component annotation). Then all the traits can use this path to track Workload CR instead of introducing a system generated field in spec.

resouer commented 4 years ago

@negz Never mind, if #336 is addressed, then we would always need a ref from Trait to Revision (aka, Workload).

I propose to make it part of OAM spec.

resouer commented 4 years ago

Close since we already have workloadRef in OAM Kubernetes Runtime.

oam-dev / spec

Reconciling OAM traits in v1alpha2 #323

Background

Proposal