Externalize provider specific specs and status in separated CRDs

pablochacin commented 5 years ago

/kind feature

Describe the solution you'd like Presently, Cluster and Machine CRDs include an opaque representation of provider specific description of these resources and their status:

ProviderSpec ProviderSpec `json:"providerSpec,omitempty"`

An alternative approach would be to use CRDs for provider specific specs and status and keep a ObjectReference

ProviderSpec metav1.ObjectReference

With respect of the provider specific status, the ClusterAPI controllers should watch the provider CRD and update the ClusterAPI CRD status if a change is detected.

Pros:

Explicit visibility of the dependency to providers by means of the object reference
Independent reconcile cycle for provider specific and ClusterAPI CRDs
Supporting tools like clusterctlcould use a plugin approach for handling provider specific specs and status

Cons:

Node identified so far.

Anything else you would like to add:

This approach was discussed in the context of #733
A similar approach has been used in metalkube
Using a CRDs may substitute the idea of a generic provider based on webhooks, which has the inconvenience of the synchronous invocation to webhook.

ncdc commented 5 years ago

@pablochacin would you be willing to defer discussion on this while we plan out the organization we discussed in today's community around roadmap items, and using KEPs for things like this instead of individual issues?

pablochacin commented 5 years ago

@ncdc Sure. My intention with this issue was to put together ideas that I had expressed as comments in other places (issues, documents) and facilitate the discussion.

timothysc commented 5 years ago

/assign @vincepri @detiber

mhrivnak commented 5 years ago

I love a lot of things about this approach and think it could be very successful. Some thoughts:

Considering a provider "Foo", I would make a FooMachine CRD with FooMachineSpec and FooMachineStatus types, just like any other normal CRD. Is that what you are proposing? Then from a Machine I would reference a FooMachine resource via an ObjectReference. I'm not sure there is value in splitting FooMachine up into dedicated "Spec" and "Status" CRDs that themselves don't really have spec and status.

Enabling the cluster-api MachineController to watch FooMachine CRDs and queue their owner, a Machine, would I think cover most use cases. Is there a use case you can think of for having a FooMachine controller? From a controller design standpoint, I lean toward just reconciling the Machine, and letting that workflow utilize the FooMachine as necessary. Ditto for Cluster of course.

It's not clear to me how best to handle this from a MachineSet. How would we templatize the FooMachine? We could have a MachineSet reference its own copy of a FooMachine, and have it make a copy of that for each Machine it creates. Other ideas?

Lastly, I'll just correct the perception of how metalkube is using an extra CRD, because it's fairly different from this proposal. (I'm a main author of that provider). We did not make a CRD to substitute for ProviderSpec and ProviderStatus. We did make a CRD that does provisioning and hardware management. The BareMetalHost CRD results in an API that you can think of as the local cloud API. Our actuator, rather than talking to the AWS or GCP cloud APIs, instead talks to the BareMetalHost API. Hopefully that helps; in any case, I don't think metalkube is doing anything similar to this proposal.

pablochacin commented 5 years ago

@mhrivnak thanks for the clarification regarding how metakube uses the CRD.

pablochacin commented 5 years ago

@mhrivnak Your suggestion about using a generic controller watching the provider CRD seams interesting. I don't know if you are participating in the discussion about extension mechanism for the cluster-api. That would be a good place to present and discuss the idea. Interested?

vincepri commented 5 years ago

/area api

pablochacin commented 5 years ago

@vincepri @timothysc I would like to work on this issue (seems I cannot assign to my self, can I?)

However, I have two questions:

This issue is previous to the work towards v1alpha2, so there's significant overlapping with the work on Machine Statues and Boostrapping.
v1alpha2 doesn't targets any change in the cluster api/data model.

ncdc commented 5 years ago

@pablochacin would you be willing to write up a proposal for the changes to the Cluster type to switch from inline to object references? I imagine this would also include removing the cluster Actuator interface (to match what's proposed in #997) and replacing it with a cluster infrastructure controller.

pablochacin commented 5 years ago

@ncdc Yes, I would be interested. Now, I expect the cluster data model to change significantly, according to what we discussed in the data model workstream. For instance, the references to infrastructure and control plane objects. So it's timely to do this change? If so, I'm in.

Regarding the machine part, I'm not sure how to approach this change in coordination with the proposal we have on the table.

detiber commented 5 years ago

@pablochacin I am in favor of including this for v1alpha2 assuming:

A proposal is created for it as suggested by @ncdc
We have commitment from one or more people to do the implementation

ncdc commented 5 years ago

If we want to do this in "small" steps, I'd suggest the only change we make for starters is the combination of removing the inline providerSpec and providerStatus fields (replacing them with object references to "cluster infrastructure" provider-specific CRDs, whatever they may look like per provider) and switching from a cluster Actuator to a cluster infrastructure controller. I think this would get rough alignment/coordination with #997.

A possible next step after this, maybe for v1alpha3, could be to further break up the data model into infrastructure vs control plane vs other things.

pablochacin commented 5 years ago

@ncdc sounds like a plan.

ncdc commented 5 years ago

I'm going to mark this p0 and move it to the v1alpha2 milestone as this issue covers both Machine and Cluster providerSpec/Status and the current plan is at least to tackle the fields in Machine for v1alpha2. And if we can get the proposal for the Cluster changes approved & have someone sign up to do it, 👍!

If we need to split this up so we have 1 issue for Cluster, and a separate for Machine, please let me know @timothysc.

/milestone v1alpha2 /priority critical-urgent

vincepri commented 5 years ago

/assign

timothysc commented 5 years ago

@pablochacin @ncdc - could folks please update this issue.

ncdc commented 5 years ago

Pablo to write a proposal.

ncdc commented 5 years ago

/reopen

The proposal has merged but we haven't modified the code yet. That's happening in #1177

k8s-ci-robot commented 5 years ago

@ncdc: Reopened this issue.

In response to [this](https://github.com/kubernetes-sigs/cluster-api/issues/833#issuecomment-513798400): >/reopen > >The proposal has merged but we haven't modified the code yet. That's happening in #1177 Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

kubernetes-sigs / cluster-api

Externalize provider specific specs and status in separated CRDs #833