I'm not sure I fully understand all the nuances here. If I got it right, the ask is to have a sort of generic infrastructure machine controller, and then plug in the infrastructure-specific bits, right? Is there a description/comparative analysis of the different infrastructure providers so I can better understand which parts we want to re-use across infra providers? Expanding a little bit, does this effort require a design doc/a CAEP?
@fabriziopandini This is meant to be an umbrella ticket to track a better decoupling between the providers' cluster infrastructure CRs and the providers' machine controllers. This decoupling lets the providers' machine controllers work with external cluster infrastructure CRs, which in turn unlocks scenarios where the cluster infra requires special treatment different from the reference implementation, e.g. bring your own infra for the cluster.
See the AWS implementation details: https://github.com/kubernetes-sigs/cluster-api-provider-aws/pull/2124 and kubernetes-sigs/cluster-api-provider-aws#2125
Azure https://github.com/kubernetes-sigs/cluster-api-provider-azure/issues/1129
Ref: https://github.com/kubernetes-sigs/cluster-api/issues/4063#issuecomment-758088482
~~This is a set of solutions that should enable this use case as well. A MachineShim (or similar) would effectively decouple machines from their infrastructure part by bringing in a ProviderID (Kubernetes Node). Would that suffice?~~
/kind proposal
/milestone v0.4.0
> A MachineShim (or similar) would effectively decouple machines from their infrastructure part
I read through the comment that was linked, and as I understand it, this is more about creating fake machine representations that allow some Machine-like things (e.g. lifecycle hooks) to be leveraged in things like MachinePools. So there's no Machine Infrastructure in this case, right?
My understanding of the use case in this issue is more that we want to be able to use the Machine controller and Infrastructure Machine controller without using the Cluster and Cluster Infrastructure resources/controllers. I believe this is a different problem, unless I've missed some nuance of the MachineShim concept.
cc @hasheddan
@yastij, @detiber, and I were discussing #1250 and came to the realisation that we also need a similar sort of contract for infrastructure provider load balancers vs. clusters, in that the load balancer may be provided by one infrastructure provider, e.g. AWS, but attached to a cluster using a vSphereCluster. I think then that the contract would be the same as what RH needs, and we could define a more generic one that describes how InfraComponents should be instantiable without an InfraCluster.
I don't think a MachineShim is the right thing here. Concretely, the issue is that you want to provision something, a Machine or a Load Balancer, and today we know where to provision that thing (e.g. the VPC, security groups, subnets, etc.) based on information in the InfraCluster object. If that InfraCluster object is mismatched, then today the InfraMachine or future InfraLoadBalancer controller won't work.
@randomvariable that's right. I wonder if, instead of having an additional ExternalInfraCluster CR as originally attempted in AWS, we could come up with a very clear definition of what "infrastructure" and "unmanaged" mean across providers and try to leverage "unmanaged" for this use case.
Currently in AWS this is not possible, as an "unmanaged" AWSCluster CR will still try to reconcile default security groups and the API server load balancer.
@randomvariable @JoelSpeed @yastij @detiber @vincepri would you be OK to proceed as described above and let the "unmanaged" flavour be fully unmanaged, by no longer reconciling security groups and the API server load balancer? I described the use case here: https://docs.google.com/document/d/1uqzpQjEQ9s0gfHppDcRa4zZeQTXGtca4v19bSYPIDgM/edit#
@enxebre What if, instead of adding yet another reference to the objects, we generically allowed retrieving a ClusterInfrastructure-compatible object from a configmap or secret, or whatnot?
In other words, is there a way to have the current AWSCluster, AzureCluster, etc. function as interfaces? That way, we have a clear, well-defined contract already in place that we could use to create a bridge from other resources.
An alternative to a configmap or secret is to have a generic field or annotation that lets us create these resources but stops the controller from reconciling them.
Can we also consider allowing machines to be 'clusterless'? For use cases where infrastructure is provided by the user, machines could stop relying on clusters and read values like subnets, security groups, regions, etc. from the machine template.
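For illustration, a machine template that carries its own network details might look something like the sketch below. Field names follow CAPA's v1alpha3 API; the resource IDs are placeholders for user-managed infrastructure.

```yaml
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
kind: AWSMachineTemplate
metadata:
  name: byo-infra-workers
spec:
  template:
    spec:
      instanceType: m5.large
      # Pre-existing, user-managed subnet and security group, so the
      # machine controller would not need an AWSCluster to resolve them.
      subnet:
        id: subnet-0123456789abcdef0
      additionalSecurityGroups:
      - id: sg-0123456789abcdef0
```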
Possibly. I'd like to hear from the infrastructure provider maintainers on that bit, though, because it increases the support scope.
@CecileRobertMichon @devigned @randomvariable @sedefsavas @yastij @srm09
As a side benefit, having a generic data interface would let us welcome other infrastructure based managers like Terraform or Crossplane
> As a side benefit, having a generic data interface would let us welcome other infrastructure based managers like Terraform or Crossplane
I'm intrigued by this idea, but at the moment I'm not sure of the benefit over trying to re-use existing infrastructure cluster resources with the generic extra field that prevents reconciliation, as you've suggested 🤔
My understanding of the ideas in the thread so far is that, in theory, I could use Terraform or something to create my infrastructure, manually populate the spec of an AWSCluster resource with the details from my Terraform environment, and then add that to the cluster. At this stage, this isn't usable because the controller needs to mark the status ready. So in this scenario I want to mark the AWSCluster as no-op/unmanaged/name TBD, in which case the controller looks at the pre-populated resource and says: yep, OK, it has the minimum values I expect, the cluster is ready (or something like this), and then does nothing else, with no reconciliation of any AWS resources ever for the lifetime of this CR.
For this use case, the existing resources should already have all of the fields I'd expect, as an AWS user for example, to fill in, and they are already understood by the controllers, so it seems like it would be easy to add a no-op mode to each of the providers.
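A rough sketch of what that could look like, assuming a hypothetical no-op marker (the annotation name here is invented; the exact mechanism is what's up for discussion):

```yaml
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
kind: AWSCluster
metadata:
  name: byo-cluster
  annotations:
    # Hypothetical marker telling the controller to validate the spec,
    # mark the cluster ready, and never reconcile any AWS resources.
    cluster.x-k8s.io/managed-by: external
spec:
  region: us-east-1
  networkSpec:
    vpc:
      id: vpc-0123456789abcdef0   # created out of band, e.g. by Terraform
  controlPlaneEndpoint:
    host: my-api.example.com      # user-provided API server load balancer
    port: 6443
```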
With the generic data interface, I assume I would be using a configmap or equivalent and still be putting in the same field names so that the interface can pick up the required values? Is the main advantage that no controller would be watching these, so we wouldn't need to define the term "unmanaged" uniformly across all providers?
I don't think a "generic extra field" is going to work. The networking constructs are not compatible across cloud providers, and we've already hit this before by over-abstracting failure domains. I think an explicit "unmanaged" toggle for resources would work here (which also has the side benefit of aiding users who currently have to "guess" what to provision to get an unmanaged AWSCluster). It would also be useful for dealing with rate limits in edge cases (not Edge) with very large clusters / lots of clusters in an account, where the rate limits just keep being hit. I still think there's a question about how to deal with this in the load balancer proposal too. I think the path forward here, as with single-controller multitenancy, is to take the Cluster API AWS v1alpha4 proposal as an instance of an infrastructure provider contract.
Catching up on this: today, with the vSphere provider, you should be able to create machines without needing cluster objects. Also, in general for BYO infra, there are two cases:
I think the first one is cheaper to implement and has an acceptable UX, though.
Hey folks! Wanted to weigh in here to give some context around how Crossplane works and how it could potentially fit into this use-case. Crossplane is similar to CAPI in that it has providers for the different clouds. However, it aims to support provisioning all managed services on every cloud provider, which is a superset of those required for CAPI. Each provider installs CRDs for all of its managed services and they map 1:1 with the cloud provider API (for instance, here are the currently supported services for provider-aws: https://doc.crds.dev/github.com/crossplane/provider-aws).
On top of these primitive resources, Crossplane provides a composition layer. This allows users to define abstract resource types (CompositeResourceDefinitions, a.k.a. XRDs) that map to one or more of the primitive types. For instance, a good example is creating a `Cluster` XRD that maps to all the resources required to create an EKS cluster. You can also have multiple compositions for a single XRD; for instance, you may be able to satisfy the same `Cluster` XRD with the resources required for a GKE cluster. This allows you to have powerful abstractions, which can also be nested (i.e. I could have a `ClusterGroup` XRD that composes multiple EKS and GKE cluster XRDs, etc.).
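For readers unfamiliar with Crossplane, a minimal XRD might look like the following; the group, kind, and schema here are purely illustrative:

```yaml
apiVersion: apiextensions.crossplane.io/v1
kind: CompositeResourceDefinition
metadata:
  name: xclusters.example.org
spec:
  group: example.org
  names:
    kind: XCluster
    plural: xclusters
  versions:
  - name: v1alpha1
    served: true
    referenceable: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              region:   # abstract input that compositions map onto concrete resources
                type: string
```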
You can see the similarities to the mapping of generic resources in CAPI to their concrete implementations in CAPI providers. Supporting a common interface for the concrete implementations would allow users to author XRDs that could be backed by many different compositions that provide infrastructure for a k8s cluster. A good example of this would be that you could transparently swap out a cluster made up of EC2 instances and related services for an EKS cluster.
We see Crossplane as already doing the work on the gritty details of managing the lifecycle of the granular infrastructure resources, and we are working directly with many of the cloud providers to make sure those implementations are reliable and production-ready. We also see Crossplane as a strong solution for defining the abstractions that CAPI knows how to interact with. The advantages of bringing the projects closer together include:
- Installing the full set of infrastructure configuration for a cluster with a single command: `kubectl crossplane install configuration <oci-image-registry/repo:tag>` (more on Crossplane packages here).
- Letting users author their own `Composition` and drop it in as a replacement for the "officially packaged" one. This would take minutes, and would be an alternative to forking and rewriting a CAPI provider.

The integration would require significant collaboration, but I am confident that it could be done with relatively small changes to the general models used by both projects. Furthermore, I, and many other members of the Crossplane community, would be willing to invest significant effort in making this possible. I am happy to answer any questions, and / or do a more formal presentation / discussion / Q&A at CAPI community meetings.
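For concreteness, a minimal Composition that could back such a Cluster XRD might look like this (names illustrative; the patches that wire XRD fields into the composed resources are omitted):

```yaml
apiVersion: apiextensions.crossplane.io/v1
kind: Composition
metadata:
  name: xclusters-aws
spec:
  compositeTypeRef:
    apiVersion: example.org/v1alpha1
    kind: XCluster
  resources:
  # One of the granular managed resources this composition creates; a real
  # EKS composition would also list the cluster, node groups, IAM roles, etc.
  - name: vpc
    base:
      apiVersion: ec2.aws.crossplane.io/v1beta1
      kind: VPC
      spec:
        forProvider:
          cidrBlock: 10.0.0.0/16
          region: us-east-1
```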
Lastly, as many of the folks involved in this thread are key members of the CAPI and general upstream k8s community, I want to thank you for your time and effort. The impact CAPI has had and will have on k8s adoption and cluster management cannot be overstated, and the Crossplane community would love to enable that as best we can.
If we were to leverage Crossplane, then we need to come to some agreement on the multi-tenancy model. CAPA and CAPZ already have implementations near completion based on a particular RBAC model, see https://github.com/kubernetes-sigs/cluster-api-provider-aws/blob/master/docs/proposal/20200506-single-controller-multitenancy.md and https://github.com/kubernetes-sigs/cluster-api-provider-azure/blob/master/docs/proposals/20200720-single-controller-multitenancy.md
> On top of these primitive resources, Crossplane provides a composition layer. This allows users to define abstract resource types (CompositeResourceDefinitions a.k.a XRDs) that map to 1 or more of the primitive types. For instance, a good example is creating a Cluster XRD that maps to all the resources required to create an EKS cluster. You can also have multiple compositions for a single XRD.
The XRDs and compositions do look to be quite powerful, but also quite complex. The 1:1 correspondence with a cloud provider API might be a benefit in some eyes, but to others it's excessive complexity. Those APIs are not fun to deal with, and the resources they describe are low-level. I think there is a lot of value in how the providers take a semi-opinionated approach to how a cluster is built on a given cloud provider. It simplifies the language a user must know to build a best-practices cluster on the given cloud provider.
I think it would help folks to understand this more to see a proposal with VM based representation on a few providers, not one using a managed control plane, and how it would all tie together.
> If we were to leverage Crossplane, then we need to come to some agreement on the multi-tenancy model.
@randomvariable I think this necessitates a longer conversation than we can have on this issue thread, but at a quick glance, Crossplane does support the various methods of providing credentials that are supported by the AWS SDK for Go. In Crossplane, credentials are specified in a `ProviderConfig`. Each object then has a reference to a `ProviderConfig`, which specifies the credentials that will be used to operate on that specific instance of the resource. When creating higher-level abstractions, the abstraction author can force the usage of certain credentials, or allow them to be specified at the XRD level and flow through.
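A sketch of that credential flow, using provider-aws's API as it exists today (names are illustrative; treat the details as approximate):

```yaml
apiVersion: aws.crossplane.io/v1beta1
kind: ProviderConfig
metadata:
  name: team-a-aws
spec:
  credentials:
    source: Secret
    secretRef:
      namespace: crossplane-system
      name: team-a-aws-creds
      key: credentials
---
# Each managed resource selects its credentials via providerConfigRef.
apiVersion: ec2.aws.crossplane.io/v1beta1
kind: VPC
metadata:
  name: team-a-vpc
spec:
  forProvider:
    cidrBlock: 10.0.0.0/16
    region: us-east-1
  providerConfigRef:
    name: team-a-aws
```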
> I think there is a lot of value in how the providers have a semi-opinionated approach to how a cluster is built on the given cloud provider.
@devigned I completely agree. What I am proposing is that the work currently done to create these abstractions in the providers could actually be handled by crafting compositions of the granular resources (a complex exercise that should be handled by folks who have a strong understanding of the APIs). Those abstractions would then be published, so users would interact with higher-level / opinionated objects, similar to those provided by the CAPI providers today.
> I think it would help folks to understand this more to see a proposal with VM based representation of AWS and Azure, not one using a managed control plane, and how it would all tie together.
Absolutely. This is something we haven't shown off quite as much as the managed solutions and is obviously a critical component for how many folks are using CAPI today.
User Story
As a CAPI consumer, I'd like to plug in my own ControlPlane and Infrastructure resources [1] while still reusing the existing machine controller implementation for infra providers.
Today this is not possible in some providers because the machine controllers are tightly coupled to the regular AWSCluster/AzureCluster kinds [2].
A scenario where this is handy is one where there's a common vision for a control plane across providers, e.g. cluster-api-provider-nested, while the infrastructure management can differ arbitrarily from the core implementations, e.g. BYO.
[1] https://github.com/kubernetes-sigs/cluster-api/blob/master/api/v1alpha4/cluster_types.go#L50-L58 [2] https://github.com/kubernetes-sigs/cluster-api-provider-azure/blob/master/controllers/azuremachine_controller.go#L186-L204
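Concretely, the user story amounts to something like the following Cluster, where both referenced kinds are stand-ins for user-provided implementations (the names are hypothetical):

```yaml
apiVersion: cluster.x-k8s.io/v1alpha4
kind: Cluster
metadata:
  name: byo-cluster
spec:
  controlPlaneRef:
    apiVersion: controlplane.example.com/v1alpha1
    kind: NestedControlPlane        # e.g. from cluster-api-provider-nested
    name: byo-cluster-cp
  infrastructureRef:
    apiVersion: infrastructure.example.com/v1alpha1
    kind: ExternalInfraCluster      # hypothetical BYO infra kind
    name: byo-cluster-infra
```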
/kind feature