kubernetes-sigs / cluster-api

Home for Cluster API, a subproject of sig-cluster-lifecycle
https://cluster-api.sigs.k8s.io
Apache License 2.0

clusterctl as a kubectl plugin #3533

Closed jackfrancis closed 3 years ago

jackfrancis commented 4 years ago

User Story

As a Kubernetes operator leveraging cluster-api to manage cluster lifecycle, I would like to be able to do cluster-api-specific operations (e.g., things that clusterctl does) using a tool I'm already familiar with (e.g., kubectl) for the purpose of simplifying my toolchain.

Detailed Description

Let's investigate the pros/cons of implementing "clusterctl" as a kubectl plugin, rather than maintaining a discrete CLI project.

Anything else you would like to add:

This issue description will be a living document until we figure out the scope of work that a kubectl "clusterctl plugin" will entail.

/kind feature

jackfrancis commented 4 years ago

This issue may inform the considerations of clusterctl as a kubectl plugin:

https://github.com/kubernetes-sigs/cluster-api/issues/3255

detiber commented 4 years ago

/cc @Arvinderpal

wfernandes commented 4 years ago

/area clusterctl

ahmetb commented 4 years ago

Just wanted to drop a note that we’d be happy to support the distribution of the plugin from Krew (https://krew.sigs.k8s.io) side.
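
For a rough illustration (the plugin name `capi` is hypothetical and not yet in the index), distribution through Krew would give users the usual install/upgrade flow:

# Hypothetical plugin name; install and upgrade via the centralized Krew index.
kubectl krew install capi
kubectl krew upgrade capi
kubectl capi --help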

detiber commented 4 years ago

Biggest Pro in my mind is:

It does lead to a challenge if someone wants to install something other than the "latest", but that is already an issue that exists with the current model.

The benefits of this only increase if, as @jackfrancis pointed out, we implement #3255, since kubectl plugins/krew already exist and we wouldn't need to re-invent that wheel to implement plugins.

Using plugins would also allow for faster iteration on experimental commands/tooling for Cluster API, in that the barrier to entry for adding new commands/tools would just be creating a kubectl plugin and getting it added to krew, rather than having to meet the higher barriers to entry for adding functionality to clusterctl or related existing tooling.

Things will probably get a bit tricky if we start talking about different plugins interacting as part of a workflow (for example, automating provider-specific steps in a `clusterctl init`-style workflow), but those are also challenges that we would have if building our own plugin system.

detiber commented 4 years ago

One potential drawback to kubectl plugins is that every command we currently have with clusterctl will be longer as a kubectl plugin.

For example, clusterctl init would be kubectl clusterctl init (though we probably want better naming for the plugin than clusterctl)
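
As a throwaway sketch (the wrapper path, plugin name, and delegation to the existing clusterctl binary are all assumptions, not an agreed design), the extra length is visible even in the thinnest possible bridge, since kubectl discovers any executable named kubectl-<name> on the PATH as a plugin:

# Minimal sketch: kubectl treats any executable named "kubectl-<name>" on PATH
# as a plugin, so the thinnest possible bridge is a wrapper that delegates to
# the existing clusterctl binary.
#
# /usr/local/bin/kubectl-capi (executable):
#   #!/usr/bin/env bash
#   exec clusterctl "$@"

kubectl plugin list                      # now lists kubectl-capi
kubectl capi init --infrastructure aws   # runs "clusterctl init --infrastructure aws"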

detiber commented 4 years ago

Things will probably get a bit tricky if we start talking about different plugins interacting as part of a workflow (for example, automating provider-specific steps in a `clusterctl init`-style workflow), but those are also challenges that we would have if building our own plugin system.

One potential solution that we could have for this is to require plugins that are expected to interact with the main clusterctl plugin to expose one or more sub-commands that can be used for querying compatibility, feature support, configuration, etc.
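
To make that concrete, one purely hypothetical shape for such a contract (neither the capa plugin nor this subcommand exists today) would be a machine-readable query that the core plugin shells out to before driving a provider-specific step:

# Hypothetical compatibility query exposed by a provider plugin; the output
# format and field names are invented for this sketch.
kubectl capa capi-compatibility --output json
# {
#   "contract": "v1alpha3",
#   "supportedCoreVersions": ">=0.3.0 <0.4.0"
# }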

jackfrancis commented 4 years ago

Is it 100% true that kubectl plugins can't be used immutably? I can't refer to kubectl foo @ foo:v2.1.4?

detiber commented 4 years ago

Based on https://github.com/kubernetes-sigs/krew/issues/343, I think that is correct.

There is a proposal for supporting custom index repositories, though. We could likely use that to publish our own indices, which we could separate out by :wave: version :wave: for compatibility purposes.

I'm hand-waving around version, since it might not be as simple as using the API version, and we might want further discussion around how we would want to try and solve that (especially related to breaking changes in providers that may happen within the scope of a given :wave: version :wave: of cluster-api itself).
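
If that proposal lands, the rough shape might be one custom index per :wave: version :wave: (the repository URL and plugin name below are invented for this sketch):

# Hand-waving: add a per-version custom index alongside the default Krew index,
# then install the matching plugin from it.
kubectl krew index add capi-v1alpha3 https://github.com/example/capi-krew-index-v1alpha3.git
kubectl krew install capi-v1alpha3/clusterctl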

ahmetb commented 4 years ago

Krew only allows installing the latest version of a plugin for simplification reasons. (It has worked well for us so far with >110 plugins.) My recommendation: if you have an older version you want to keep around and rarely rev, call it clusterctl1, and if you have most of the work happening in v2, call it clusterctl. (This is also how Homebrew and some other package managers do it.)

Custom indexes can be helpful here, but they require you to name the plugins separately per version, and users can't upgrade, so you lose some of the benefit of distributing via a package manager.
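
Sketched with hypothetical plugin names, that recommendation would look like:

# Hypothetical plugin names following the convention above.
kubectl krew install clusterctl     # tracks the current major version
kubectl krew install clusterctl1    # frozen to the older v1 line, rarely revved
kubectl krew upgrade                # users still get updates within each plugin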

jackfrancis commented 4 years ago

Thanks folks. Because (in a perfect world) clusterctl is a front-end on top of a strongly versioned cluster-api backend, there is an implicit cluster-api version associated with every clusterctl gesture/outcome. And because cluster-api does the normal thing and revs versions when it has a good reason to introduce a breaking API change, that means the version of cluster-api you're using (and by association the version of clusterctl you're using) is significant.

I think the user story here is: "As a Kubernetes cluster admin who operates lots of cluster-api-enabled clusters I need version-specific-cluster-api tooling in order to maintain that heterogeneous cluster fleet." I suspect that's a very common scenario cluster admins who rely upon cluster-api find themselves in.

devigned commented 4 years ago

For example, clusterctl init would be kubectl clusterctl init (though we probably want better naming for the plugin than clusterctl)

Perhaps kubectl capi init, and then for infra follow with kubectl [capz|capa|etc] ...

wfernandes commented 4 years ago

Perhaps kubectl capi init, and then for infra follow with kubectl [capz|capa|etc] ...

That's what I was thinking as well 🙂 Regardless, I think "clusterctl as a kubectl plugin" should have its own defined user experience rather than being a "lift-and-shift" of the commands and sub-commands from the current clusterctl.

wfernandes commented 4 years ago

Things will probably get a bit tricky if we start talking about different plugins interacting as part of a workflow (for example, automating provider-specific steps in a `clusterctl init`-style workflow), but those are also challenges that we would have if building our own plugin system.

If we are thinking about sharing state, I wonder if we could leverage ideas from #3427, i.e. storing the state in CRDs as part of the management cluster operator API. When we run kubectl capi init, all of the provider information would then be stored as part of the operator CRD API. 🤔
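
Purely as a sketch (the API group, kind, and fields below are invented for illustration; the operator API from #3427 hasn't been defined yet), kubectl capi init could record each installed provider as a custom resource that later commands and other plugins can read back:

# Hypothetical resource that "kubectl capi init --infrastructure aws" might
# create so that provider state is discoverable from the management cluster.
kubectl apply -f - <<'EOF'
apiVersion: operator.cluster.x-k8s.io/v1alpha1
kind: InfrastructureProvider
metadata:
  name: aws
  namespace: capi-system
spec:
  version: v0.5.5
EOF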

vincepri commented 4 years ago

We might not even need to get kubectl capa|capz|etc in place if the management cluster operator materializes in time.

Consider the scenario that we have a unified operator and high level provider CRD that can register and manage provider components. In this scenario, users should be able to install CAPI through a kubectl plugin, and use some other way (either the same plugin, or kubectl apply) to configure and install a cluster api infrastructure provider.

In the scenario of setup jobs, to initialize an account we could use a one-off job that a user can kubectl apply and that a special operator shipping with the provider reconciles once.

Some food for thought; I admit I haven't thought through all the use cases here.

wfernandes commented 4 years ago

From the above discussion it seems like we were thinking of allowing clusterctl itself to support plugins as part of #3255. This leads me to wonder how this plugin inception would work. 🤔

Currently users need to gather binaries for cluster-api and provider-specific binaries from various github releases

Maybe we need more examples of other plugins that might fit the use case of kubectl capa|capz|capv... because I'm not sure what other provider-specific binaries there are. Apologies for my lack of infra provider knowledge.

Some quick thoughts on possible usage of plugins:

  1. Initialize k8s cluster as a management cluster.

    # `init` is a subcommand of the `capi` plugin.
    kubectl capi init --infrastructure aws 

    The above command should fail if I don't have AWS_B64ENCODED_CREDENTIALS set.

  2. # `capa` is another plugin managed by the CAPA provider authors. 
    kubectl capa encode-credentials

    This plugin could implement logic like clusterawsadm, for example. It shows how provider authors could write their own plugins and iterate faster. TBH this specific example doesn't really make sense in the context of kubectl, but it's just an example. 😄

  3. # This is similar to `clusterctl config cluster my-cluster`. 
    # `generate cluster` are subcommands to `capi` with `mycluster` being the arg. 
    kubectl capi generate cluster mycluster --infrastructure aws

    I wonder what would happen if someone created a plugin named kubectl-capi-generate-cluster. As per the kubectl plugin docs, the longest plugin match takes precedence, so that might be something to look out for (see the sketch at the end of this comment).

If other provider plugins need information/state from kubectl capi init, we could provide that via a provider-specific API, which would also benefit the needs of a "management cluster operator".
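
Regarding the plugin-name collision mentioned in item 3, kubectl's resolution behavior looks roughly like this (paths are illustrative):

# kubectl resolves plugins by the longest executable name matching the
# arguments. With both of these on PATH:
#   /usr/local/bin/kubectl-capi
#   /usr/local/bin/kubectl-capi-generate-cluster
# the command below invokes kubectl-capi-generate-cluster (passing it only
# "mycluster --infrastructure aws"), shadowing the capi plugin's own
# "generate cluster" subcommand.
kubectl capi generate cluster mycluster --infrastructure aws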

jackfrancis commented 4 years ago

@wfernandes Here's a user scenario that's not currently covered by clusterctl, and is arguably provider-specific. Something like (continuing to use the hypothetical kubectl plugin example pattern):

# `capz` is the plugin managed by the CAPZ provider authors
$ kubectl capz create

A gesture like that could create an Azure-backed cluster-api-enabled Kubernetes cluster from scratch, using Azure-appropriate defaults (for example, let's say that workflow creates an ephemeral AKS cluster to instantiate the cluster-api management componentry, then creates a target cluster backed by Azure IaaS, then performs the equivalent of the clusterctl move functionality [and finally cleans up the ephemeral AKS cluster], so that the end result is a "self-managed" cluster-api-enabled Kubernetes cluster running in Azure).
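
Expressed with today's commands purely as a sketch (resource names are invented and most flags are elided), that workflow might break down like this:

# Rough sketch of what a hypothetical "kubectl capz create" could automate.
az aks create --resource-group capz-bootstrap --name bootstrap-mgmt ...       # ephemeral management cluster
az aks get-credentials --resource-group capz-bootstrap --name bootstrap-mgmt
clusterctl init --infrastructure azure                                         # install CAPI + CAPZ components
clusterctl config cluster target --infrastructure azure | kubectl apply -f -   # create the Azure IaaS-backed cluster
clusterctl move --to-kubeconfig=target.kubeconfig                              # pivot so the cluster manages itself
az aks delete --resource-group capz-bootstrap --name bootstrap-mgmt --yes     # clean up the ephemeral AKS cluster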

It's possible to do this by introducing a standard, generic clusterctl create gesture with --infrastructure <provider>-style context, but (again, arguably) the actual create workflows are going to vary by cloud provider, or, stated differently, by the UX considerations of any given user-serving solution (i.e., products that are implemented as providers).

What I'm suggesting is that "promoting" clusterctl --infrastructure azure (or kubectl capi --infrastructure azure in that example) to a 1st class plugin verb like clusterctl capz (or kubectl capz) is preferable for certain CLI workflows where the provider is more opinionated about how a given workflow should serve its particular provider userbase. Basically, trying to write "generic" definitions for solutions that are ultimately "specific" is always a hard problem.

Strictly speaking, I don't think there's any real distinction between implementing "clusterctl plugins" vs implementing "cluster-api-relevant kubectl plugins". Both solutions are composable enough to handle the various hypotheticals folks have thrown out in this thread. The primary advantages (IMO) of the kubectl plugin approach are:

  1. The kubectl plugin ecosystem already exists (we'd be building a clusterctl plugin ecosystem from scratch).
  2. cluster-api CLI gestures become (arguably) more natural extensions of a user's existing toolchain, because those users are already using kubectl.
  3. It eliminates the "switch back and forth between clusterctl and kubectl depending upon what I'm doing" UX friction that currently exists for users creating and maintaining Kubernetes clusters in the cluster-api paradigm.

Hope that makes sense!

wfernandes commented 4 years ago

@jackfrancis Thanks for the explanation. This makes sense.

I think the important thing to note here would be:

a 1st class plugin verb like clusterctl capz (or kubectl capz) is preferable for certain CLI workflows where the provider is more opinionated about how a given workflow should serve its particular provider userbase.

Creating a separate kubectl plugin for well-established providers makes sense. But I'm assuming for newer providers we either need to:

Otherwise the usability of the provider ecosystem may suffer.

jackfrancis commented 4 years ago

@wfernandes +1

  1. Definitely. The current "vanilla" clusterctl solution is super important to maintain so that there's an easy onramp for new provider authors:
    • generate cluster config
    • init an existing k8s cluster
    • move/pivot

Basically, the existing functionality makes sense (IMO) as a minimal functional bar to clear.

  2. We would want to encourage providers who want to take the full custom kubectl/clusterctl plugin approach to "retire" their presence in the vanilla "clusterctl --infrastructure" usage vector; otherwise we just confuse users.

So yes, I think the key point is that we definitely want this plugin model to be opt-in, and for providers to expose one or the other (obviously when the plugin approach is new for a provider they can prototype it in an alpha phase for a bit until it's stable). I can sort of imagine the plugin matriculation working in both directions:

jackfrancis commented 4 years ago

This thread is getting a bit chatty at this point, so I'll draft out what we've discussed so far in a more formal doc, with some progress towards concrete goals/solutions.

@wfernandes would you like to co-author? In any event, I'll bug you to clarify goals/non-goals before writing anything more substantial.

vincepri commented 4 years ago

/kind design
/milestone Next

wfernandes commented 4 years ago

@wfernandes would you like to co-author? In any event, I'll bug you to clarify goals/non-goals before writing anything more substantial.

@jackfrancis Sure. I'd be happy to help wherever I can 🙂

devigned commented 4 years ago

@jackfrancis please share a link to the doc when it's ready for some eyes.

jackfrancis commented 3 years ago

Here's a doc outlining the problem statements as I see them (and what I am motivated to solve):

https://docs.google.com/document/d/1FCX_u2mnxPgLYBXEPaS8Eni7423AmpkcZfiTkEZJUao

As a result of formulating those problem statements, and after some time and thought, my conclusion is that we're not ready to do this work and should instead focus on stabilizing the existing set of clusterctl functionality. Open to thoughts from others!

wfernandes commented 3 years ago

@jackfrancis I've requested access to view the doc, so I haven't been able to review it yet. However, I think this work may be enabled in the future by the updates to the API that will be done as part of the Management Cluster Operator initiative.

If it makes sense, we can close this issue for now and reopen/recreate it when the need arises.

jackfrancis commented 3 years ago

+1 on closing for now. Thanks all for your valuable contributions to the historical record! :)