Define strategy for migration from OLMv0 to OLMv1

joelanford commented 1 year ago

As of #85, we have begun to add support for the OLMv1 APIs into this plugin via a new subcommand kubectl operator olmv1. So the current state of the world is:

OLMv1 stuff at kubectl operator olmv1
OLMv0 stuff at kubectl operator * (except olmv1)

Long term, I think the goal is:

OLMv1 stuff at kubectl operator
OLMv0 stuff no longer present.

How do we get from A to B?

Proposal 1

In #84, I proposed the use of an environment variable (I chose EXPERIMENTAL_USE_OLMV1_APIS, but we could pick something else), which would act as a switch controlling the entirety of the kubectl operator functionality:

EXPERIMENTAL_USE_OLMV1_APIS=on: kubectl operator is OLMv1
Anything else: kubectl operator is OLMv0

My thought there was that we could perform the migration like this:

T1. OLMv0 support only
T2. Default: OLMv0, require EXPERIMENTAL_USE_OLMV1_APIS=on for OLMv1
T3. Default: OLMv1, require EXPERIMENTAL_USE_OLMV1_APIS=off for OLMv0
T4. OLMv1 support only

In this proposed model, OLMv1 users can essentially opt-in early (at T2) and then not have to worry about subcommand changes later. Also in this model, OLMv0 users can explicitly opt-out early to avoid a breakage at T3. This gives both sets of users more time to adjust to incoming changes.

Proposal 2

Another option is to perform this same transition, but using subcommand renaming:

T1. OLMv0 support only at kubectl operator
T2. OLMv0 support at kubectl operator, OLMv1 support at kubectl operator olmv1
T3. OLMv1 support at kubectl operator, OLMv0 support at kubectl operator olmv0
T4. OLMv1 support only at kubectl operator

Proposal 3

Another option is a variation on Proposal 2, except we skip T3 and do a hard cutover:

T1. OLMv0 support only at kubectl operator
T2. OLMv0 support at kubectl operator, OLMv1 support at kubectl operator olmv1
T3. OLMv1 support only at kubectl operator

My two cents

I had always imagined this working like proposal 1, which is similar to the way that Go modules were introduced in Go. IMO, the envvar approach keeps the CLI clean, makes for less typing for OLMv1 users, and will make the transition easy to implement.

In proposal 2 and 3, we'll definitely break early users of the OLMv1 functionality at some point when we remove the olmv1 subcommand and move that functionality under the main subcommand. And I feel like it would be nice if we could help them avoid that.

This plugin is still v0 (and may always be v0), so any of these proposals is fine from a semver standpoint. I'm mainly interested in choosing a strategy that primarily minimizes user impact and secondarily minimizes developer impact.

tmshort commented 1 year ago

I view environment variables as somewhat dangerous, because the exact same command may have different and unexpected behavior. This makes it harder to debug when issues arise. Having the commands explicit on the command line (even though I know environment variables can be set on the same command, they can also be set in the environment, and not part of the command).

EXPERIMENTAL_USE_OLMV1_APIS=on is longer than olmv1.

If you put the variable in your environment, everything works with OLMv1, until you move to another shell without that variable set.

Using key words (proposals 2 and 3), at least lets you know exactly what command is being run, without having to know the environment.

everettraven commented 1 year ago

Would a flag be a potential compromise between the different proposals? You could introduce a boolean flag like kubectl operator --olmv1={true|false}

The migration could look like:

T1: OLMv0 support only at kubectl operator
T2: OLMv0 support at kubectl operator, OLMv1 support at kubectl operator --olmv1 (i.e --olmv1=false by default)
T3: OLMv1 support at kubectl operator, OLMv0 support at kubectl operator --olmv1=false (i.e --olmv1=true by default) & flag is deprecated with a message saying OLMv0 support is going away soon
T4: OLMv1 support only at kubectl operator, --olmv1 flag removed

tmshort commented 1 year ago

@everettraven I had originally thought of flags (e.g. --format=olmv1) but it makes the help a bit convoluted, since only certain commands (e.g. install and uninstall) would support them. But it certainly is a possibility.

A possible knock against environment variables: command completion.

joelanford commented 1 year ago

All good points!

A possible knock against environment variables: command completion.

You can still get OLMv1 command completion (you just have to set the envvar when the completion is generated). So yeah, it is an extra hoop to jump through, and maybe can't easily handle dynamic completion for OLMv0 and OLMv1 simultaneously.

But I kinda imagine that there won't be too many people that care about both modes at the same time.

joelanford commented 1 year ago

And @everettraven re the flag idea: +1 to what @tmshort said about that making the help pretty wonky. There are a bunch of flags in the OLMv0 version of the commands that won't make sense in the OLMv1 version of the commands. So I think it's important that whatever we decide keeps those separate.

I think @tmshort makes some strong points in regard to the concerns about how users may not be aware of the environment or that it might change depending on where/how they run the command.

Right now, I think we are kind of assuming that you have either OLMv0 OR OLMv1, but not both, installed on a cluster. Under that assumption, we could do some sanity checks in a PreRun to make sure the cluster actually has the APIs that are about to be used, and if they don't then print out a message that hints at using the other mode. That might be something we want to do regardless of the approach we take though.

On the other hand, when we get to the point of figuring out the larger OLMv0 to OLMv1 migration, it may be an invalid assumption that a cluster has exclusively OLMv0 or OLMv1.

dmesser commented 1 year ago

I feel proposal 1 is the cleanest approach as it has good CLI UX and clear expectation setting. Trying to manage v1 and v0 based operator deployments with the plugin is less likely to be something that we'll see demand for. And even in this case, switching the behavior temporarily feels like a good and conscious tradeoff to me compared to trying to auto-determine what APIs were used on the cluster to carry out the operations previously or pollute the CLI switch structure.

tmshort commented 1 year ago

I am going to suggest another option. Use completely different keywords for OLMv1 vs. OLMv0. So while we have for OLMv0: kubectl operator install and kubectl operator uninstall. So, perhaps OLMv1 has: kubectl operator deploy and kubectl operator undeploy or something along those lines that is different from install and uninstall.

This eliminates the need to do any migration.

tmshort commented 1 year ago

Or even create and delete, which are kubectl keywords.

joelanford commented 1 year ago

create/delete is intriguing. I think my only qualm is that it seems like it would mean all the following would all be present in the help at the same time:

operator create
operator delete
operator install
operator uninstall

Which would be pretty confusing in the interim when both sets of commands exist. But perhaps this new cobra "command groups" feature could help with that: https://github.com/spf13/cobra/pull/1003

We'll have to think about what we'd do with the OLMv0 upgrade subcommand as well.

operator-framework / kubectl-operator