Move CAPD + CABPK into this repository

chuckha commented 5 years ago

/kind feature

Describe the solution you'd like I'd like to merge CABPK and CAPD into this repository.

This is quite a large request/update with many areas to think about. I've included my preliminary thoughts below. I could convert this to a CAEP if desired.

Release

This change could drastically change the concept of a cluster api release if we wanted.

One option

We could consider a single release of cluster-api that encompasses 3 components, core, infra, and bootstrap providers. This allows to have a single release of Cluster API that has 3 default components that are in a known working state together. This alleviates the versioning issue with at least the default configuration of Cluster API because we are able to make a new version that wraps the sub versions (v1 of Cluster API contains v0.4 of CAPI, v0.2 of CAPD and v0.1 of CABPK, for example).

This kind of release would not prevent us from also releasing each default component as their own artifacts much like we do today, but would allow us to more easily pull all necessary components together for a single consumable entity for users to get started without worrying about each individual component.

However this would get tricky with regard to Go tagging/versions. More on that below.

Versions

Versioning would remain mostly the same. Each component gets a version and a set of release artifacts. However, I would propose renaming the Cluster API controller the Core controller and using the term Cluster API to mean a fully functional Cluster API which includes at least one core provider, one bootstrap provider and one infrastructure provider.

VCS (git) Tagging

Tagging will be a bit complex. I'd recommend we base it on Cluster API, the top level thing. This is where the shared code would live. Each provider (core, bootstrap, infra) would not be built for importing. If you want to import cluster-api that means you're importing the types or the shared library code. A directory layout like this would make this possible:

cluster-api/
  core/
  infrastructure/
  bootstrap/
  shared/

or even

cluster-api/
  internal/
    core/
    infrastructure/
    bootstrap/
  shared/ (use a better name, we can work on this later)

User Experience

The user experience is improved massively here. For beginner users, they would have to go to CAPA and use the provider-components manifest. That is their only choice. They can't experiment outside of a real cloud to really understand the product without becoming an advanced user.

For the beginner use case we would be able to more easily provide a provider-components file that does not require going out to specific versions of other repos. They all live in tree and are therefore easy to combine and include in a release.

For the advanced use case they can grab the getting started guide and also install their own infrastructure provider. They'd have the docker provider and their custom infrastructure provider but it would be fast to get going.

If they are very advanced they could do what they would have to do today and get each individual component they want and install them manually (or use clusteradm).

Documentation

Documentation duplication goes away. Now we have a way to install/dev/play with Cluster API. The documentation can also expand to show swapping out an individual provider. But there won't be anymore duplication of effort across repositories. CABPK has a developer getting started guide in the README. Core has a quickstart, Docker points to CABPK. Frankly, it's a mess and would be great to come to Cluster API and see a fast way to get started and a way to move on to the production use case.

Go dependencies

I'd recommend (at least) three go submodules. I'd also recommend (but not required at all since this is contentious), we use major versions as go recommends and keeping a v1,v2, etc modules in each sub project. This is to simplify maintaining previous versions without having to deal with different release branches because this gets very complex very quickly. The API guarantees would also be nice for consumers, but consumers should only be importing types which already have a unique version name (v1alpha1, v1alpha2, ...)

/cc @timothysc

vincepri commented 5 years ago

I gave a quick read-through and I like what I'm reading, but quick feedback and I'll write up a longer one later:

Focus this issue to only bootstrap providers, infrastructure providers are a larger change and it should be discussed with the community at the face to face next week.
Let's reduce the amount of changes by using internal/providers/bootstrap and place kubeadm in there instead of changing the entire repository.
+1 on the UX and documentation improvements/
Let's hold off on adopting /v1, /v2 in Go modules until we have a complete story around it, as a user and a developer I don't know what that means and how different versions relate to each other.

detiber commented 5 years ago

My biggest worries here are that by hosting providers in repo we:

turn external providers into second class citizens that cannot implement the same feature sets as internal providers
we do not maintain stability of the contracts we define for each of the providers, since ensuring that we do not introduce changes in a patch version that require behavioral changes between different providers will be much easier and this could cause integration and weird version compat issues for externally hosted providers.

timothysc commented 5 years ago

turn external providers into second class citizens that cannot implement the same feature sets as internal providers

I think this falls into the batteries included but swappable, I honestly don't see this as a problem so long as we full document the details. Also CAPD & CABK should be considered the reference implementations that exist in the CAPI toolkit. Without those direct references, we end up confusing the community of developers and users.

vincepri commented 5 years ago

I'd like to do this in 2 stages, bootstrap providers (there is only one today that's public) are easier to manage and we can optimize (and fast track) kubeadm.

chuckha commented 5 years ago

Moving this discussion to a CAEP (does not exist yet) which will more formally outline the problems before writing down any potential solutions.

Will update/open a new issue when a document is ready.

/close

k8s-ci-robot commented 5 years ago

@chuckha: Closing this issue.

In response to [this](https://github.com/kubernetes-sigs/cluster-api/issues/1416#issuecomment-530942012): >Moving this discussion to a CAEP (does not exist yet) which will more formally outline the problems before writing down any potential solutions. > >Will update/open a new issue when a document is ready. > >/close Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

timothysc commented 5 years ago

@chuckha - fwiw I think collecting comments on the issue would help feed your CAEP.

vincepri commented 5 years ago

@chuckha Do you want to work together later today on action items that should happen for this issue?

vincepri commented 5 years ago

Checklist of work items needed to make this happen:

[x] Branch off release-0.2
[x] Prow CI job after PR merges should build release-0.2 branch (publish v0.2)
[ ] Open a PR to move CABPK -> /bootstrap/kubeadm (submodule)
[ ] Open a PR to move CAPD -> /test/infrastructure/docker (submodule)
[ ] Separate build pipelines / container images
[ ] Tests for sub-modules should only run for PR containing changes in submodules

detiber commented 5 years ago

The repo imports should be done in a way to preserve history and attribution.

chuckha commented 5 years ago

The repo imports should be done in a way to preserve history and attribution.

without a doubt.

detiber commented 5 years ago

Build/release tooling will need to be updated as well.

Do we continue to publish the images to the existing staging buckets, or do we consolidate them to a single bucket?

Docs will need to be updated to avoid linking out to external docs (CABPK in particular)

detiber commented 5 years ago

Release process will need to be updated as well, do we always build/release updated images for all images for each release? Do we version and release images independently?

chuckha commented 5 years ago

One of my main concerns here was release/tagging. I think the release work can be figured out as the projects move forward. As for tagging strategy, go modules has us sorted.

In a multi-module repository, please see https://research.swtch.com/vgo-module#multiple-module_repositories for an in-depth discussion.

In short, git tags would now look like

v0.1 => cluster-api v0.1 bootstrap/kubeadm/v0.2 => kubeadm v0.2

as we move into major versions we're still supported with

v1.0.0 => cluster-api/ v2.0.0 => cluster-api/v2 bootstrap/kubeadm/v1.0.0 => cluster-api/bootstrap/kubeadm bootstrap/kubeadm/v2.0.0 => cluster-api/bootstrap/kubeadm/v2

vincepri commented 5 years ago

The repo imports should be done in a way to preserve history and attribution.

+1, we need to come up with a plan that doesn't break the current CAPI history.

I looked through the commits in both repositories, and I think we should squash a lot of commit before merging them here, especially in the docker provider which doesn't seem to follow CAPI best practices around commit messages and squashing in logical groups.

Do we continue to publish the images to the existing staging buckets, or do we consolidate them to a single bucket?

Everything should be kept separate IMO, high level automation tooling can be done in a future iteration to simplify the releases.

Release process will need to be updated as well, do we always build/release updated images for all images for each release? Do we version and release images independently?

Yes, at least for the first iteration.

Docs will need to be updated to avoid linking out to external docs (CABPK in particular)

That's fine, but we shouldn't refer to v1alpha3 until we release

detiber commented 5 years ago

That's fine, but we shouldn't refer to v1alpha3 until we release

Do we leave the old repositories around to not break v1alpha2 related docs and usage? If so at what point do we finally pull the plug?

Ideally I would like to document the state of v1alpha3 as we go rather than deferring the documentation to the end. We already have the ability to do per-branch documentation so we should take advantage of that to ensure that we don't repeat past mistakes wrt documentation

vincepri commented 5 years ago

I think we should keep the repositories around, for sure. At least until we drop support for v1alpha2.

v1alpha3+ releases for both of these providers / components will be done in this repository instead.

I agree on documentation. I'm not sure if netlify allows this, but I'd love to keep the primary URL pointed to the release-0.2 branch going forward and start documenting v1alpha3 on master

chuckha commented 5 years ago

Playing around with merging strategies. Due to github not following file renames there is very little we can do with regard to github in-browser file history. However, all history is maintained and no commits are lost.

I believe k8s will still count the commits but I'll followup with .... someone upstream to ensure this is correct.

For now, please enjoy: https://github.com/chuckha/merged-capi-cabpk

vincepri commented 5 years ago

Nice. I guess we should make sure we're not duplicating contributions (and stats) across two repos?

chuckha commented 5 years ago

@vincepri https://kubernetes.slack.com/archives/C9A0P0GPK/p1569443307015000

tl;dr it's no problem.

vincepri commented 5 years ago

The repo you linked looks good imo, what should we do w.r.t. the docker provider instead?

vincepri commented 5 years ago

Let's rally around actions items, I added a PSA that we'll start working on it in the community meeting, if there is no push back or more feedback, let's make this happen 🎉

chuckha commented 5 years ago

Let's talk about the docker provider commit history -- now, it doesn't have the cleanest commit history (as mentioned above) but at this point it feels to me it would be unfair to remove attribution in a merge.

I wonder what people think of reducing this down to a single commit and keeping the attribution in separate repo or if merging (warts and all) is the right path forward. @detiber do you have thoughts around this specific situation? I see you have expressed thoughts above about this particular topic.

detiber commented 5 years ago

@chuckha Personally, I would rather ensure proper retention of attribution than a clean commit history

ncdc commented 5 years ago

@detiber are you interested in per-line blame attribution, or is GitHub's co-authored-by information acceptable in a single squashed commit? (I have no strong feelings - just looking for more details.)

detiber commented 5 years ago

@ncdc both git blame/praise attribution, but also devstats attribution.

timothysc commented 5 years ago

On repo translation git details will be lost, and only OWNERS subsections will be maintained. Devstats should be undeterred. Also, none of this can/should be a blocking concern to moving forwards, as this happens all the time on code migrations.

vincepri commented 5 years ago

+1 to what Tim said and props to Chuck for opening the different PRs to get this merged.

Can we confirm with devstats if keeping the repositories in archived mode keeps the contribution counted? If so, seems like a good compromise to merge in a single commit, avoid roadblocks, and fast track the work.

chuckha commented 5 years ago

I have a few PRs open:

A single merge commit for CABPK: https://github.com/kubernetes-sigs/cluster-api/pull/1483 A full history merge for CABPK: https://github.com/kubernetes-sigs/cluster-api/pull/1482 A full history merge for CAPD: https://github.com/kubernetes-sigs/cluster-api/pull/1481

The first two are for comparison purposes. I like the single commit a bit more to be honest.

vincepri commented 5 years ago

I agree with that sentiment, let's just make sure contributions aren't erased from archived repositories, and I think that'd be good to go.

chuckha commented 4 years ago

This is done

/close

k8s-ci-robot commented 4 years ago

@chuckha: Closing this issue.

In response to [this](https://github.com/kubernetes-sigs/cluster-api/issues/1416#issuecomment-545131450): >This is done > >/close Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

kubernetes-sigs / cluster-api