open-gitops / documents

📑 Lasting documents from the OpenGitOps project which are versioned and released together (including the GitOps Principles and Glossary)
https://opengitops.dev/

1:1 Repo structure Proposal #51

Closed christianh814 closed 2 years ago

christianh814 commented 2 years ago

Repo Structure Best Practices Proposal

I decided to throw my hat in the ring and start the conversation around best practices for repo structures.

I am starting with the "one repo for one cluster" design, as it's the easiest. I tried to go with a design that I've seen a lot in my dealings with end users, while keeping it GitOps-controller agnostic (i.e. it should work with Argo CD, Flux, Open Cluster Management, etc.).

This PR also comes with some examples.

For background/more context: I went over this on my stream HERE and HERE

Structure

Below is an explanation of how this repo is laid out. You'll notice Kustomize is used heavily; this is to follow the DRY (Don't Repeat Yourself) principle when it comes to YAML files.

cluster-XXXX/ #1
├── bootstrap #2
│   ├── base
│   └── overlays
│       └── default
├── components #3
│   ├── (applicationsets OR kustomizations)
│   └── (argocdproj OR gitrepositories)
├── core #4
│   ├── gitops-controller
│   └── sample-admin-workload
└── tenants #5
    ├── bgd-blue
    └── myapp
1. **`cluster-XXXX`**: The cluster name and the top-level directory in your repo. This name should be unique to the specific cluster you're targeting. If you're using CAPI, this should be the name of your cluster (the output of `kubectl get cluster`).

2. **`bootstrap`**: Where bootstrapping-specific configurations are stored. These are the items that get the cluster/automation started; they are usually the install manifests of the GitOps controller.

   `base` is where the "common" YAML lives, and `overlays` holds configurations specific to the cluster.

   The `kustomization.yaml` file in `default` has all the directories under `cluster-XXXX/components/` as part of its `bases` config.
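A minimal sketch of what that `default` overlay could look like, assuming the directory names shown in the tree above (this illustrates an Argo CD layout; note that newer Kustomize versions prefer `resources` over the deprecated `bases` field):

```yaml
# cluster-XXXX/bootstrap/overlays/default/kustomization.yaml (illustrative)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
bases:
  # the "common" bootstrap YAML
  - ../../base
  # every directory under components/
  - ../../../components/applicationsets
  - ../../../components/argocdproj
```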
3. **`components`**: Where the specific components for the GitOps controller live. The directories here depend on the GitOps controller you are using.

   For example, if you're using Argo CD, `applicationsets` and `argocdproj` are possible directories (each with its respective configurations). For Flux, this is where `gitrepositories` and `kustomizations` (with their respective configurations) would be stored.
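As an illustration of what might live under `components/applicationsets`, here is a hedged sketch of an Argo CD `ApplicationSet` that uses the git directory generator to turn each directory under `core/` and `tenants/` into an Application (the repo URL and namespaces are placeholders, not part of this proposal):

```yaml
# cluster-XXXX/components/applicationsets/cluster-appset.yaml (illustrative)
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: cluster
  namespace: argocd
spec:
  generators:
    - git:
        repoURL: https://github.com/example/cluster-XXXX.git  # placeholder
        revision: main
        directories:
          - path: core/*
          - path: tenants/*
  template:
    metadata:
      name: '{{path.basename}}'
    spec:
      project: default
      source:
        repoURL: https://github.com/example/cluster-XXXX.git  # placeholder
        targetRevision: main
        path: '{{path}}'
      destination:
        server: https://kubernetes.default.svc
        namespace: '{{path.basename}}'
      syncPolicy:
        automated: {}
```

The Flux equivalent would be a `GitRepository` plus one Flux `Kustomization` per directory.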
4. **`core`**: Where the YAML for the core functionality of the cluster lives. These are things that are necessary for the cluster to run and be usable. This is where the Kubernetes administrator puts things that are necessary for the functionality of the cluster (like cluster configs or cluster workloads).

   Under `gitops-controller` is where the GitOps controller manages itself. Its `kustomization.yaml` file uses `cluster-XXXX/bootstrap/overlays/default` in its `bases` configuration. This `core` directory gets deployed as a workload.

   To add a new "core functionality" workload, add a directory with some YAML under the `core` directory. See the `sample-admin-workload` directory as an example.
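The self-management wiring described above can be sketched in a few lines (paths assumed from the tree; again, newer Kustomize prefers `resources` over `bases`):

```yaml
# cluster-XXXX/core/gitops-controller/kustomization.yaml (illustrative)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
bases:
  # point back at the bootstrap overlay so the controller reconciles itself
  - ../../bootstrap/overlays/default
```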
5. **`tenants`**: Where the workloads for this cluster live.

   Similar to `core`, all directories under `tenants` get deployed.

   This is where Developers/Release Engineers do their work. They just need to commit a directory with some YAML, and the GitOps controller takes care of creating the workload.

   Note that the `bgd-blue/kustomization.yaml` file points to another Git repo. This shows that you can host your YAML in one repo or many repos, while still keeping one repo to manage the one cluster.
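Kustomize supports remote resources, so a hedged sketch of that cross-repo reference (the URL and ref below are placeholders) could be:

```yaml
# cluster-XXXX/tenants/bgd-blue/kustomization.yaml (illustrative)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  # YAML hosted in a different repo, pinned to a ref
  - https://github.com/example/bgd-blue-app/deploy?ref=v1.0.0  # placeholder
```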

Noteworthy Things

Some of this layout may seem "overkill" for a single cluster, but there are reasons.

Examples

I've added examples for Argo CD, Flux, and OpenShift GitOps. These examples won't work until they are merged; they are based on examples I have in my own repo, so if you want to "fork and hack", I'd suggest you look at those:

- Argo CD
- Flux
- OpenShift

moshloop commented 2 years ago

Just a couple of comments:

1. The `components/CRD/` structure is a code smell for me, akin to using a variable name like `iNumber` or `nameString` in programming languages, or `nginx-deployment` as a deployment name.
2. I think the hero of the story here is not the GitOps infrastructure, but rather the underlying principles of consistency and readability. Because of this, groupings should be based on applications or concerns, not underlying types.
3. "Tenants" is an overloaded term (is this a billing/security/team/application boundary?). I think specifying the actual boundary type as the folder name will be clearer, as well as enabling a hierarchy.
4. Single clusters, and isolated clusters that essentially have changes copied and pasted, are a bad practice. A promotion from one cluster to another should be either a merge (i.e. branches or forked repos) or a marker change in a mono-repo. I don't think we should encourage these practices; we should only publish designs that are multi-cluster/promotion-ready from the get-go.
5. It's not clear how mixins and re-use would be achieved using this design.

A structure I have used with much success

├── clusters
│   ├── dev-cluster
│   │   ├── applications
│   │   └── infrastructure
│   └── prod-cluster
└── mixins
    ├── all
    ├── applications
    │   ├── flux
    │   ├── monitoring
    │   └── nginx
    ├── cloud
    │   ├── aws
    │   └── gcp
    ├── environments
    │   ├── dev
    │   ├── prod
    │   ├── sandbox
    │   └── test
    └── regions
        ├── eu
        └── us
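The comment above doesn't specify the tooling, but assuming Kustomize, a cluster directory in this layout might compose the mixins it needs like so (a hypothetical sketch, not taken from the comment):

```yaml
# clusters/dev-cluster/applications/kustomization.yaml (illustrative)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../../mixins/all
  - ../../../mixins/applications/flux
  - ../../../mixins/applications/monitoring
  - ../../../mixins/cloud/aws
  - ../../../mixins/environments/dev
  - ../../../mixins/regions/us
```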

christianh814 commented 2 years ago

I need time to digest the rest of this but just some quick thoughts...

I think the hero of the story here is not the gitops infrastructure, but rather the underlying principles of consistency and readability. Because of this, groupings should be based on applications or concerns, not underlying types.

This is the reason why I tried to conform to Kustomize: it's 1) built into Kubernetes (via `kubectl`) and 2) supported by most (if not all) GitOps controllers. I hope the agnostic nature came across in this PR. The decision to do certain things/certain layouts came down to trying to stay agnostic.

Single clusters, and isolated clusters that essentially have changes copied and pasted are a bad practice (A promotion from 1 cluster to another should either be a merge (i.e. branches or forked repos) or marker change in a mono-repo - I don't think we should encourage these practices and only publish designs that are multi-cluster/promotion ready from the get-go.

I disagree that this is a bad practice. People may not like it, it may not work for everyone, but I disagree with the notion that it's a bad practice.

I, personally, am not a fan of a mono-repo. But it's still a valid design and we should have examples for those (in another PR).

Tenants are an overloaded term (is this a billing/security/team/application boundary) - I think specifying the actual boundary type as the folder name will be clearer, as well as enabling a hierarchy

Yeah, I am also on the fence about "tenants". It actually can be named something else. I thought of "apps" or "workloads" as well. I used "Tenants" because it can be an application, or a whole team. But yeah, I'm not 100% a fan of the word.

I'm not a fan of the word "core" either, but it's a placeholder.

Its not clear how mixins and re-use would be achieved use this design

Mixins wouldn't, at least not on initial thought, fit in this design. I would think Mixins would be something for a mono-repo design or a poly-repo design (one-to-many or many-to-none)

chris-short commented 2 years ago

Thank you for putting this together, @christianh814. The one thing I'll note is that there will always be some kind of boundaries at the organizational level that will need to be adhered to, be it how the cluster admins mix (or don't mix) with the development teams, security folks, or some other regulatory requirement. These boundaries will often define the use of a mix of repositories from different stakeholders.

Staying agnostic here is much appreciated. But there is no one-size-fits-all solution, because organizational boundaries differ. Someone's repo structure might end up looking a lot like their org chart. There will inevitably be a lot of similarities based on the regulatory patterns organizations embracing GitOps have to adhere to. It might turn out that we make suggestions on a per-industry/per-region basis. Regardless, the more use cases, best practices, and visual examples we can provide, the better, in my opinion.

tenants and core

I think we can noodle around the names in a work session or meeting. But, getting universal terms that have meaning in a GitOps workflow is vital for this group. Whatever we come up with here could be seen in repos ten or twenty years from now. We don't know. But, we definitely should discuss what to call these areas/directories. It may very well end up being one of many terms used at each point based on the tool selection too. It's best we think globally about these words to make sure we capture what our thinking is. I have a feeling it'll take less time than it took me to write this paragraph to land on good terms to use.

"Bad Practice"

I feel like there are some nuances there, @moshloop, that we should discuss. There are some fledgling practices out there, but there's nothing concrete. I don't think we can call anything bad or an anti-pattern unless it violates a Principle in these initial discussions. Remember the break-glass scenarios too. The copy/pasta thing will inevitably happen whether we like it or not (for any number of reasons). There are WAY too many isolated (patented intellectual property/governments), disconnected (restaurants/soda fountains), or infrequently connected (cars in a garage/airplanes) environments out there. Some of us remember a point in time when we all had a thumb drive (or CD-ROM, if you're old enough) with essential tools on it that we'd pull out when the stuff hit the fan. We didn't like it, but we did it out of necessity, and we had to do it frequently enough to create the mechanism to do so (the USB drive). With that being said, I'd like to discuss the point further in the next meeting; I might be misunderstanding your point.

moshloop commented 2 years ago

To clarify on multiple clusters: I am not referring to multiple production clusters, but rather at least one non-prod cluster, which is a regulatory requirement in many industries and a requirement in almost all other enterprises. Having these clusters as single standalone repositories goes against the principle of versioning. (You can have separate repositories, but they should have a common git root so that you can promote, rebase, or cherry-pick changes between clusters without losing the history.)

chris-short commented 2 years ago

But, what if there's a regulatory/policy requirement that those codebases be independent between prod and non-prod? I've seen this use case in data center providers. Prod being isolated from dev/stage with legit firewalling preventing promotion from one env to another via git. I guess it really depends where the git repos live at that point. In my case mentioned here, there was copious copying and pasting prior to Ansible being brought in to manage the environments more consistently. But even then, Ansible and its repos live in dev, stage, and prod independently with no way to communicate between environments aside from a maintainers laptop.

moshloop commented 2 years ago

Even if the git servers are physically isolated (or offline), it doesn't prevent them from having a common git root. The common root makes it easier to address other governance requirements that often go hand in hand with isolation, such as requiring anything going into prod to be verified and signed off as working in a non-prod environment first.

Another common approach is to extract the mixins into a standalone repository that undergoes a full SDLC with a versioned artifact produced that is promoted between environments - In this scenario, the environment repos don't have a common git root, but a common dependency between them and consist primarily of imports from these released repositories.

christianh814 commented 2 years ago

After some conversations and meetings, I think we should focus more on validated patterns and best practices. There are too many variables with respect to directory structures. It's definitely Conway's law at work: directory structures will vary even when following best practices (for example: don't repeat YAML, so use Helm and Kustomize; use protected branches; use folders for environments and not branches; etc.).

IMHO, if someone says "I want a directory structure template I can use" then it's indicative that they don't understand the problem.

Again, I believe we should focus on best practices documents and validated patterns rather than try to create a template out of something that, IMHO, can't be. 🪙🪙

I say we close this. Since it's my PR I'll leave it open for a few days but if I don't hear a complaint I'll close it.

chris-short commented 2 years ago

I like the idea of a bullet list for patterns. That way things are nicely reviewable and we can dive deeper into the whys in a more thorough document.

christianh814 commented 2 years ago

Closing based on my previous message ^

Feel free to reopen if you disagree.