Add description of how to manage only Kubespray custom configuration in a separate git repository

MurzNN commented 11 months ago

What would you like to be added: Please add an official "best practices" instruction on how to manage the Kubespray configuration for a custom cluster in a separate git repository, with the minimum amount of files.

Why is this needed: Currently, many instructions and articles about Kubespray recommend copying the whole inventory/sample directory to inventory/myclustername and modifying the needed options directly in these files.

Yes, this works well!

But, as a result, most users just add this inventory/mycluster directory to git and upload the whole Kubespray folder, producing a full copy of the Kubespray repo with a lot of files.

Even if some users are smart and store only the files from the inventory/mycluster directory in a custom git repository, there are still a lot of files! Look:

./group_vars
./group_vars/k8s_cluster
./group_vars/k8s_cluster/k8s-net-macvlan.yml
./group_vars/k8s_cluster/k8s-net-kube-router.yml
./group_vars/k8s_cluster/k8s-cluster.yml
./group_vars/k8s_cluster/addons.yml
./group_vars/k8s_cluster/k8s-net-weave.yml
./group_vars/k8s_cluster/k8s-net-kube-ovn.yml
./group_vars/k8s_cluster/k8s-net-flannel.yml
./group_vars/k8s_cluster/k8s-net-custom-cni.yml
./group_vars/k8s_cluster/k8s-net-calico.yml
./group_vars/k8s_cluster/k8s-net-cilium.yml
./group_vars/etcd.yml
./group_vars/all
./group_vars/all/gcp.yml
./group_vars/all/openstack.yml
./group_vars/all/containerd.yml
./group_vars/all/azure.yml
./group_vars/all/vsphere.yml
./group_vars/all/oci.yml
./group_vars/all/upcloud.yml
./group_vars/all/aws.yml
./group_vars/all/offline.yml
./group_vars/all/cri-o.yml
./group_vars/all/etcd.yml
./group_vars/all/hcloud.yml
./group_vars/all/huaweicloud.yml
./group_vars/all/all.yml
./group_vars/all/coreos.yml
./group_vars/all/docker.yml
./patches
./patches/kube-scheduler+merge.yaml
./patches/kube-controller-manager+merge.yaml
./inventory.ini

Usually, users change only a couple of lines in the group_vars/k8s_cluster/k8s-cluster.yml from the default values, like:

cluster_name: mycluster.mydomain.com
supplementary_addresses_in_ssl_keys:
 - node1.mycluster.mydomain.com
 - node2.mycluster.mydomain.com

and in the group_vars/k8s_cluster/addons.yml:

helm_enabled: true
metrics_server_enabled: true

So, they can simply create a single file like mycluster-defaults-overrides.yml, put just these few changes there, and store only that single 6-line file in their custom git repo, instead of uploading the hundredth copy of the whole "sample" directory to github.com and wasting cloud storage space for nothing.
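
As a minimal sketch, the single overrides file described above could contain just the six lines quoted earlier. The file name mycluster-defaults-overrides.yml comes from the text; how the file gets loaded is an assumption here (for example, as a group_vars file in the inventory), not something this proposal specifies.

```yaml
# mycluster-defaults-overrides.yml
# Only the values that differ from the Kubespray defaults.
cluster_name: mycluster.mydomain.com
supplementary_addresses_in_ssl_keys:
  - node1.mycluster.mydomain.com
  - node2.mycluster.mydomain.com
helm_enabled: true
metrics_server_enabled: true
```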

Moreover, this approach makes it quick to see exactly which parameters were changed from the defaults for this cluster, instead of digging through dozens of files to find the few lines that differ.

I created an example repository with a minimal template of the Kubespray configuration here: https://github.com/Murz-K8s/kubespray-inventory-base Feel free to discuss and improve it.

VannTen commented 11 months ago

I think we should just remove the sample inventories (or at least the variables) and rely on role defaults (also for documentation).

As a data point, when I came to work on the clusters I manage with kubespray, I removed more than 10k lines of inventory across 8 clusters. Most of it was just default values, which made it quite hard to tell what was not default.

MurzNN commented 11 months ago

I finally found a way to minimize the number of files and store only the values changed from the defaults. Here is all that we need:

- hosts.yaml
- group_vars/all.yaml

And the example content - hosts.yaml:

all:
  vars:
    ansible_connection: ssh
    ansible_user: root
  hosts:
    node1:
      # The hostname to connect, or an external IP address of the node.
      ansible_host: mycluster.mydomain.com
      # Optionally, you can override the IP address for connecting to the node, if it differs from the hostname.
      # access_ip: 1.2.3.4
      # The node IP address in the internal network. Uncomment and set if you have a separate internal network.
      # ip: 10.0.0.2
  children:
    kube_control_plane:
      hosts:
        node1:
    kube_node:
      hosts:
        node1:
    etcd:
      hosts:
        node1:
    k8s_cluster:
      children:
        kube_control_plane:
        kube_node:

and group_vars/all.yaml:

cluster_name: mycluster.mydomain.com

supplementary_addresses_in_ssl_keys:
  - node1.mycluster.mydomain.com

helm_enabled: true

metrics_server_enabled: true
metrics_server_kubelet_insecure_tls: true

krew_enabled: true

kubectl_alias: k

And here is a "best practice" structure of the git repo:

├── inventory
│   └── mycluster
│       ├── group_vars
│       │   └── all.yaml
│       └── hosts.yaml
├── kubespray (as a git submodule)
└── .gitmodules

Having this, we can simply deploy the cluster using a command:

cd kubespray && ansible-playbook -i ../inventory/mycluster ./cluster.yml

So, if this approach looks okay, maybe add this example as a best practice for managing a minimal Kubespray configuration?

VannTen commented 11 months ago

Maybe you could add group specific files in that example. For example supplementary_addresses_in_ssl_keys should rather go in a group_vars/kube_control_plane.yml file rather than all (it's only used in the control-plane role)
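
A sketch of that split, assuming the inventory layout from the example above: the control-plane-only variable moves into a group-specific file and is removed from all.yaml, while everything else stays where it is. The path below combines the suggested file name with the inventory/mycluster directory from the earlier example.

```yaml
# inventory/mycluster/group_vars/kube_control_plane.yml
# Applied only to hosts in the kube_control_plane group.
supplementary_addresses_in_ssl_keys:
  - node1.mycluster.mydomain.com
```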

MurzNN commented 11 months ago

> Maybe you could add group specific files in that example. For example supplementary_addresses_in_ssl_keys should rather go in a group_vars/kube_control_plane.yml file rather than all (it's only used in the control-plane role)

I can, but then I would have to distribute all the other values to different files too, and this would end up with a bunch of files again, instead of one convenient file with a few lines. And keeping this in the all.yaml file causes no conflicts. So I vote to keep this as simple as possible.

VannTen commented 11 months ago

You're not wrong, that's enough for a "Get started" type of doc. We can always add more advanced examples later on.

MurzNN commented 10 months ago

I created an example repository with a minimal template of the Kubespray configuration here: https://github.com/Murz-K8s/kubespray-inventory-base Feel free to discuss and improve it.

MurzNN commented 10 months ago

Also, I have a question - can we simplify this part of the hosts.yaml file?

  children:
    # The list of hosts that should be included into the cluster
    kube_node:
      hosts:
        node1:
        # node2:

    # The list of nodes that should be used as the control plane.
    kube_control_plane:
      hosts:
        node1:

    # The list of nodes that should be used as the etcd storage.
    etcd:
      hosts:
        node1:

    # What's the purpose of this?
    k8s_cluster:
      children:
        kube_control_plane:
        kube_node:

Can we somehow get rid of the all.children.k8s_cluster.children group?

VannTen commented 8 months ago

the k8s_cluster group is used all over the place in plays, not sure how practical it would be to remove it from inventories... Maybe we could compute it at the same time that we handle "legacy groups" :thinking:
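
One way such a computed group might look, as a hypothetical sketch rather than anything Kubespray is confirmed to ship: a small play that runs before the others and uses Ansible's group_by module to put every control-plane and worker host into k8s_cluster. The play name and its placement are assumptions, and whether this covers every place the group is referenced would need testing.

```yaml
# Hypothetical play: derive k8s_cluster at runtime instead of declaring it in the inventory.
- name: Compute the k8s_cluster group
  hosts: kube_control_plane:kube_node
  gather_facts: false
  tags: always
  tasks:
    - name: Add control plane and worker nodes to k8s_cluster
      ansible.builtin.group_by:
        key: k8s_cluster
```

With a play like this in place, the all.children.k8s_cluster.children block could in principle be dropped from hosts.yaml.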

k8s-triage-robot commented 5 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle stale
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 4 months ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle rotten
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

VannTen commented 4 months ago

/lifecycle frozen
/remove-lifecycle rotten
/help
/good-first-issue

k8s-ci-robot commented 4 months ago

@VannTen: This request has been marked as suitable for new contributors.

Guidelines

Please ensure that the issue body includes answers to the following questions:

Why are we solving this issue?
To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?
Does this issue have zero to low barrier of entry?
How can the assignee reach out to you for help?

For more details on the requirements of such an issue, please see here and ensure that they are met.

If this request no longer meets these requirements, the label can be removed by commenting with the /remove-good-first-issue command.

In response to [this](https://github.com/kubernetes-sigs/kubespray/issues/10645):

>/lifecycle frozen
>/remove-lifecycle rotten
>/help
>/good-first-issue

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.
