kubernetes-sigs / kubespray

Deploy a Production Ready Kubernetes Cluster
Apache License 2.0

Cluster upgrades fail due to encryption logic #8610

Closed cristicalin closed 2 years ago

cristicalin commented 2 years ago

Environment:

Kubespray version (commit) (git rev-parse --short HEAD):

0fc453fe

Network plugin used:

calico

Full inventory with variables (ansible -i inventory/sample/inventory.ini all -m debug -a "var=hostvars[inventory_hostname]"):

[all]
ubuntu-nuc-00.kaveman.intra ansible_connection=local local_as=64512

[all:vars]
upgrade_cluster_setup=True
force_certificate_regeneration=True
etcd_kubeadm_enabled=True
download_container=False
peer_with_router=False

[kube_control_plane]
ubuntu-nuc-00.kaveman.intra

[kube_control_plane:vars]

[etcd:children]
kube_control_plane

[kube_node]
ubuntu-nuc-00.kaveman.intra

[calico_rr]

[k8s_cluster:children]
kube_control_plane
kube_node
calico_rr

[k8s_cluster:vars]
kube_version=v1.22.7
calico_version=v3.22.0
helm_enabled=True
metrics_server_enabled=True
ingress_nginx_enabled=True
cert_manager_enabled=False
metallb_enabled=True
metallb_speaker_enabled=False
metallb_protocol=bgp
metallb_controller_tolerations=[{'effect':'NoSchedule','key':'node-role.kubernetes.io/master'},{'effect':'NoSchedule','key':'node-role.kubernetes.io/control-plane'}]
metallb_ip_range=["10.5.0.0/16"]
kube_proxy_strict_arp=True
kube_encrypt_secret_data=True
container_manager=containerd
kubernetes_audit=True
calico_datastore="kdd"
calico_iptables_backend="NFT"
calico_advertise_cluster_ips=True
calico_felix_prometheusmetricsenabled=True
calico_ipip_mode=Never
calico_vxlan_mode=Never
calico_advertise_service_loadbalancer_ips=["10.5.0.0/16"]
calico_ip_auto_method="interface=eno1"
kube_network_plugin_multus=True
kata_containers_enabled=False
runc_version=v1.1.0
typha_enabled=True
nodelocaldns_external_zones=[{'cache': 30,'zones':['kaveman.intra'],'nameservers':['192.168.0.1']}]
nodelocaldns_bind_metrics_host_ip=True
csi_snapshot_controller_enabled=True
deploy_netchecker=True
krew_enabled=True

Command used to invoke ansible:

ansible-playbook -i ../inventory.ini cluster.yml -vvv

Output of ansible run:

TASK [kubernetes/control-plane : Extract secret value from secrets_encryption.yaml] *******************************************************************************************************************************
task path: /root/kubespray/roles/kubernetes/control-plane/tasks/encrypt-at-rest.yml:21
fatal: [ubuntu-nuc-00.kaveman.intra]: FAILED! => {
    "msg": "The task includes an option with an undefined variable. The error was: No first item, sequence was empty.\n\nThe error appears to be in '/root/kubespray/roles/kubernetes/control-plane/tasks/encrypt-at-rest.yml': line 21, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Extract secret value from secrets_encryption.yaml\n  ^ here\n"
}

Anything else we need to know:

cristicalin commented 2 years ago

It looks like the code is missing some sanity checks:

(venv) root@ubuntu-nuc-00:~/kubespray# cat /etc/kubernetes/ssl/secrets_encryption.yaml 
kind: EncryptionConfig
apiVersion: v1
resources:
  - resources:
    - secrets

    providers:
    - aescbc:
        keys:
        - name: key
          secret: <removed on purpose>
    - identity: {}
(venv) root@ubuntu-nuc-00:~/kubespray# grep -r secrets_encryption_query roles/
roles/kubernetes/control-plane/defaults/main/main.yml:secrets_encryption_query: "resources[*].providers[0].{{kube_encryption_algorithm}}.keys[0].secret"
roles/kubernetes/control-plane/tasks/encrypt-at-rest.yml:    kube_encrypt_token_extracted: "{{ secret_file_decoded | json_query(secrets_encryption_query) | first | b64decode }}"
(venv) root@ubuntu-nuc-00:~/kubespray# grep -r kube_encryption_algorithm roles/
roles/kubernetes/control-plane/defaults/main/main.yml:kube_encryption_algorithm: "secretbox"
roles/kubernetes/control-plane/defaults/main/main.yml:secrets_encryption_query: "resources[*].providers[0].{{kube_encryption_algorithm}}.keys[0].secret"
roles/kubernetes/control-plane/templates/secrets_encryption.yaml.j2:    - {{ kube_encryption_algorithm }}:
(venv) root@ubuntu-nuc-00:~/kubespray# cat roles/kubernetes/control-plane/templates/secrets_encryption.yaml.j2
kind: EncryptionConfig
apiVersion: v1
resources:
  - resources:
{{ kube_encryption_resources|to_nice_yaml|indent(4, True) }}
    providers:
    - {{ kube_encryption_algorithm }}:
        keys:
        - name: key
          secret: {{ kube_encrypt_token | b64encode }}
    - identity: {}
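
The mismatch visible in the output above can be reproduced outside Ansible. What follows is a minimal sketch, not Kubespray code: `query_secrets` is a hypothetical helper that imitates the JMESPath expression `resources[*].providers[0].<algorithm>.keys[0].secret` in plain Python, and the dictionary is a hand-written stand-in for the parsed `secrets_encryption.yaml`, with the secret replaced by a placeholder. It assumes the file is parsed into a mapping before the query runs.

```python
# Hand-written stand-in for the parsed secrets_encryption.yaml shown above.
# The real secret is replaced by a placeholder.
existing_config = {
    "kind": "EncryptionConfig",
    "apiVersion": "v1",
    "resources": [
        {
            "resources": ["secrets"],
            "providers": [
                {"aescbc": {"keys": [{"name": "key", "secret": "PLACEHOLDER"}]}},
                {"identity": {}},
            ],
        }
    ],
}


def query_secrets(config, algorithm):
    """Imitate resources[*].providers[0].<algorithm>.keys[0].secret.

    Hypothetical helper: like a JMESPath projection, any entry where a
    step of the path is missing is dropped from the result list.
    """
    results = []
    for resource in config.get("resources", []):
        providers = resource.get("providers") or []
        if not providers:
            continue
        provider = providers[0].get(algorithm)
        if not provider:
            continue
        keys = provider.get("keys") or []
        if keys and "secret" in keys[0]:
            results.append(keys[0]["secret"])
    return results


# Default algorithm from defaults/main/main.yml versus the one in the file.
with_default = query_secrets(existing_config, "secretbox")
with_pinned = query_secrets(existing_config, "aescbc")
print(with_default)
print(with_pinned)
```

With the default `secretbox` the result is an empty list, which is consistent with the `first` filter reporting "No first item, sequence was empty". With `aescbc` the placeholder secret is found.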

cristicalin commented 2 years ago

https://github.com/kubernetes-sigs/kubespray/pull/8574 seems to be the culprit here. Sanity checks that fail early, or better detection of the currently configured value, would be preferable to a failure this late in the playbook execution.

/cc @Payback159
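
One possible shape for such a check, sketched in Python rather than as an Ansible task: read the provider name from the existing configuration and compare it with the requested algorithm before trying to extract the key. The helper names `detect_algorithm` and `check_algorithm` are hypothetical and are not part of Kubespray. The sample dictionary is a hand-written stand-in for a parsed `secrets_encryption.yaml`, and the provider tuple is limited to the key-based providers `aescbc`, `aesgcm`, and `secretbox`.

```python
# Key-based encryption providers this sketch looks for.
ENCRYPTING_PROVIDERS = ("aescbc", "aesgcm", "secretbox")

# Hand-written stand-in for an existing, parsed secrets_encryption.yaml.
existing_config = {
    "resources": [
        {
            "resources": ["secrets"],
            "providers": [
                {"aescbc": {"keys": [{"name": "key", "secret": "PLACEHOLDER"}]}},
                {"identity": {}},
            ],
        }
    ],
}


def detect_algorithm(config):
    """Return the first key-based provider name found, or None (hypothetical helper)."""
    for resource in config.get("resources", []):
        for provider in resource.get("providers") or []:
            for name in provider:
                if name in ENCRYPTING_PROVIDERS:
                    return name
    return None


def check_algorithm(config, requested):
    """Fail early with a readable message when the algorithms differ."""
    found = detect_algorithm(config)
    if found is not None and found != requested:
        raise ValueError(
            f"existing config uses {found!r} but {requested!r} was requested"
        )
    return found


detected = detect_algorithm(existing_config)
print(detected)

try:
    check_algorithm(existing_config, "secretbox")
    mismatch_message = None
except ValueError as exc:
    mismatch_message = str(exc)
print(mismatch_message)
```

Whether a mismatch should abort the run or silently reuse the detected algorithm is a design choice this sketch does not settle. It only shows that the mismatch can be detected before the key extraction step.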

Payback159 commented 2 years ago

Hi @cristicalin, I probably won't get around to it this week. Next week I should be able to take a look at it.

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

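This bot triages issues and PRs according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed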

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

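This bot triages issues and PRs according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed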

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

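This bot triages issues and PRs according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Reopen this issue or PR with /reopen
- Mark this issue or PR as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage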

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

k8s-ci-robot commented 2 years ago

@k8s-triage-robot: Closing this issue.

In response to [this](https://github.com/kubernetes-sigs/kubespray/issues/8610#issuecomment-1207387508):

>The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
>
>This bot triages issues and PRs according to the following rules:
>- After 90d of inactivity, `lifecycle/stale` is applied
>- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
>- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
>
>You can:
>- Reopen this issue or PR with `/reopen`
>- Mark this issue or PR as fresh with `/remove-lifecycle rotten`
>- Offer to help out with [Issue Triage][1]
>
>Please send feedback to sig-contributor-experience at [kubernetes/community](https://github.com/kubernetes/community).
>
>/close
>
>[1]: https://www.kubernetes.dev/docs/guide/issue-triage/

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

brysonshepherd commented 1 year ago

/reopen

k8s-ci-robot commented 1 year ago

@brysonshepherd: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to [this](https://github.com/kubernetes-sigs/kubespray/issues/8610#issuecomment-1443185005):

>/reopen

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

brysonshepherd commented 1 year ago

This issue was never addressed, and there is no documented way to upgrade from kubespray 2.18 to 2.19 if kube_encrypt_secret_data was originally set to true in 2.18 using kube_encryption_algorithm: "aescbc", besides pinning kube_encryption_algorithm.