Certificate Issue - Githubissues

RaMaTHA commented 8 months ago

What happened?

I'm trying to set up kubespray on bare metal. I have two servers to add to my cluster (one master, one client). The setup went smoothly the first time I ran the script (last week). Unfortunately, I had to run the setup a second time on the same server.

Now I keep getting an error, which is different not only on the latest release (release-2.24), but also on earlier builds (e.g. release-2.20 or the current master branch).

The error I get is related to openssl, where it tries to generate an x509 certificate for the API server (see error description).

What did you expect to happen?

I would have expected the script to run normally as it did before.

How can we reproduce it (as minimally and precisely as possible)?

I just followed the instructions in the readme (README.md).

OS

Linux 5.15.0-94-generic x86_64
PRETTY_NAME="Ubuntu 22.04.4 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.4 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy

Version of Ansible

ansible [core 2.15.9]
  config file = /home/kubespray/kubespray/ansible.cfg
  configured module search path = ['/home/kubespray/kubespray/library']
  ansible python module location = /home/kubespray/kubespray/kubespray-venv/lib/python3.10/site-packages/ansible
  ansible collection location = /home/kubespray/.ansible/collections:/usr/share/ansible/collections
  executable location = /home/kubespray/kubespray/kubespray-venv/bin/ansible
  python version = 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] (/home/kubespray/kubespray/kubespray-venv/bin/python3)
  jinja version = 3.1.2
  libyaml = True

Version of Python

Python 3.10.12

Version of Kubespray (commit)

aeaa04ca8

Network plugin used

calico

Full inventory with variables

My inventory file:

# ## Configure 'ip' variable to bind kubernetes services on a
# ## different ip than the default iface
# ## We should set etcd_member_name for etcd cluster. The node that is not a etcd member do not need to set the value, or can set the empty string value.
[all]
node1 ansible_host=<ip-address of the host>  # ip=10.3.0.1 etcd_member_name=etcd1
# node2 ansible_host=<ip-address of the client>  # ip=10.3.0.2 etcd_member_name=etcd2
# node3 ansible_host=95.54.0.14  # ip=10.3.0.3 etcd_member_name=etcd3
# node4 ansible_host=95.54.0.15  # ip=10.3.0.4 etcd_member_name=etcd4
# node5 ansible_host=95.54.0.16  # ip=10.3.0.5 etcd_member_name=etcd5
# node6 ansible_host=95.54.0.17  # ip=10.3.0.6 etcd_member_name=etcd6

# ## configure a bastion host if your nodes are not directly reachable
# [bastion]
# bastion ansible_host=x.x.x.x ansible_user=some_user

[kube_control_plane]
node1
# node2
# node3

[etcd]
node1
# node2
# node3

[kube_node]
node1
# node3
# node4
# node5
# node6

[calico_rr]

[k8s_cluster:children]
kube_control_plane
kube_node
calico_rr

Command used to invoke ansible

ansible-playbook -i inventory/mycluster/hosts.yaml --become --become-user=root cluster.yml

Output of ansible run

[...]
TASK [kubernetes/control-plane : Kubeadm | Check apiserver.crt SAN IPs] ***************************************************************

failed: [node1] (item=10.233.0.1) => {"ansible_loop_var": "item", "changed": true, "cmd": ["openssl", "x509", "-noout", "-in", "/etc/kubernetes/ssl/apiserver.crt", "-checkip", "10.233.0.1"], "delta": "0:00:00.003842", "end": "2024-02-19 21:38:09.295726", "item": "10.233.0.1", "msg": "non-zero return code", "rc": 1, "start": "2024-02-19 21:38:09.291884", "stderr": "Could not open file or uri for loading certificate from /etc/kubernetes/ssl/apiserver.crt\n801B2EF0F57F0000:error:16000069:STORE routines:ossl_store_get0_loader_int:unregistered scheme:../crypto/store/store_register.c:237:scheme=file\n801B2EF0F57F0000:error:80000002:system library:file_open:No such file or directory:../providers/implementations/storemgmt/file_store.c:267:calling stat(/etc/kubernetes/ssl/apiserver.crt)\nUnable to load certificate", "stderr_lines": ["Could not open file or uri for loading certificate from /etc/kubernetes/ssl/apiserver.crt", "801B2EF0F57F0000:error:16000069:STORE routines:ossl_store_get0_loader_int:unregistered scheme:../crypto/store/store_register.c:237:scheme=file", "801B2EF0F57F0000:error:80000002:system library:file_open:No such file or directory:../providers/implementations/storemgmt/file_store.c:267:calling stat(/etc/kubernetes/ssl/apiserver.crt)", "Unable to load certificate"], "stdout": "", "stdout_lines": []}
failed: [node1] (item=127.0.0.1) => {"ansible_loop_var": "item", "changed": true, "cmd": ["openssl", "x509", "-noout", "-in", "/etc/kubernetes/ssl/apiserver.crt", "-checkip", "127.0.0.1"], "delta": "0:00:00.003949", "end": "2024-02-19 21:38:09.437272", "item": "127.0.0.1", "msg": "non-zero return code", "rc": 1, "start": "2024-02-19 21:38:09.433323", "stderr": "Could not open file or uri for loading certificate from /etc/kubernetes/ssl/apiserver.crt\n80AB6796B17F0000:error:16000069:STORE routines:ossl_store_get0_loader_int:unregistered scheme:../crypto/store/store_register.c:237:scheme=file\n80AB6796B17F0000:error:80000002:system library:file_open:No such file or directory:../providers/implementations/storemgmt/file_store.c:267:calling stat(/etc/kubernetes/ssl/apiserver.crt)\nUnable to load certificate", "stderr_lines": ["Could not open file or uri for loading certificate from /etc/kubernetes/ssl/apiserver.crt", "80AB6796B17F0000:error:16000069:STORE routines:ossl_store_get0_loader_int:unregistered scheme:../crypto/store/store_register.c:237:scheme=file", "80AB6796B17F0000:error:80000002:system library:file_open:No such file or directory:../providers/implementations/storemgmt/file_store.c:267:calling stat(/etc/kubernetes/ssl/apiserver.crt)", "Unable to load certificate"], "stdout": "", "stdout_lines": []}
failed: [node1] (item=x.x.x.x) => {"ansible_loop_var": "item", "changed": true, "cmd": ["openssl", "x509", "-noout", "-in", "/etc/kubernetes/ssl/apiserver.crt", "-checkip", "x.x.x.x"], "delta": "0:00:00.003810", "end": "2024-02-19 21:38:09.578985", "item": "x.x.x.x", "msg": "non-zero return code", "rc": 1, "start": "2024-02-19 21:38:09.575175", "stderr": "Could not open file or uri for loading certificate from /etc/kubernetes/ssl/apiserver.crt\n80EB8FBB0D7F0000:error:16000069:STORE routines:ossl_store_get0_loader_int:unregistered scheme:../crypto/store/store_register.c:237:scheme=file\n80EB8FBB0D7F0000:error:80000002:system library:file_open:No such file or directory:../providers/implementations/storemgmt/file_store.c:267:calling stat(/etc/kubernetes/ssl/apiserver.crt)\nUnable to load certificate", "stderr_lines": ["Could not open file or uri for loading certificate from /etc/kubernetes/ssl/apiserver.crt", "80EB8FBB0D7F0000:error:16000069:STORE routines:ossl_store_get0_loader_int:unregistered scheme:../crypto/store/store_register.c:237:scheme=file", "80EB8FBB0D7F0000:error:80000002:system library:file_open:No such file or directory:../providers/implementations/storemgmt/file_store.c:267:calling stat(/etc/kubernetes/ssl/apiserver.crt)", "Unable to load certificate"], "stdout": "", "stdout_lines": []}

NO MORE HOSTS LEFT ********************************************************************************************************************

PLAY RECAP ****************************************************************************************************************************
node1                      : ok=522  changed=5    unreachable=0    failed=1    skipped=660  rescued=0    ignored=3

Monday 19 February 2024  21:38:09 +0100 (0:00:00.482)       0:02:33.268 *******
===============================================================================
container-engine/runc : Download_file | Download item -------------------------------------------------------------------------- 2.99s
container-engine/crictl : Download_file | Download item ------------------------------------------------------------------------ 2.93s
download : Download_file | Download item --------------------------------------------------------------------------------------- 2.87s
container-engine/containerd : Download_file | Download item -------------------------------------------------------------------- 2.86s
container-engine/nerdctl : Download_file | Download item ----------------------------------------------------------------------- 2.83s
etcdctl_etcdutl : Download_file | Download item -------------------------------------------------------------------------------- 2.83s
container-engine/crictl : Extract_file | Unpacking archive --------------------------------------------------------------------- 2.79s
etcdctl_etcdutl : Extract_file | Unpacking archive ----------------------------------------------------------------------------- 2.71s
container-engine/containerd : Containerd | Unpack containerd archive ----------------------------------------------------------- 2.50s
container-engine/nerdctl : Extract_file | Unpacking archive -------------------------------------------------------------------- 2.32s
container-engine/validate-container-engine : Populate service facts ------------------------------------------------------------ 1.93s
etcdctl_etcdutl : Copy etcd binary --------------------------------------------------------------------------------------------- 1.82s
kubernetes/node : Enable bridge-nf-call tables --------------------------------------------------------------------------------- 1.44s
download : Download | Download files / images ---------------------------------------------------------------------------------- 1.21s
download : Extract_file | Unpacking archive ------------------------------------------------------------------------------------ 1.17s
download : Extract_file | Unpacking archive ------------------------------------------------------------------------------------ 1.10s
kubernetes/control-plane : Backup old certs and keys --------------------------------------------------------------------------- 1.00s
container-engine/nerdctl : Download_file | Create dest directory on node ------------------------------------------------------- 0.99s
container-engine/containerd : Download_file | Create dest directory on node ---------------------------------------------------- 0.97s
container-engine/runc : Download_file | Create dest directory on node ---------------------------------------------------------- 0.96s

Anything else we need to know

I have found that the error only occurs when I add my master to the cluster (where I run the script). If I only add my second server (client node) to the cluster, it still works. Does this have anything to do with my setup?
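The failing task runs `openssl x509 -noout -in /etc/kubernetes/ssl/apiserver.crt -checkip <ip>`, and the stderr reports `No such file or directory`, so the certificate file was missing rather than invalid. The check can be repeated by hand on the control-plane node with a minimal sketch like the one below. It assumes bash and openssl are available and that openssl prints `does match certificate` when the IP is covered; `cert_has_ip` is a hypothetical helper name, not part of Kubespray.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Kubespray): repeat the SAN IP check by hand.
# Returns 0 if the IP is covered by the certificate, 2 if the file is missing,
# and a non-zero status otherwise.
cert_has_ip() {
  local cert="$1" ip="$2"
  if [ ! -f "$cert" ]; then
    echo "missing certificate: $cert" >&2
    return 2
  fi
  # Assumption: openssl reports "IP <ip> does match certificate" on success.
  openssl x509 -noout -in "$cert" -checkip "$ip" | grep -q "does match certificate"
}
```

For example, `cert_has_ip /etc/kubernetes/ssl/apiserver.crt 10.233.0.1; echo $?` would print 2 in the situation shown in the log above, because the file does not exist.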

VannTen commented 8 months ago

> I'm trying to set up kubespray on bare metal. I have two servers to add to my cluster (one master, one client).

Your inventory shows a single host; where is your second?

> Now I keep getting an error, which is different not only on the latest release (release-2.24), but also on earlier builds (e.g. release-2.20 or the current master branch).

What do you mean? Which versions have the problem, and which versions don't?

VannTen commented 8 months ago

/triage needs-information

RaMaTHA commented 8 months ago

> Your inventory shows a single host; where is your second?

Currently, and in my report, I have only one node enabled at a time (here, node1). If I disable node1 and enable node2 in my inventory.ini, I can run the script without any errors.

For example, right now it is (simplified, short view):

[all]
node1 ansible_host=<host-ip>
# node2 ansible_host=<client-ip> 

[kube_control_plane]
node1

[etcd]
node1

[kube_node]
node1

This example doesn't work. But when I run the script from my host with only my client enabled, it works:

[all]
# node1 ansible_host=<host-ip>
node2 ansible_host=<client-ip> 

[kube_control_plane]
node2

[etcd]
node2

[kube_node]
node2

However, when I add both servers (host-node and client) to the setup, I also get the error as shown in the error description.

My inventory with both nodes enabled.

[all]
node1 ansible_host=<host-ip>
node2 ansible_host=<client-ip> 

[kube_control_plane]
node1

[etcd]
node1

[kube_node]
node2
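Because these inventories differ only in which lines are commented out, it can help to confirm which hosts are actually active in each group before a run. A minimal sketch, assuming an INI-style inventory like the ones above; `active_hosts` is a hypothetical helper name and is not provided by Kubespray or Ansible.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the uncommented host names under one group
# of an INI-style inventory file.
active_hosts() {
  local inventory="$1" group="$2"
  awk -v grp="[$group]" '
    /^\[/ { in_group = ($1 == grp); next }      # a section header switches groups
    in_group && NF && $1 !~ /^#/ { print $1 }   # skip blank and commented lines
  ' "$inventory"
}
```

For example, `active_hosts inventory/mycluster/inventory.ini kube_node` lists the hosts that would be treated as worker nodes; the file path here is only an assumption about where the inventory lives.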

So the question here is, why did it work perfectly the first time, but not the second time?

> What do you mean? Which versions have the problem, and which versions don't?

I could not find out which version did not have the problem. So the error persists from release-2.20 to master.

VannTen commented 8 months ago

If you're running with these 3 inventories in order, you'll be building two separate clusters of 1 node each, then trying to build a 2-node cluster. This probably does not work at all.

> For example, right now it is (simplified, short view):

I think you simplified too much. Could you give the full list of commands you ran to get the error, starting from a blank state, and all the files involved? (including your inventory variables)
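One way to approach a blank state between runs is Kubespray's `reset.yml` playbook, which sits in the repository root next to `cluster.yml`. A minimal sketch, assuming the inventory path used earlier in this thread; `reset_cluster` and the `ANSIBLE_PLAYBOOK` override are hypothetical names introduced only so the command can be dry-run with a stub.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run Kubespray's reset.yml against one inventory.
# ANSIBLE_PLAYBOOK is an illustrative override, not an Ansible setting.
reset_cluster() {
  local inventory="${1:-inventory/mycluster/hosts.yaml}"
  local runner="${ANSIBLE_PLAYBOOK:-ansible-playbook}"
  # Same flags as the cluster.yml invocation, but targeting reset.yml.
  $runner -i "$inventory" --become --become-user=root reset.yml
}
```

Run it from the kubespray checkout with the same inventory that was used for the install, so that the same hosts are reset.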

RaMaTHA commented 8 months ago

I reset the server and removed the repo, so I have a clean new system. But I am still struggling with the same error. Just to be clear, I didn't misconfigure anything. These are the exact steps I did:

git clone https://github.com/kubernetes-sigs/kubespray.git
cd kubespray
git checkout release-2.24
install ansible:
- VENVDIR=kubespray-venv
- KUBESPRAYDIR=kubespray
- python3 -m venv $VENVDIR
- source $VENVDIR/bin/activate
- pip install -U -r requirements.txt
cp -rfp inventory/sample inventory/mycluster
declare -a IPS=(<host ip-address>)
CONFIG_FILE=inventory/mycluster/hosts.yaml python3 contrib/inventory_builder/inventory.py ${IPS[@]}
Add my ssh-key from my user to the authorized keys:
- echo "$(cat ~/.ssh/id_rsa.pub)" >> ~/.ssh/authorized_keys

Adjust the inventory as follows:


[all]
node1 ansible_host=<host-ip>
# node2 ansible_host=<client-ip>

[kube_control_plane]
node1

[etcd]
node1

[kube_node]
node1


- `ansible-playbook -i inventory/mycluster/hosts.yaml  --become --become-user=root cluster.yml -kK` 

And nearly at the end of the script, I ran into the openssl error. 

My openssl version:
- `openssl version`: `OpenSSL 3.0.2 15 Mar 2022 (Library: OpenSSL 3.0.2 15 Mar 2022)`

RaMaTHA commented 8 months ago

I figured out what caused the error. After removing all Docker containers (running and stopped) with `sudo docker rm -f $(sudo docker ps -qa)`, it worked again.
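The one-liner above fails with a usage error when there are no containers, because `docker rm` then receives no arguments. A slightly more defensive sketch of the same cleanup is shown below; `remove_all_containers` and the `DOCKER` override are hypothetical names introduced for illustration, and the function force-removes every container on the host, so it is only appropriate on a machine dedicated to the cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper: force-remove every container, running or stopped.
# DOCKER is an illustrative override (e.g. DOCKER="sudo docker"), not a Docker setting.
remove_all_containers() {
  local docker_cmd="${DOCKER:-docker}"
  local ids
  # `ps -qa` prints the IDs of all containers, including stopped ones.
  ids="$($docker_cmd ps -qa)"
  if [ -z "$ids" ]; then
    echo "no containers to remove"
    return 0
  fi
  # Word splitting on $ids is intentional: one argument per container ID.
  $docker_cmd rm -f $ids
}
```

With `DOCKER="sudo docker" remove_all_containers`, this runs the same two Docker commands as the fix reported above.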

@VannTen Thanks for your help.

kubernetes-sigs / kubespray

Certificate Issue #10937
