techno-tim / k3s-ansible

The easiest way to bootstrap a self-hosted High Availability Kubernetes cluster. A fully automated HA k3s etcd install with kube-vip, MetalLB, and more. Build. Destroy. Repeat.
https://technotim.live/posts/k3s-etcd-ansible/
Apache License 2.0

Download k3s binary x64 Fails #526

Open nicksweb opened 3 weeks ago

nicksweb commented 3 weeks ago

I have gone through the installation notes and video and completed the setup steps twice over, but I keep getting stuck when running the site.yml playbook - it fails at the task "Download k3s binary x64".

Here is the error from Ansible:

TASK [download : Download k3s binary x64] ***** fatal: [172.16.2.108]: FAILED! => {"changed": false, "dest": "/usr/local/bin/k3s", "elapsed": 0, "msg": "An unknown error occurred: 'CustomHTTPSConnection' object has no attribute 'cert_file'", "url": "https://github.com/k3s-io/k3s/releases/download/v1.30.0+k3s1/sha256sum-amd64.txt"}

Ansible playbook output when run with -vvv:

<172.16.2.107> (1, b'/tmp/ansible_get_url_payload_b92mszfr/ansible_get_url_payload.zip/ansible/modules/get_url.py:383: DeprecationWarning: datetime.datetime.utcnow() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.now(datetime.UTC).\r\n/tmp/ansible_get_url_payload_b92mszfr/ansible_get_url_payload.zip/ansible/modules/get_url.py:386: DeprecationWarning: datetime.datetime.utcnow() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.now(datetime.UTC).\r\n\r\n{"url": "https://github.com/k3s-io/k3s/releases/download/v1.30.0+k3s1/sha256sum-amd64.txt", "dest": "/usr/local/bin/k3s", "elapsed": 0, "failed": true, "msg": "An unknown error occurred: \'CustomHTTPSConnection\' object has no attribute \'cert_file\'", "invocation": {"module_args": {"url": "https://github.com/k3s-io/k3s/releases/download/v1.30.0+k3s1/k3s", "checksum": "sha256:https://github.com/k3s-io/k3s/releases/download/v1.30.0+k3s1/sha256sum-amd64.txt", "dest": "/usr/local/bin/k3s", "owner": "root", "group": "root", "mode": 493, "force": false, "http_agent": "ansible-httpget", "use_proxy": true, "validate_certs": true, "force_basic_auth": false, "use_gssapi": false, "backup": false, "sha256sum": "", "timeout": 10, "unredirected_headers": [], "unsafe_writes": false, "url_username": null, "url_password": null, "client_cert": null, "client_key": null, "headers": null, "tmp_dest": null, "seuser": null, "serole": null, "selevel": null, "setype": null, "attributes": null}}}\r\n', b'Shared connection to 172.16.2.107 closed.\r\n')
<172.16.2.107> Failed to connect to the host via ssh: Shared connection to 172.16.2.107 closed.
<172.16.2.107> ESTABLISH SSH CONNECTION FOR USER: localadmin
<172.16.2.107> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o 'User="localadmin"' -o ConnectTimeout=10 -o 'ControlPath="/home/nicholaso/.ansible/cp/b2a5dad02d"' 172.16.2.107 '/bin/sh -c '"'"'rm -f -r /home/localadmin/.ansible/tmp/ansible-tmp-1718048471.7898393-2215371-104439964466615/ > /dev/null 2>&1 && sleep 0'"'"''
<172.16.2.107> (0, b'', b'')
fatal: [172.16.2.107]: FAILED! => {
    "changed": false,
    "dest": "/usr/local/bin/k3s",
    "elapsed": 0,
    "invocation": {
        "module_args": {
            "attributes": null,
            "backup": false,
            "checksum": "sha256:https://github.com/k3s-io/k3s/releases/download/v1.30.0+k3s1/sha256sum-amd64.txt",
            "client_cert": null,
            "client_key": null,
            "dest": "/usr/local/bin/k3s",
            "force": false,
            "force_basic_auth": false,
            "group": "root",
            "headers": null,
            "http_agent": "ansible-httpget",
            "mode": 493,
            "owner": "root",
            "selevel": null,
            "serole": null,
            "setype": null,
            "seuser": null,
            "sha256sum": "",
            "timeout": 10,
            "tmp_dest": null,
            "unredirected_headers": [],
            "unsafe_writes": false,
            "url": "https://github.com/k3s-io/k3s/releases/download/v1.30.0+k3s1/k3s",
            "url_password": null,
            "url_username": null,
            "use_gssapi": false,
            "use_proxy": true,
            "validate_certs": true
        }
    },
    "msg": "An unknown error occurred: 'CustomHTTPSConnection' object has no attribute 'cert_file'",
    "url": "https://github.com/k3s-io/k3s/releases/download/v1.30.0+k3s1/sha256sum-amd64.txt"
}

Investigations:

From initial investigation, the failure appears to come from the get_url module call in roles/download/tasks/main.yml.

The hosts I'm spinning up are unaltered Ubuntu 24.04 (Noble Numbat) cloud images. As part of troubleshooting, the reset playbook runs against them without fault.

Example of the relevant task from the download role:

- name: Download k3s binary x64
  get_url:
    url: https://github.com/k3s-io/k3s/releases/download/{{ k3s_version }}/k3s
    checksum: sha256:https://github.com/k3s-io/k3s/releases/download/{{ k3s_version }}/sha256sum-amd64.txt
    dest: /usr/local/bin/k3s
    owner: root
    group: root
    mode: 0755
  when: ansible_facts.architecture == "x86_64"

The verbose output also shows a DeprecationWarning. While troubleshooting I tried adding validate_certs: false to the get_url task, without success.
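
For reference, here is a sketch of the task as I tried it with that change applied (the same error still occurred):

- name: Download k3s binary x64
  get_url:
    url: https://github.com/k3s-io/k3s/releases/download/{{ k3s_version }}/k3s
    checksum: sha256:https://github.com/k3s-io/k3s/releases/download/{{ k3s_version }}/sha256sum-amd64.txt
    dest: /usr/local/bin/k3s
    owner: root
    group: root
    mode: 0755
    validate_certs: false   # added while troubleshooting; did not resolve the error
  when: ansible_facts.architecture == "x86_64"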

Current Behavior

Unable to set up k3s.

Steps to Reproduce

Complete a fresh setup of k3s-ansible and attempt to install on 5 VMs.

Context (variables)

Ansible Version:

ansible [core 2.13.2]
  config file = /home/nicholaso/Documents/Github/k3s-ansible/ansible.cfg
  configured module search path = ['/home/nicholaso/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /home/nicholaso/.local/lib/python3.10/site-packages/ansible
  ansible collection location = /home/nicholaso/.ansible/collections:/usr/share/ansible/collections
  executable location = /home/nicholaso/.local/bin/ansible
  python version = 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]
  jinja version = 3.1.2
  libyaml = True

Operating system: Ubuntu 24.04 LTS

Hardware: Proxmox VMs

I have installed the requirements, and all modules etc. are up to date.

Variables Used

all.yml

k3s_version: v1.30.0+k3s1
#k3s_version: v1.30.0%2Bk3s1
# this is the user that has ssh access to these machines
ansible_user: localadmin
systemd_dir: /etc/systemd/system

# Set your timezone
system_timezone: "Australia/Brisbane"

# interface which will be used for flannel
flannel_iface: "eth0"

# uncomment calico_iface to use tigera operator/calico cni instead of flannel https://docs.tigera.io/calico/latest/about
# calico_iface: "eth0"
calico_ebpf: false           # use eBPF dataplane instead of iptables
calico_tag: "v3.27.2"        # calico version tag

# uncomment cilium_iface to use cilium cni instead of flannel or calico
# ensure v4.19.57, v5.1.16, v5.2.0 or more recent kernel
# cilium_iface: "eth0"
cilium_mode: "native"        # native when nodes on same subnet or using bgp, else set routed
cilium_tag: "v1.15.2"        # cilium version tag
cilium_hubble: true          # enable hubble observability relay and ui

# if using calico or cilium, you may specify the cluster pod cidr pool
cluster_cidr: "10.52.0.0/16"

# enable cilium bgp control plane for lb services and pod cidrs. disables metallb.
cilium_bgp: false

# bgp parameters for cilium cni. only active when cilium_iface is defined and cilium_bgp is true.
cilium_bgp_my_asn: "64513"
cilium_bgp_peer_asn: "64512"
cilium_bgp_peer_address: "192.168.30.1"
cilium_bgp_lb_cidr: "192.168.31.0/24"   # cidr for cilium loadbalancer ipam

# apiserver_endpoint is virtual ip-address which will be configured on each master
apiserver_endpoint: "172.16.2.101"

# k3s_token is required so masters can talk together securely
# this token should be alphanumeric only
k3s_token: "Hg3Izmj1J8Lpl4yCYmuIzVziOMFQL4a03FPmX8DKgqfvmYPUfpIa6mdaysLCykAi"

# The IP on which the node is reachable in the cluster.
# Here, a sensible default is provided, you can still override
# it for each of your hosts, though.
k3s_node_ip: "{{ ansible_facts[(cilium_iface | default(calico_iface | default(flannel_iface)))]['ipv4']['address'] }}"

# Disable the taint manually by setting: k3s_master_taint = false
k3s_master_taint: "{{ true if groups['node'] | default([]) | length >= 1 else false }}"

# these arguments are recommended for servers as well as agents:
extra_args: >-
  {{ '--flannel-iface=' + flannel_iface if calico_iface is not defined and cilium_iface is not defined else '' }}
  --node-ip={{ k3s_node_ip }}

# change these to your liking, the only required are: --disable servicelb, --tls-san {{ apiserver_endpoint }}
# the contents of the if block is also required if using calico or cilium
extra_server_args: >-
  {{ extra_args }}
  {{ '--node-taint node-role.kubernetes.io/master=true:NoSchedule' if k3s_master_taint else '' }}
  {% if calico_iface is defined or cilium_iface is defined %}
  --flannel-backend=none
  --disable-network-policy
  --cluster-cidr={{ cluster_cidr | default('10.52.0.0/16') }}
  {% endif %}
  --tls-san {{ apiserver_endpoint }}
  --disable servicelb
  --disable traefik

extra_agent_args: >-
  {{ extra_args }}

# image tag for kube-vip
kube_vip_tag_version: "v0.7.2"

# tag for kube-vip-cloud-provider manifest
# kube_vip_cloud_provider_tag_version: "main"

# kube-vip ip range for load balancer
# (uncomment to use kube-vip for services instead of MetalLB)
# kube_vip_lb_ip_range: "192.168.30.80-192.168.30.90"

# metallb type frr or native
metal_lb_type: "native"

# metallb mode layer2 or bgp
metal_lb_mode: "layer2"

# bgp options
# metal_lb_bgp_my_asn: "64513"
# metal_lb_bgp_peer_asn: "64512"
# metal_lb_bgp_peer_address: "192.168.30.1"

# image tag for metal lb
metal_lb_speaker_tag_version: "v0.14.3"
metal_lb_controller_tag_version: "v0.14.3"

# metallb ip range for load balancer
metal_lb_ip_range: "172.16.2.101-172.16.2.120"

# Only enable if your nodes are proxmox LXC nodes, make sure to configure your proxmox nodes
# in your hosts.ini file.
# Please read https://gist.github.com/triangletodd/02f595cd4c0dc9aac5f7763ca2264185 before using this.
# Most notably, your containers must be privileged, and must not have nesting set to true.
# Please note this script disables most of the security of lxc containers, with the trade off being that lxc
# containers are significantly more resource efficient compared to full VMs.
# Mixing and matching VMs and lxc containers is not supported, ymmv if you want to do this.
# I would only really recommend using this if you have particularly low powered proxmox nodes where the overhead of
# VMs would use a significant portion of your available resources.
proxmox_lxc_configure: false
# the user that you would use to ssh into the host, for example if you run ssh some-user@my-proxmox-host,
# set this value to some-user
proxmox_lxc_ssh_user: root
# the unique proxmox ids for all of the containers in the cluster, both worker and master nodes
proxmox_lxc_ct_ids:
  - 157811
  - 157812
  - 157813
  - 157814
  - 157815

# Only enable this if you have set up your own container registry to act as a mirror / pull-through cache
# (harbor / nexus / docker's official registry / etc).
# Can be beneficial for larger dev/test environments (for example if you're getting rate limited by docker hub),
# or air-gapped environments where your nodes don't have internet access after the initial setup
# (which is still needed for downloading the k3s binary and such).
# k3s's documentation about private registries here: https://docs.k3s.io/installation/private-registry
custom_registries: false
# The registries can be authenticated or anonymous, depending on your registry server configuration.
# If they allow anonymous access, simply remove the following bit from custom_registries_yaml
#   configs:
#     "registry.domain.com":
#       auth:
#         username: yourusername
#         password: yourpassword
# The following is an example that pulls all images used in this playbook through your private registries.
# It also allows you to pull your own images from your private registry, without having to use imagePullSecrets
# in your deployments.
# If all you need is your own images and you don't care about caching the docker/quay/ghcr.io images,
# you can just remove those from the mirrors: section.
custom_registries_yaml: |
  mirrors:
    docker.io:
      endpoint:
        - "https://registry.domain.com/v2/dockerhub"
    quay.io:
      endpoint:
        - "https://registry.domain.com/v2/quayio"
    ghcr.io:
      endpoint:
        - "https://registry.domain.com/v2/ghcrio"
    registry.domain.com:
      endpoint:
        - "https://registry.domain.com"

  configs:
    "registry.domain.com":
      auth:
        username: yourusername
        password: yourpassword

# On some distros like Diet Pi, there is no dbus installed. dbus required by the default reboot command.
# Uncomment if you need a custom reboot command
# custom_reboot_command: /usr/sbin/shutdown -r now

# Only enable and configure these if you access the internet through a proxy
# proxy_env:
#   HTTP_PROXY: "http://proxy.domain.local:3128"
#   HTTPS_PROXY: "http://proxy.domain.local:3128"
#   NO_PROXY: "*.domain.local,127.0.0.0/8,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16"

Hosts

hosts.ini

[master]
172.16.2.106
172.16.2.107
172.16.2.108

[node]
172.16.2.109
172.16.2.110

# only required if proxmox_lxc_configure: true
# must contain all proxmox instances that have a master or worker node
# [proxmox]
# 192.168.30.43

[k3s_cluster:children]
master
node

Possible Solution

UntouchedWagons commented 3 weeks ago

Yes I'm experiencing this too.

timothystewart6 commented 2 weeks ago

I haven't tested with Ubuntu 24.04 LTS yet; it could be related. At some point we will switch the CI to test on 24.04, but until then I don't have a great way to test. I'm assuming @UntouchedWagons is also using 24.04?

UntouchedWagons commented 2 weeks ago

Yeah I'm using 24.04

roguefalcon commented 2 weeks ago

For what it's worth, it worked for me on Ubuntu 24.04 on cloud-init images. I'm not sure from looking at the configs above what would be different. I've installed two separate k3s clusters without issues.

conall88 commented 2 weeks ago

This was solved for me by running: pip install -U ansible

Update to ansible >= 8.7.0; see here for the root cause: https://github.com/void-linux/void-packages/issues/47483

The underlying issue appears to be related to the version of Python urllib that gets used; I was running Python 3.12 when I hit this issue.
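
If it helps, here is a rough sketch (not part of this playbook) of a pre-flight assert that fails fast on the controller instead of at the download task; the 2.15 threshold is an assumption based on ansible >= 8.7.0 shipping ansible-core 2.15:

- name: Pre-flight check for the CustomHTTPSConnection bug
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Fail early if ansible-core is too old for Python 3.12
      ansible.builtin.assert:
        that:
          - ansible_version.full is version('2.15', '>=')
        fail_msg: >-
          ansible-core {{ ansible_version.full }} can raise
          "'CustomHTTPSConnection' object has no attribute 'cert_file'"
          on Python 3.12; upgrade with: pip install -U ansible
        success_msg: "ansible-core {{ ansible_version.full }} looks new enough"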