techno-tim / k3s-ansible

The easiest way to bootstrap a self-hosted High Availability Kubernetes cluster. A fully automated HA k3s etcd install with kube-vip, MetalLB, and more. Build. Destroy. Repeat.
https://technotim.live/posts/k3s-etcd-ansible/
Apache License 2.0

"<class 'jinja2.exceptions.TemplateRuntimeError'>, original message: No filter named 'split' found" when trying to run the playbook #399

Closed: MarkhamLee closed this issue 11 months ago

MarkhamLee commented 12 months ago

When attempting to run the playbook to set up K3s on three nodes, I get the following fatal error:

fatal: [192.168.47.124]: FAILED! => {"msg": "An unhandled exception occurred while templating '{% if groups[group_name_master | default('master')] | length > 1 %}\n {% if ansible_hostname == hostvars[groups[group_name_master | default('master')][0]]['ansible_hostname'] %}\n --cluster-init\n {% else %}\n --server https://{{ hostvars[groups[group_name_master | default('master')][0]].k3s_node_ip | split(\",\") | first | ansible.utils.ipwrap }}:6443\n {% endif %}\n --token {{ k3s_token }}\n{% endif %} {{ extra_server_args | default('') }}'. Error was a <class 'jinja2.exceptions.TemplateRuntimeError'>, original message: No filter named 'split' found."}

I get this for my nodes with IPs ending in .124 and .132.
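For context, the split filter is shipped by newer ansible-core releases, so an older control-node install fails with exactly this TemplateRuntimeError. A quick way to see what the control node is actually running (an illustrative check, not output from the original report):

# prints the ansible-core version plus the Python and Jinja2 versions it uses;
# the playbook's templates need a release recent enough to provide 'split'
ansible --version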

Next, the playbook attempts to connect to 192.168.47.120, which fails after 20 tries, and then I get the following error:

fatal: [192.168.47.120]: FAILED! => {"attempts": 20, "changed": false, "cmd": ["k3s", "kubectl", "get", "nodes", "-l", "node-role.kubernetes.io/master=true", "-o=jsonpath={.items[*].metadata.name}"], "delta": "0:00:00.072262", "end": "2023-11-05 19:59:39.879683", "rc": 0, "start": "2023-11-05 19:59:39.807421", "stderr": "", "stderr_lines": [], "stdout": "node2", "stdout_lines": ["node2"]}

Expected Behavior

The playbook should install K3s on all three nodes and they should all be able to connect to each other.

Current Behavior

The playbook fails to execute fully and does not install K3s on all three of my nodes. However, if I just try to install on one node, it appears to install just fine; trying to add subsequent nodes via the playbook fails as well.

Steps to Reproduce

  1. Created a directory called my-cluster within inventory and copied group_vars into it

  2. Edited the hosts.ini file with the three IP addresses of my nodes

  3. Installed the required Ansible collections from collections/requirements.yml via this command:

ansible-galaxy install -r ./collections/requirements.yml

  4. Ran the following command to run the playbook:

ansible-playbook site.yml -i inventory/my-cluster/hosts.ini

Context (variables)

Operating system: Ubuntu 22.04.3 LTS

Hardware: Beelink SER5, AMD Ryzen 5 5560, 16 GB of RAM, 2 TB NVMe in each

Variables Used

all.yml

k3s_version: v1.25.12+k3s1
# this is the user that has ssh access to these machines
ansible_user: markham
systemd_dir: /etc/systemd/system

# Set your timezone
system_timezone: "America/Los_Angeles"

# interface which will be used for flannel
flannel_iface: "enp1s0"

# apiserver_endpoint is virtual ip-address which will be configured on each master
apiserver_endpoint: "192.168.30.222"

# k3s_token is required so that masters can talk together securely
# this token should be alphanumeric only (a generation sketch follows this file)
k3s_token: "super-secret-alphanumeric-password-no-special-characters"

# The IP on which the node is reachable in the cluster.
# Here, a sensible default is provided; you can still override
# it per host, though (see the inventory sketch after hosts.ini below).
k3s_node_ip: '{{ ansible_facts[flannel_iface]["ipv4"]["address"] }}'

# Disable the taint manually by setting: k3s_master_taint = false
k3s_master_taint: "{{ true if groups['node'] | default([]) | length >= 1 else false }}"

# these arguments are recommended for servers as well as agents:
extra_args: >-
  --flannel-iface={{ flannel_iface }}
  --node-ip={{ k3s_node_ip }}

# change these to your liking; the only required ones are: --disable servicelb, --tls-san {{ apiserver_endpoint }}
extra_server_args: >-
  {{ extra_args }}
  {{ '--node-taint node-role.kubernetes.io/master=true:NoSchedule' if k3s_master_taint else '' }}
  --tls-san {{ apiserver_endpoint }}
  --disable servicelb
  --disable traefik
extra_agent_args: >-
  {{ extra_args }}

# image tag for kube-vip
kube_vip_tag_version: "v0.5.12"

# metallb type frr or native
metal_lb_type: "native"

# metallb mode layer2 or bgp
metal_lb_mode: "layer2"

# bgp options
# metal_lb_bgp_my_asn: "64513"
# metal_lb_bgp_peer_asn: "64512"
# metal_lb_bgp_peer_address: "192.168.30.1"

# image tag for metal lb
metal_lb_speaker_tag_version: "v0.13.9"
metal_lb_controller_tag_version: "v0.13.9"

# metallb ip range for load balancer
metal_lb_ip_range: "192.168.47.200-192.168.47.220"
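
As a side note on k3s_token above: a quick way to generate a token that satisfies the alphanumeric-only constraint (an illustrative command, assuming openssl is available on the control node):

# 64 hex characters: letters and digits only, no special characters
openssl rand -hex 32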

Hosts

hosts.ini

[master]
192.168.47.120
192.168.47.124
192.168.47.132

[k3s_cluster:children]
master
node
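
Related to the k3s_node_ip default in all.yml: the value can be overridden per host directly in this inventory using standard Ansible host variables. A hypothetical sketch (the 10.0.0.5 address is invented for illustration):

[master]
192.168.47.120
192.168.47.124 k3s_node_ip=10.0.0.5
192.168.47.132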

Possible Solution

I "think" the offending code is in main.yml, namely, the following as it has the split command that no longer works.

server_init_args: >-
  {% if groups[group_name_master | default('master')] | length > 1 %}
    {% if ansible_hostname == hostvars[groups[group_name_master | default('master')][0]]['ansible_hostname'] %}
      --cluster-init
    {% else %}
      --server https://{{ hostvars[groups[group_name_master | default('master')][0]].k3s_node_ip | split(",") | first | ansible.utils.ipwrap }}:6443
    {% endif %}
    --token {{ k3s_token }}
  {% endif %}
  {{ extra_server_args | default('') }}
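
For anyone pinned to an older Ansible that lacks the split filter, one possible workaround (a sketch, not the playbook's official fix) is to call Python's string method .split() directly, which Jinja2 allows regardless of which filters are registered:

{# same expression as in the else branch above, with the filter swapped for the string method #}
--server https://{{ (hostvars[groups[group_name_master | default('master')][0]].k3s_node_ip).split(",") | first | ansible.utils.ipwrap }}:6443

The rest of the expression is unchanged: first still picks the first address from the resulting list, and ansible.utils.ipwrap still brackets IPv6 addresses.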
MarkhamLee commented 11 months ago

I got it working; this is specifically an Ansible issue. Trying to fix the Ansible install via the pip method didn't work, so I did the following to install Ansible via sudo apt:
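
A minimal sketch of that apt-based install, assuming the official Ansible PPA route from the Ansible docs:

# remove the pip-managed install first so apt's ansible takes precedence
# (hypothetical cleanup step; adjust to however pip installed it)
pip uninstall ansible ansible-core

# install Ansible from the official PPA
sudo apt update
sudo apt install software-properties-common
sudo add-apt-repository --yes --update ppa:ansible/ansible
sudo apt install ansible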

Once I did that, everything worked fine. Thank you for putting this playbook together; once I got past my Ansible hiccups, it was smooth sailing.

MarkhamLee commented 11 months ago

Unless there are objections, I'll submit a pull request suggesting that people try the above approach in case Ansible doesn't install properly.