techno-tim / k3s-ansible

The easiest way to bootstrap a self-hosted High Availability Kubernetes cluster. A fully automated HA k3s etcd install with kube-vip, MetalLB, and more. Build. Destroy. Repeat.
https://technotim.live/posts/k3s-etcd-ansible/
Apache License 2.0

Unstable VIP PING & K3S too new for Rancher #47

Closed. teknowill closed this issue 2 years ago.

teknowill commented 2 years ago

This link gets a 404, but I did work through the k3s troubleshooting checklist: https://github.com/techno-tim/k3s-ansible/discussions/20

I was able to get this working with an older release; with the most current release I get:

unstable ping to the VIP IP, and K3s needs to be < v1.24 for Rancher

Expected Behavior

Pinging the VIP endpoint should be stable, and Helm should be able to deploy Rancher with the documented commands.

Current Behavior

You can only ping the VIP/API IP intermittently, so kubectl get nodes and Helm deployments are hit or miss. The Rancher deployment fails because K3s is on 1.24 or newer.

Steps to Reproduce

  1. Deploy with the code base as of the May 26, 2022 commits to all.yml: 3 etcd control-plane nodes and 5 workers as Proxmox VMs (across three physical nodes).
  2. Deploy Longhorn in the default namespace, shared with the workers (just learning the process).
  3. Deploy Minecraft; everything was stable for weeks, though it kept putting too many pods on one node.
  4. Back up, then take down Minecraft and Longhorn.
  5. Reset.
  6. git pull on 8/1/22 and use the latest code, which has changes to all.yml, main.yml, the MetalLB ConfigMap/IPAddressPool, the MetalLB YAMLs, the kube-vip RBAC, and the kube-vip YAML.
  7. Add 3 VMs dedicated to Longhorn; verify Ansible can apt update, install the Proxmox guest agent, and SSH without a password.
  8. Add the 3 new IPs under [node] (3 control, 5 workers, 3 workers for Longhorn only).
  9. Deploy with the 8/1/22 version (the playbook commands I used are sketched just below this list).
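
The deploy/reset steps above are just the stock playbook runs from the repo; the inventory path here is my local layout, so adjust it to wherever your host.ini actually lives:

ansible-playbook site.yml -i inventory/my-cluster/hosts.ini     # deploy (steps 1 and 9)
ansible-playbook reset.yml -i inventory/my-cluster/hosts.ini    # reset (step 5)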

Context (variables)

Operating system: Ubuntu 22.04

Hardware: 2x dual-Xeon nodes (48 threads, 256 GB RAM), 1x 1-liter node with a 10th-gen i5 and 64 GB RAM

Variables Used:

I didn't alter these, other than adding my own token and IPs; they are as listed in the repo.

all.yml

k3s_version: "1.24"
ansible_user: NA
systemd_dir: ""

flannel_iface: ""

apiserver_endpoint: ""

k3s_token: "NA"

extra_server_args: ""
extra_agent_args: ""

kube_vip_tag_version: ""

metal_lb_speaker_tag_version: ""
metal_lb_controller_tag_version: ""

metal_lb_ip_range: ""

Hosts

host.ini

[master]
IP.ADDRESS.ONE
IP.ADDRESS.TWO
IP.ADDRESS.THREE

[node]
IP.ADDRESS.FOUR
IP.ADDRESS.FIVE

[k3s_cluster:children]
master
node

Possible Solution

It seems to deploy OK; something just isn't quite right with access to the VIP/API IP. From what I can tell, the version changes are very particular.

I tried taking various nodes offline one at a time; this didn't really help in any repeatable way, so I don't think it's any one node/VM or its physical networking.

I tried just going to an older k3s, which might have helped Rancher, but it didn't help kube-vip/MetalLB. I'll try to see if I can find a kube-vip or MetalLB log (but I don't really know kube-vip or MetalLB); the commands I plan to try are below.
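
These are plain kubectl log checks; the daemonset/deployment names are what kube-vip and MetalLB normally deploy as, so they may differ slightly from what the playbook creates:

kubectl logs -n kube-system daemonset/kube-vip-ds --all-containers      # kube-vip leader election / ARP announcements
kubectl logs -n metallb-system deployment/controller                    # MetalLB address allocation decisions
kubectl logs -n metallb-system daemonset/speaker --all-containers       # MetalLB L2 announcements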

For now, I'm going to try reverting the pull and deploying with the older stack.

teknowill commented 2 years ago

Kept the newest code, but went back to an all.yml with:

k3s_version: v1.23.4+k3s1

# this is the user that has ssh access to these machines
ansible_user: ----
systemd_dir: /etc/systemd/system

# Set your timezone
system_timezone: "America/New_York"

# interface which will be used for flannel
flannel_iface: "eth0"

# apiserver_endpoint is virtual ip-address which will be configured on each master
apiserver_endpoint: "192.168.1.220"

# k3s_token is required so masters can talk together securely
# this token should be alpha numeric only
k3s_token: "some-SUPER-DEDEUPER-secret-password"

# change these to your liking, the only required one is --no-deploy servicelb
extra_server_args: "--no-deploy servicelb --no-deploy traefik"
extra_agent_args: ""

# image tag for kube-vip
kube_vip_tag_version: "v0.4.4"

# image tag for metal lb
metal_lb_speaker_tag_version: "v0.12.1"
metal_lb_controller_tag_version: "v0.12.1"

# metallb ip range for load balancer
metal_lb_ip_range: "192.168.1.221-192.168.1.239"


The API IP and kubectl get nodes are now stable, and Rancher will deploy.

However, after

kubectl expose deployment rancher -n cattle-system --type=LoadBalancer --name=rancher-lb --port=443

the external IP is pending forever.
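
Things I'm checking while it stays pending (generic kubectl, nothing specific to this repo):

kubectl describe svc rancher-lb -n cattle-system        # the Events section shows whether MetalLB tried to assign an address
kubectl logs -n metallb-system deployment/controller    # the controller logs why it can't allocate one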

teknowill commented 2 years ago

kubectl get all -A

NAMESPACE   NAME   READY   STATUS   RESTARTS   AGE
cattle-fleet-local-system   pod/fleet-agent-699b5fb945-nsxnx   1/1   Running   0   13m
cattle-fleet-system   pod/fleet-controller-784d6fbcd8-hngpn   1/1   Running   0   14m
cattle-fleet-system   pod/gitjob-6b977748fc-7rsh8   1/1   Running   0   14m
cattle-system   pod/helm-operation-7vpxj   0/2   Completed   0   14m
cattle-system   pod/helm-operation-8m924   0/2   Completed   0   13m
cattle-system   pod/helm-operation-fcx2g   0/2   Completed   0   15m
cattle-system   pod/helm-operation-v89kg   0/2   Completed   0   14m
cattle-system   pod/rancher-7fd65d9cd6-2f5vv   1/1   Running   0   17m
cattle-system   pod/rancher-7fd65d9cd6-qlqp7   1/1   Running   0   17m
cattle-system   pod/rancher-7fd65d9cd6-slnqr   1/1   Running   0   17m
cattle-system   pod/rancher-webhook-5b65595df9-l5b7x   1/1   Running   0   13m
cert-manager   pod/cert-manager-76d44b459c-kdzqv   1/1   Running   0   18m
cert-manager   pod/cert-manager-cainjector-9b679cc6-wp959   1/1   Running   0   18m
cert-manager   pod/cert-manager-webhook-57c994b6b9-tqgnv   1/1   Running   0   18m
kube-system   pod/coredns-5789895cd-cbzzm   1/1   Running   0   56m
kube-system   pod/kube-vip-ds-4lntv   1/1   Running   0   56m
kube-system   pod/kube-vip-ds-j65z7   1/1   Running   2 (16m ago)   56m
kube-system   pod/kube-vip-ds-m8vcq   1/1   Running   0   56m
kube-system   pod/local-path-provisioner-6c79684f77-zk8jj   1/1   Running   0   56m
kube-system   pod/metrics-server-7cd5fcb6b7-czzjp   1/1   Running   0   56m
metallb-system   pod/controller-74df79bb55-qvldk   1/1   Running   0   56m
metallb-system   pod/speaker-28fk6   1/1   Running   0   53m
metallb-system   pod/speaker-2mhzf   1/1   Running   0   56m
metallb-system   pod/speaker-8zwrg   1/1   Running   0   53m
metallb-system   pod/speaker-96mb5   1/1   Running   0   53m
metallb-system   pod/speaker-bmhpn   1/1   Running   0   56m
metallb-system   pod/speaker-jggcr   1/1   Running   0   53m
metallb-system   pod/speaker-mr7mc   1/1   Running   0   53m
metallb-system   pod/speaker-rb8dp   1/1   Running   0   53m
metallb-system   pod/speaker-rkktx   1/1   Running   0   53m
metallb-system   pod/speaker-t89s6   1/1   Running   0   53m
metallb-system   pod/speaker-v7vss   1/1   Running   0   56m

NAMESPACE   NAME   TYPE   CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
cattle-fleet-system   service/gitjob   ClusterIP   10.43.59.181   <none>   80/TCP   14m
cattle-system   service/rancher   ClusterIP   10.43.230.173   <none>   80/TCP,443/TCP   17m
cattle-system   service/rancher-lb   LoadBalancer   10.43.192.203   <pending>   443:31202/TCP   11m
cattle-system   service/rancher-webhook   ClusterIP   10.43.112.129   <none>   443/TCP   13m
cattle-system   service/webhook-service   ClusterIP   10.43.111.116   <none>   443/TCP   13m
cert-manager   service/cert-manager   ClusterIP   10.43.247.67   <none>   9402/TCP   18m
cert-manager   service/cert-manager-webhook   ClusterIP   10.43.170.208   <none>   443/TCP   18m
default   service/kubernetes   ClusterIP   10.43.0.1   <none>   443/TCP   57m
kube-system   service/kube-dns   ClusterIP   10.43.0.10   <none>   53/UDP,53/TCP,9153/TCP   56m
kube-system   service/metrics-server   ClusterIP   10.43.76.55   <none>   443/TCP   56m
metallb-system   service/webhook-service   ClusterIP   10.43.125.245   <none>   443/TCP   56m

NAMESPACE   NAME   DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
kube-system   daemonset.apps/kube-vip-ds   3   3   3   3   3   <none>   56m
metallb-system   daemonset.apps/speaker   11   11   11   11   11   kubernetes.io/os=linux   56m

NAMESPACE   NAME   READY   UP-TO-DATE   AVAILABLE   AGE
cattle-fleet-local-system   deployment.apps/fleet-agent   1/1   1   1   13m
cattle-fleet-system   deployment.apps/fleet-controller   1/1   1   1   14m
cattle-fleet-system   deployment.apps/gitjob   1/1   1   1   14m
cattle-system   deployment.apps/rancher   3/3   3   3   17m
cattle-system   deployment.apps/rancher-webhook   1/1   1   1   13m
cert-manager   deployment.apps/cert-manager   1/1   1   1   18m
cert-manager   deployment.apps/cert-manager-cainjector   1/1   1   1   18m
cert-manager   deployment.apps/cert-manager-webhook   1/1   1   1   18m
kube-system   deployment.apps/coredns   1/1   1   1   56m
kube-system   deployment.apps/local-path-provisioner   1/1   1   1   56m
kube-system   deployment.apps/metrics-server   1/1   1   1   56m
metallb-system   deployment.apps/controller   1/1   1   1   56m

NAMESPACE   NAME   DESIRED   CURRENT   READY   AGE
cattle-fleet-local-system   replicaset.apps/fleet-agent-699b5fb945   1   1   1   13m
cattle-fleet-local-system   replicaset.apps/fleet-agent-86b78d86bf   0   0   0   13m
cattle-fleet-system   replicaset.apps/fleet-controller-784d6fbcd8   1   1   1   14m
cattle-fleet-system   replicaset.apps/gitjob-6b977748fc   1   1   1   14m
cattle-system   replicaset.apps/rancher-7fd65d9cd6   3   3   3   17m
cattle-system   replicaset.apps/rancher-webhook-5b65595df9   1   1   1   13m
cert-manager   replicaset.apps/cert-manager-76d44b459c   1   1   1   18m
cert-manager   replicaset.apps/cert-manager-cainjector-9b679cc6   1   1   1   18m
cert-manager   replicaset.apps/cert-manager-webhook-57c994b6b9   1   1   1   18m
kube-system   replicaset.apps/coredns-5789895cd   1   1   1   56m
kube-system   replicaset.apps/local-path-provisioner-6c79684f77   1   1   1   56m
kube-system   replicaset.apps/metrics-server-7cd5fcb6b7   1   1   1   56m
metallb-system   replicaset.apps/controller-74df79bb55   1   1   1   56m

teknowill commented 2 years ago

The API endpoint IP pings stably with everything below.

I will try again with a newer k3s later, but with 1.23:

helm install cert-manager jetstack/cert-manager --namespace cert-manager --version v1.7.1

gets stuck, even though:

kubectl get pods --namespace cert-manager
NAME   READY   STATUS   RESTARTS   AGE
cert-manager-76d44b459c-wr4bp   1/1   Running   0   3m3s
cert-manager-cainjector-9b679cc6-nnx9m   1/1   Running   0   3m3s
cert-manager-startupapicheck-nrrkb   1/1   Running   2 (58s ago)   3m2s
cert-manager-webhook-57c994b6b9-7w959   1/1   Running   0   3m3s


k3s_version: v1.23.4+k3s1

# this is the user that has ssh access to these machines
ansible_user: ----
systemd_dir: /etc/systemd/system

# Set your timezone
system_timezone: "America/New_York"

# interface which will be used for flannel
flannel_iface: "eth0"

# apiserver_endpoint is virtual ip-address which will be configured on each master
apiserver_endpoint: "192.168.1.220"

# k3s_token is required so masters can talk together securely
# this token should be alpha numeric only
k3s_token: "some-SUPER-DEDEUPER-secret-password"

# change these to your liking, the only required one is --no-deploy servicelb
extra_server_args: "--no-deploy servicelb --no-deploy traefik"
extra_agent_args: ""

# image tag for kube-vip
# kube_vip_tag_version: "v0.4.4"
kube_vip_tag_version: "v0.5.0"

# image tag for metal lb
# metal_lb_speaker_tag_version: "v0.12.1"
# metal_lb_controller_tag_version: "v0.12.1"
metal_lb_speaker_tag_version: "v0.13.4"
metal_lb_controller_tag_version: "v0.13.4"

# metallb ip range for load balancer
metal_lb_ip_range: "192.168.1.221-192.168.1.239"
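
Side note on the MetalLB bump above: v0.13.x drops the ConfigMap and configures the pool through CRDs, so my understanding is that metal_lb_ip_range ends up rendered into resources roughly like these (the resource names here are placeholders, not necessarily what the playbook generates):

apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: first-pool          # placeholder name
  namespace: metallb-system
spec:
  addresses:
    - 192.168.1.221-192.168.1.239
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: l2-advert           # placeholder name
  namespace: metallb-system
spec:
  ipAddressPools:
    - first-pool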

teknowill commented 2 years ago

kubectl get pods --namespace cert-manager
NAME   READY   STATUS   RESTARTS   AGE
cert-manager-76d44b459c-wr4bp   1/1   Running   0   5m6s
cert-manager-cainjector-9b679cc6-nnx9m   1/1   Running   0   5m6s
cert-manager-startupapicheck-nrrkb   0/1   CrashLoopBackOff   3 (18s ago)   5m5s
cert-manager-webhook-57c994b6b9-7w959   1/1   Running   0   5m6s

teknowill commented 2 years ago

https://github.com/cert-manager/cert-manager/issues/2773. It helps if I don't skip a line in the docs...
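
For anyone else who hits this: my understanding is that the line I skipped was the CRD install step, and either of these covers it (the CRD manifest URL is the standard cert-manager release artifact; double-check it for your version):

kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.7.1/cert-manager.crds.yaml
# or let Helm install the CRDs:
helm install cert-manager jetstack/cert-manager --namespace cert-manager --version v1.7.1 --set installCRDs=true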

teknowill commented 2 years ago

OK, get nodes and ping to the API are stable using the newest all.yml, and I was able to install cert-manager.


k3s_version: v1.24.3+k3s1
# k3s_version: v1.23.4+k3s1

# this is the user that has ssh access to these machines
ansible_user: ---
systemd_dir: /etc/systemd/system

# Set your timezone
system_timezone: "America/New_York"

# interface which will be used for flannel
flannel_iface: "eth0"

# apiserver_endpoint is virtual ip-address which will be configured on each master
apiserver_endpoint: "192.168.1.220"

# k3s_token is required so masters can talk together securely
# this token should be alpha numeric only
k3s_token: "some-SUPER-DEDEUPER-secret-password"

# change these to your liking, the only required one is --no-deploy servicelb
extra_server_args: "--no-deploy servicelb --no-deploy traefik"
extra_agent_args: ""

# image tag for kube-vip
# kube_vip_tag_version: "v0.4.4"
kube_vip_tag_version: "v0.5.0"

# image tag for metal lb
# metal_lb_speaker_tag_version: "v0.12.1"
# metal_lb_controller_tag_version: "v0.12.1"
metal_lb_speaker_tag_version: "v0.13.4"
metal_lb_controller_tag_version: "v0.13.4"

# metallb ip range for load balancer
metal_lb_ip_range: "192.168.1.221-192.168.1.239"

teknowill commented 2 years ago

helm install rancher rancher-stable/rancher \
  --namespace cattle-system ....

Error: INSTALLATION FAILED: chart requires kubeVersion: < 1.24.0-0 which is incompatible with Kubernetes v1.24.3+k3s1

Looks like there is a 1.24 Rancher out there: https://github.com/rancher/client-go/releases/tag/v1.24.0-rancher1

but it's not fully ready yet: https://github.com/rancher/rancher/issues/37711

I'm trying to figure out how to point to it, but I guess I just need to stick with k3s_version: v1.23.4+k3s1 for now. At least I can use the newer kube-vip and MetalLB; not sure what is "different".
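
A quick way to see the chart's constraint before deploying (plain Helm, nothing specific to this repo):

helm show chart rancher-stable/rancher | grep kubeVersion
# prints the same constraint as the error above, i.e. kubeVersion: < 1.24.0-0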

teknowill commented 2 years ago

Yeah, that did it:

NAME: rancher
LAST DEPLOYED: Mon Aug 1 19:10:43 2022
NAMESPACE: cattle-system
STATUS: deployed

I suggest setting all.yml back to v1.23.4+k3s1 until a 1.24-compatible Rancher is ready, at least in your main branch.

timothystewart6 commented 2 years ago

Rancher is not yet compatible with k3s 1.24. It may be soon but that is really going to be up to Rancher to make it compatible.

timothystewart6 commented 2 years ago

I think support is coming in Rancher 2.6.7; I would check their release notes.

timothystewart6 commented 2 years ago

Also, Rancher isn't compatible with the latest cert-manager.

timothystewart6 commented 2 years ago

Also, I've been pinging my VIP for over an hour now and it's stable.

teknowill commented 2 years ago

After I did the pull I had an unstable ping for 2 re-deployment rounds, with no errors in the output, before I put up this post. I think something minor was just off in my copy of all.yml, as later deployments were very stable. I mentioned in a later comment that I ended up getting a stable ping; sorry you wasted time pinging. I can no longer reproduce it.

Re: support coming in 2.6.7: thank you, yes, I found that as well. https://github.com/rancher/rancher/issues/37711

Re: 1.24: I'm just trying to point out that the source copy of all.yml in your repo has a line that installs 1.24. I know a k3s deployment and Rancher are separate things, and I don't know the k3s deployment's target, but if people follow your guidance docs and videos, pull/clone the all.yml that points to 1.24, and then try to deploy Rancher, they'll run into this. Not a big deal.

I'm just suggesting a known Rancher-compatible version stack/branch for your k3s deployment, as well as a latest-k3s stack. In this case it's only one line, but I think Rancher will always lag behind k3s, and you also mentioned cert-manager versions; there could be other things down the line. I could try to expand an Ansible script for a Rancher-on-top stack and post it, if there's any interest/value.

timothystewart6 commented 2 years ago

Thank you for bringing this up, and for all the details.