kurokobo / awx-on-k3s

An example implementation of AWX on single node K3s using AWX Operator, with easy-to-use simplified configuration with ownership of data and passwords.

Yet another IPv6 question, dual stack k3s cluster, unable to route traffic to remote node #367

Closed: fahadshery closed this 4 weeks ago

fahadshery commented 1 month ago

Environment

Description

I have enabled dual stack using the following command:

curl -sfL http://get.k3s.io | INSTALL_K3S_VERSION=v1.29.4+k3s1 sh -s --write-kubeconfig-mode 644 --cluster-cidr=10.42.0.0/16,2001:cafe:42::/56 --service-cidr=10.43.0.0/16,2001:cafe:43::/112 --node-ip=10.13.X.X,2a00:2388:XXXX:XXX::X

This gives me the following:

NAME                                                   READY   STATUS      RESTARTS      AGE
pod/awx-operator-controller-manager-66d859dc9f-djtwf   2/2     Running     2 (17d ago)   17d
pod/awx-postgres-15-0                                  1/1     Running     0             17d
pod/awx-web-6d4cf99b8-wrvhw                            3/3     Running     0             17d
pod/awx-migration-24.3.1-ltm83                         0/1     Completed   0             17d
pod/awx-task-58786fc688-cvb8m                          4/4     Running     0             17d

NAME                                                      TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
service/awx-operator-controller-manager-metrics-service   ClusterIP   10.43.36.157   <none>        8443/TCP   17d
service/awx-postgres-15                                   ClusterIP   None           <none>        5432/TCP   17d
service/awx-service                                       ClusterIP   10.43.207.5    <none>        80/TCP     17d

NAME                                              READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/awx-operator-controller-manager   1/1     1            1           17d
deployment.apps/awx-web                           1/1     1            1           17d
deployment.apps/awx-task                          1/1     1            1           17d

NAME                                                         DESIRED   CURRENT   READY   AGE
replicaset.apps/awx-operator-controller-manager-66d859dc9f   1         1         1       17d
replicaset.apps/awx-web-6d4cf99b8                            1         1         1       17d
replicaset.apps/awx-task-58786fc688                          1         1         1       17d

NAME                               READY   AGE
statefulset.apps/awx-postgres-15   1/1     17d

NAME                             COMPLETIONS   DURATION   AGE
job.batch/awx-migration-24.3.1   1/1           18s        17d

NAME                                    CLASS     HOSTS                     ADDRESS                             PORTS     AGE
ingress.networking.k8s.io/awx-ingress   traefik   mytower.example.com       10.13.X.X,2a00:2388:XXXX:XXX::X     80, 443   17d

I can also see both an IPv4 and an IPv6 address assigned to the pods:

kubectl describe pod awx-task-6d5fs99v8-wrvhw -n awx | grep IP

IP:                     10.42.0.38
IPs:
   IP:             10.42.0.38
   IP:             2001:cafe:42::26
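
For completeness, dual-stack allocation can also be checked at the node level; a small sketch (the node name is a placeholder, not taken from the outputs above):

kubectl get node <node-name> -o jsonpath='{.spec.podCIDRs}{"\n"}'
# should print both an IPv4 and an IPv6 pod CIDR, e.g. ["10.42.0.0/24","2001:cafe:42::/64"]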

I can exec into the pods and they can communicate with each other over IPv6, but I am unable to connect to remote hosts over SSH (I get a connection timeout). ping is not installed inside the pods and I can't install anything with sudo (it asks for a root password).

From the host machine, on the other hand, I can SSH into the remote nodes without any issues...

Any ideas how to resolve this?

kurokobo commented 1 month ago

Launch a CentOS Stream 9 pod with root privileges and debug your connectivity:

$ kubectl -n awx run debug-centos --restart=Never -it --rm --image=quay.io/centos/centos:stream9 --command -- /usr/bin/bash
If you don't see a command prompt, try pressing enter.
[root@debug-centos /]# 

You can install any packages you need for debugging with the dnf command, e.g. iproute and openssh-clients.
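
A rough sketch of the kind of checks you could run inside that pod (the remote host is a placeholder, and the package names assume the CentOS Stream 9 image above):

[root@debug-centos /]# dnf install -y iproute iputils openssh-clients   # debugging tools
[root@debug-centos /]# ip -6 addr show                                  # confirm the pod has an IPv6 address
[root@debug-centos /]# ip -6 route show                                 # inspect the pod's IPv6 routes
[root@debug-centos /]# ping -6 -c 3 <remote-host>                       # basic IPv6 reachability
[root@debug-centos /]# ssh -6 -vvv <user>@<remote-host>                 # verbose SSH over IPv6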

fahadshery commented 1 month ago

You can install any packages you need for debugging with the dnf command, e.g. iproute and openssh-clients.

thanks, this was really useful. I'm behind a corporate proxy, and once this pod was launched I couldn't run anything; e.g. dnf install iputils would just get stuck at:

CentOS Stream 9 - BaseOS                 [                 ===        ]    ---   B/s | 0  B --:-- ETA

my /etc/environment and /etc/systemd/system/k3s.service.env files look like this:

HTTPS_PROXY="http://corporate.proxy.com:8080"
HTTP_PROXY="http://corporate.proxy.com:8080"
NO_PROXY="127.0.0.1,localhost,IPv4_ADDRESS_OF_AWX_VM,IPv6_ADDRESS_OF_AWX_VM"

So that looks fine, but I feel like it must be a routing issue... any ideas?

fahadshery commented 1 month ago

OK, I have resolved it... if you're behind a corporate proxy, then within the debug pod just export the proxy settings using:

export HTTPS_PROXY=http://corporate.proxy.com:8080
export HTTP_PROXY=http://corporate.proxy.com:8080

This enables the proxy for the debug pod, and you will then be able to install any software. 👍
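
As an alternative sketch, the proxy settings can also be passed when the debug pod is launched (the proxy URL is the same placeholder as above), so nothing needs to be exported afterwards:

$ kubectl -n awx run debug-centos --restart=Never -it --rm \
    --env=HTTP_PROXY=http://corporate.proxy.com:8080 \
    --env=HTTPS_PROXY=http://corporate.proxy.com:8080 \
    --image=quay.io/centos/centos:stream9 --command -- /usr/bin/bash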

I tried two ways to sort out the IPv6 routing:

  1. I tried to assign a valid IPv6 cluster-cidr and service-cidr, but I got an error saying the cluster-cidr mask needs to be smaller than the node-ip mask.
  2. I reverted to the cafe IPv6 networks and, since I was running dual stack, enabled the --flannel-ipv6-masq option. This fixed the issue (a combined command is sketched below).
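
Putting the pieces from this thread together, the combined dual-stack install would look roughly like this; the flag values are simply the ones quoted earlier, so treat it as a sketch rather than a verified command:

curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v1.29.4+k3s1 sh -s - \
    --write-kubeconfig-mode 644 \
    --cluster-cidr=10.42.0.0/16,2001:cafe:42::/56 \
    --service-cidr=10.43.0.0/16,2001:cafe:43::/112 \
    --node-ip=10.13.X.X,2a00:2388:XXXX:XXX::X \
    --flannel-ipv6-masq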

One last question: I tend to run shell commands in my Ansible playbooks. There are some packages missing, such as netcat or expect, when executing those playbooks. How and where do I include these packages so that AWX has access to them?

Many thanks for your support.

kurokobo commented 4 weeks ago

@fahadshery Hi, sorry for the late reply, I haven't been feeling well and couldn't find the time to respond. Anyway, glad to hear that you can run your jobs over IPv6 👍

I tend to run shell commands in my Ansible playbooks. There are some packages missing, such as netcat or expect, when executing those playbooks. How and where do I include these packages so that AWX has access to them?

Your playbook is launched inside an execution environment (EE). So you should install the required packages into your EE image using Ansible Builder, push the image to a registry, and specify that image in your job template. To build customized EE images, refer to the guide for Ansible Builder.
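
As a rough sketch, an EE definition for the packages mentioned above could look like this (the base image, tag, and registry are assumptions; nmap-ncat is the RPM that provides netcat on EL-based images):

# execution-environment.yml (Ansible Builder version 3 schema)
version: 3
images:
  base_image:
    name: quay.io/ansible/awx-ee:latest   # assumed base image
dependencies:
  system:
    # bindep-style entries installed with the image's package manager
    - nmap-ncat [platform:rpm]
    - expect [platform:rpm]

Then build and push the image, and select it in the job template:

$ ansible-builder build --tag registry.example.com/custom-awx-ee:latest
$ podman push registry.example.com/custom-awx-ee:latest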

kurokobo commented 4 weeks ago

I'm closing this issue, but feel free to create a new issue here or open a new topic on the Ansible Community Forum if you have trouble building a custom EE image with Ansible Builder. Thanks!