Closed sleiner closed 2 years ago
Can you give an example of what your group variables would look like using ipv6?
Sure 😊
# apiserver_endpoint is virtual ip-address which will be configured on each master
-apiserver_endpoint: "192.168.30.222"
+apiserver_endpoint: "2001:db8::1"
By the way, I have also tested IPv6 addresses for the MetalLB range - and it just works ™️
For example:
# metallb ip range for load balancer
-metal_lb_ip_range: "192.168.30.80-192.168.30.90"
+metal_lb_ip_range: "2001:db8::1b:0/112"
> I have also tested IPv6 addresses for the MetalLB range - and it just works ™️
Turns out that is not true. I will take a closer look here...
So when reducing my setup to a minimal working example, I'm setting these variables differently from the example:
group vars:
apiserver_endpoint: 2001:db8::1
metal_lb_ip_range:
- 2001:db8::1b:0/112
- 10.10.10.0/24
extra_server_args: >-
  --no-deploy servicelb
  --no-deploy traefik
  --disable-network-policy
  --service-cidr=10.43.0.0/16,2001:db8:43::/112
  --cluster-cidr=10.42.0.0/16,2001:db8:42::/64
  --node-ip={{ k3s_node_ip }}
extra_agent_args: >-
  --node-ip={{ k3s_node_ip }}
host vars (for each server and agent):
k3s_node_ip: 10.10.3.1,2001:db8::de:1
so:

- 2001:db8:42::/64 is the network for the pods (IPv4: 10.42.0.0/16)
- 2001:db8:43::/112 is the network for the services (IPv4: 10.43.0.0/16)
- 2001:db8::/64 is the network in which the public services of the cluster are available (IPv4: 10.10.0.0/16)
- 2001:db8::1 is the IP on which the Kubernetes API is available
- 2001:db8::de:0/112 is used for cluster nodes (IPv4: 10.10.3.0/24)
- 2001:db8::1b:0/112 is used by MetalLB (IPv4: 10.10.10.0/24)

As much as I'd love to merge this, this does break the API contract by changing the MetalLB IP range from a string to an array. Not certain how I feel about this yet. I may close this and use it as an example for someone who wants to support IPv6.
> this does break the API contract by changing the metallb IP range from a string to an array.
That is not the case :-) If you take a look at the metallb.ipaddresspool.j2 template, you will see that the current use case of specifying the range as a string is handled gracefully. So we get no breaking change.
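For illustration, handling both forms in a Jinja2 template can look roughly like this (a sketch only, not the actual contents of metallb.ipaddresspool.j2):

```jinja
{# Accept metal_lb_ip_range as either a single string or a list of ranges #}
{% set ranges = [metal_lb_ip_range] if metal_lb_ip_range is string else metal_lb_ip_range %}
  addresses:
{% for range in ranges %}
    - {{ range }}
{% endfor %}
```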
Still, I decided to adapt all examples in the repo since it improves the discoverability of specifying multiple ranges.
> Not certain how I feel about this yet.
Just to be sure: Are you talking about breaking changes or about modelling the ranges as list?
If we can get https://github.com/techno-tim/k3s-ansible/pull/57 working, I would feel much more confident merging this in.
@timothystewart6 is this the race condition you were talking about?
It is. I have some ideas, I will fix it tonight!
I know this might be a late ask but rather than change metal_lb_ip_range
to an array
, how about just adding another var like metal_lb_ip_range_ipv6
which is also a string
? This way the existing interface does not change and then you don't have to check the type and loop over it. Thoughts? It also preserves the type going forward. Open to other ideas but I want to be sure that strings work too and that we test both use cases somehow. Just asking the question, doesn't mean it needs to change! 🙂
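A sketch of that alternative (the second variable name is the one proposed above; the values are illustrative):

```yaml
# existing interface stays a string
metal_lb_ip_range: "192.168.30.80-192.168.30.90"
# hypothetical additional variable, also a string
metal_lb_ip_range_ipv6: "2001:db8::1b:0/112"
```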
@timothystewart6 applying the MetalLB CRs failed again (one of three times) - should we just add a retry here? :/
TASK [k3s/post : Apply metallb CRs] ********************************************
failed: [control1] (item=control1) => {"ansible_loop_var": "item", "changed": false, "cmd": ["k3s", "kubectl", "apply", "-f", "/tmp/k3s/metallb-crs.yaml"], "delta": "0:00:08.483325", "end": "2022-09-04 13:15:31.104057", "item": "control1", "msg": "non-zero return code", "rc": 1, "start": "2022-09-04 13:15:22.620732", "stderr": "Error from server (InternalError): error when creating \"/tmp/k3s/metallb-crs.yaml\": Internal error occurred: failed calling webhook \"l2advertisementvalidationwebhook.metallb.io\": failed to call webhook: Post \"https://webhook-service.metallb-system.svc:443/validate-metallb-io-v1beta1-l2advertisement?timeout=10s\": EOF", "stderr_lines": ["Error from server (InternalError): error when creating \"/tmp/k3s/metallb-crs.yaml\": Internal error occurred: failed calling webhook \"l2advertisementvalidationwebhook.metallb.io\": failed to call webhook: Post \"https://webhook-service.metallb-system.svc:443/validate-metallb-io-v1beta1-l2advertisement?timeout=10s\": EOF"], "stdout": "ipaddresspool.metallb.io/first-pool created", "stdout_lines": ["ipaddresspool.metallb.io/first-pool created"]}
ok: [control1] => (item=control2)
ok: [control1] => (item=control3)
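A retry could be sketched roughly like this (task name and file path are taken from the log above; module choice, retry count, and delay are illustrative):

```yaml
- name: Apply metallb CRs
  command: k3s kubectl apply -f /tmp/k3s/metallb-crs.yaml
  register: metallb_crs
  until: metallb_crs.rc == 0
  retries: 5
  delay: 10
  changed_when: false
```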
@sleiner seems like there's some cache issue with molecule when cleaning up? https://github.com/techno-tim/k3s-ansible/runs/8194471058?check_suite_focus=true#step:7:225
@timothystewart6 The problem originates here:
fatal: [control3]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: Connection timed out during banner exchange", "unreachable": true}
As far as I can see, the CI runner performance is quite weak when running 5 VMs at the same time. I have experimented with a few timeout tweaks (in a merge request based on this one) and the results look quite promising (at least compared to the current level of flakiness). You can take a look at the diff here.
I could:
What do you prefer?
@timothystewart6 I managed to make the flakiness mitigations completely independent from molecule.yml
(and thus independent from this change), so I opened #70 - the merge order is irrelevant 😊
Sorry, conflicts now from the latest merge. I started reviewing and it looks good so far. Will merge after conflicts are solved! Thank you for this, this is huge!
Thank you for this! This is huge!
Proposed Changes
To correctly escape IPv6 addresses when ports are used, they must be wrapped in square brackets. This patch adds support for that, using Ansible's ipwrap filter.
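For example, with the ipwrap filter an endpoint can be interpolated safely regardless of address family (the variable name is taken from the group vars above; the URL construction is illustrative):

```yaml
# apiserver_endpoint: "2001:db8::1"     -> https://[2001:db8::1]:6443
# apiserver_endpoint: "192.168.30.222"  -> https://192.168.30.222:6443
api_url: "https://{{ apiserver_endpoint | ansible.utils.ipwrap }}:6443"
```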
Checklist
- site.yml playbook
- reset.yml playbook