Closed: jayunit100 closed this issue 3 years ago
Discussed with @mauilion and @BenTheElder earlier on Slack; counterpoint: hostNetwork. Also:
Since we want to be able to find flakes manually without relying on CI, this would enable more hypothesis-driven testing that is locally runnable over time. As a result, more people would likely be able to find problematic k8s tests locally, without relying on sporadic cloud latencies to create isolated, hard-to-reproduce data points.
I really love this idea Jay, let's discuss it further.
I think it is better to apply the traffic control "outside" of the cluster, applying the tc
commands on the external veth pair interfaces of the nodes. KIND should continue being independent of the CNI plugins, and this way we can offer a more "real" simulation.
The implementation seems "simple": there is a lot of literature on how to find the external veth interface of a container, e.g. https://github.com/cslev/find_veth_docker. Then, once the cluster finishes installing (to avoid causing issues during bootstrap), we list the external interfaces of the nodes belonging to the cluster and apply the corresponding tc commands:
sudo tc qdisc add dev veth95b1019d root netem delay 200ms
This will work for any provider: docker, podman, ...
For the API I suggest keeping its own block, since traffic control allows controlling other interesting parameters like bandwidth and packet loss, something like:
networking:
  netem:
    delay: 100ms
    bandwidth:
      rate: 1M
      burst: 25k
    loss: 10%
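If it helps ground the discussion, those fields map roughly onto stock tc invocations. A sketch only: veth95b1019d is a placeholder device, and I'm using tbf for the bandwidth part since netem itself has no burst parameter:

```shell
# Rough mapping of the proposed config block to tc commands (placeholder device).
DEV=veth95b1019d

# delay: 100ms + loss: 10%  -> netem qdisc at the root
sudo tc qdisc add dev "$DEV" root handle 1: netem delay 100ms loss 10%

# bandwidth: rate 1M, burst 25k -> token bucket filter chained under netem
sudo tc qdisc add dev "$DEV" parent 1: handle 2: tbf rate 1mbit burst 25kb latency 50ms

# inspect, and remove everything when done
tc qdisc show dev "$DEV"
sudo tc qdisc del dev "$DEV" root
```

The tbf latency value is an assumption; it bounds how long packets may sit in the bucket's queue and would presumably also need to be surfaced in the config.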
/retitle "[Feature] KIND support for external network emulation: latency, bandwidth constraint, packet drops"
/assign @BenTheElder for approval of the feature request
Yayyyyyy
I've been playing with this and it can be done simply with a bash script, please take a look:
https://gist.github.com/aojea/603e88ba709aac874bd7611752261772
I think it is better to get feedback first and then, if there is demand, include it in KIND.
Thanks @aojea! How would you suggest publishing and standardizing this script, if we don't do it in kind? I guess we could put it in a personal GitHub repo as a kind recipe of some sort (I have many of my own: https://github.com/jayunit100/k8sprototypes/tree/master/kind); maybe we could join forces or something.
I think if it goes into kind as an option, we'll get a lot of great opportunities for newcomers to help simulate CI on their laptops, so it would be a potentially big boost to the test-infra folks long term if we build some community around it.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
Hello, I am planning to integrate kind with a network emulator, Mininet in particular. Thanks to @aojea for implementing traffic control, but I may need more than this: I would like to set up the network topology like in a datacenter (e.g. a fat-tree topology) and run various network routing protocols.
I have some basic understanding of how kind is implemented, and I have two plans for the implementation. I am wondering if you could help me decide between the two.
The basic goal is to replace kind's user-bridged networking with Mininet (actually I use Containernet, which I've modified to be able to connect running containers by placing a veth into the running container's namespace). Now I am working on the kind side. The two plans are:
- Do not modify kind: let it finish, then delete the network interfaces and modify all the related Kubernetes configuration (though I've not fully figured out which files I need to modify; any hints would be very appreciated).
- Modify kind: (a) add a few args to indicate setting up with Mininet, (b) start the container without any networking ("none") by setting an arg at this line, (c) before setting up k8s, trigger Mininet to run and set up the network, then run kubeadm init and so on.
Any feedback would be very appreciated!
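For what it's worth, the "place a veth into the running container's namespace" part of plan 2 can be sketched with plain iproute2 commands. Everything here is illustrative: the node name, interface names, and the IP address are assumptions, not anything kind or Containernet prescribes:

```shell
#!/usr/bin/env bash
# Sketch: attach one end of a veth pair to a running kind node container.
# Assumptions: node "kind-worker"; names mn-veth0/mn-eth0 and the address
# 172.18.0.5/16 are purely illustrative.
set -euo pipefail

node="kind-worker"
pid="$(docker inspect -f '{{.State.Pid}}' "$node")"

# Create the pair: mn-veth0 stays on the host (for Mininet/Containernet to
# wire into its topology), mn-eth0 goes into the node's network namespace.
sudo ip link add mn-veth0 type veth peer name mn-eth0
sudo ip link set mn-eth0 netns "$pid"

# Inside the node's netns: assign an address and bring the interface up.
sudo nsenter -t "$pid" -n ip addr add 172.18.0.5/16 dev mn-eth0
sudo nsenter -t "$pid" -n ip link set mn-eth0 up
```

`ip link set ... netns` accepts a PID directly, which avoids having to bind-mount the container's netns under /var/run/netns first.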
There are more details in this link about how to set up complex scenarios with KIND: https://gist.github.com/aojea/00bca6390f5f67c0a30db6acacf3ea91
I suggest you start with 1): start small and iterate; you can always move from 1) to 2) later... I wouldn't rule out that, once you become more familiar with the environment and the problem space, you'll have new options ;)
Thank you @aojea!! For 1), could you please provide some hints on which files I need to modify? I found things like this discussion; does it look complete? I also found here that the certificate is signed for the old IP address... do I need to redo the cert generation?
There is a detailed description of the modifications needed here: https://gist.github.com/aojea/00bca6390f5f67c0a30db6acacf3ea91#example-multiple-network-interfaces-and-multi-home-nodes
@aojea I followed the steps to modify the IP. I am struggling with one step: "When creating the cluster we must add the loopback IP address of the control plane to the certificate SAN (the apiserver binds to "all-interfaces" by default)". Could you please advise how to change kind's configuration to modify the certificate SAN (I can't quite find which function in kind relates to the certificate SAN)? Thank you!
It has to be patched via kubeadm; in the kind config, replace my-loopback with the apiserver loopback address:

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
# add the loopback to apiServer cert SANs
kubeadmConfigPatchesJSON6902:
  - group: kubeadm.k8s.io
    kind: ClusterConfiguration
    patch: |
      - op: add
        path: /apiServer/certSANs/-
        value: my-loopback
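Once the cluster is up, one way to check that the extra SAN actually landed in the apiserver certificate (assuming the default control-plane container name for a cluster called "kind"):

```shell
# Print the SANs of the apiserver cert inside the control-plane node.
docker exec kind-control-plane \
  openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -text \
  | grep -A1 'Subject Alternative Name'
```

The loopback address you patched in should appear alongside the node IP and the standard kubernetes service names.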
I really don't know if it is possible to modify it after the installation, or how to do it.
Thank you @aojea. The suggested patch somehow did not work for me, so I finally chose "plan 3": just delete eth0 on the kind container and put a veth into the namespace, assigning the same IP address. It seems to work now :)
@fejta-bot: Closing this issue.
I will let @aojea decide if we try to wire this into kindnetd etc or leave it to external extension.
I demoed this at KubeCon EU 2021 and created a plugin using the kind API. Source code and presentation are available at https://github.com/aojea/kind-networking-plugins
If there is more traction we can consider moving the feature into the core, but I don't have the feeling it should be part of it right now.
/close
@aojea: Closing this issue.
A typical kind cluster has pretty stable networking, locally:
Whereas a real-world cluster (for example, a high-performance VMC cluster running on EC2 hardware) has a much different performance profile...
It would be nice to be able to disrupt the network bandwidth and throughput on kind clusters so that they matched those of clouds. In especially congested clouds, you can even see iperf values that might be 10x lower than this at peak times (I don't have an example on hand, but if someone runs iperf in a GCE cluster with 20 parallel conformance tests running, I bet you'll be able to see this).
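A quick way to measure that baseline yourself, sketched with kubectl (pod names are illustrative, and networkstatic/iperf3 is one commonly used public iperf3 image, not something this issue prescribes):

```shell
# Start an iperf3 server pod, then run a client pod against its pod IP.
kubectl run iperf-server --image=networkstatic/iperf3 -- iperf3 -s
kubectl wait --for=condition=Ready pod/iperf-server

SERVER_IP=$(kubectl get pod iperf-server -o jsonpath='{.status.podIP}')
kubectl run iperf-client --rm -it --image=networkstatic/iperf3 -- iperf3 -c "$SERVER_IP" -t 10

kubectl delete pod iperf-server
```

Running this on a plain kind cluster versus a cloud cluster makes the gap this issue is about directly visible.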
What would you like to be added:
Why is this needed:
Kind is increasingly used to simulate realistic clusters.
Tools like stress and so on are commonly used: https://kubernetes.slack.com/archives/CN0K3TE2C/p1597351797024800