kubernetes-sigs / kind

Kubernetes IN Docker - local clusters for testing Kubernetes
https://kind.sigs.k8s.io/
Apache License 2.0

Allow configurable docker network for kind cluster nodes #273

Closed: foolusion closed this issue 4 years ago

foolusion commented 5 years ago

It would be nice to be able to specify the network that the cluster uses.

BenTheElder commented 5 years ago

/kind feature
/priority important-longterm
/assign

BenTheElder commented 5 years ago

We definitely want this. I think we're going to put it in the networking config and start having an automated default.

cc @aojea @neolit123

Strawman:

neolit123 commented 5 years ago

Strawman:

SGTM

do we need to have the config field? when a cluster is created we can auto-manage a network with the same name or a similar prefix, e.g. kind-network-kind, kind-network-mycluster

BenTheElder commented 5 years ago

I think we need a field defaulting to the generated one; that way e.g. federation can make multiple clusters on the same network by actually setting the field


neolit123 commented 5 years ago

ok, makes sense.

aojea commented 5 years ago

seems that docker has an option to populate the /etc/hosts file, which could be useful to get rid of the loopback address in resolv.conf while keeping node name resolution

Managing /etc/hosts
Your container will have lines in /etc/hosts which define the hostname of the container itself as well as localhost and a few other common things. The --add-host flag can be used to add additional lines to /etc/hosts.

$ docker run -it --add-host db-static:86.75.30.9 ubuntu cat /etc/hosts
172.17.0.22     09d03f76bf2c
fe00::0         ip6-localnet
ff00::0         ip6-mcastprefix
ff02::1         ip6-allnodes
ff02::2         ip6-allrouters
127.0.0.1       localhost
::1             localhost ip6-localhost ip6-loopback
86.75.30.9      db-static

BenTheElder commented 5 years ago

the only problem with --add-host is we don't know the other nodes' IPs when we call docker run, it's a bit chicken and egg :^)
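
For contrast, this is the gap that user-defined networks eventually fill: Docker's embedded DNS resolves container names at runtime, so no host entries are needed up front. A minimal sketch (demo and web are placeholder names):

$ docker network create demo
$ docker run -d --name web --network demo nginx
$ docker run --rm --network demo busybox nslookup web    # resolved by Docker's embedded DNS
$ docker rm -f web; docker network rm demo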

BenTheElder commented 5 years ago

This turned out to have a few more issues than we expected, due to non-default docker networks having different behavior. This may slip to 0.5 as we're nearing the 0.4 release, but it's definitely something we want.

jayunit100 commented 5 years ago

Hi! Can someone disambiguate the use cases here, between this and https://github.com/kubernetes-sigs/kind/issues/278?

aojea commented 5 years ago

@jayunit100 there are different things regarding networking. One is the CNI plugin used by the kubernetes cluster: kind installs its own CNI by default, but you can disable it and install your preferred CNI plugin once kind finishes creating the cluster. The other networking part is the one that docker provides; that's where kind spawns the nodes. Currently kind only supports docker, with its networking limitations. Using another bridge in docker has some consequences that break kind (more on that below).
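
For reference, a minimal sketch of disabling kind's default CNI so you can install your own (disableDefaultCNI is the documented field in kind's v1alpha4 config API; kind also accepts the config on stdin via --config=-):

$ cat <<EOF | kind create cluster --config=-
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
networking:
  disableDefaultCNI: true  # install your preferred CNI after the cluster is up
EOF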

jayunit100 commented 5 years ago

So docker0 is only being used for node IP addresses in the use case for this issue? Thanks for clarifying! I was confused :). Curious what the use case is for not using docker0 at that level... after all, kind as an abstraction for testing k8s is sufficient as long as the k8s-specific stuff isn't impacted by Docker's implementation as a hypervisor for virtual nodes, right?

jayunit100 commented 5 years ago

Mostly get it now... maybe change the title of this issue to "use non-docker0 interface for kubelet IPs" (although imprecise, I think it gets the point across) so it's clear what we mean by cluster :):)... thanks again! The CNI feature for kind is definitely awesome; I want to make sure people know that it works as-is :). PS: for context, I'm looking at using kind instead of my vagrant recipes for some aspects of some calico tests.

aojea commented 5 years ago

There are calico folks using kind for testing, as you can see in this slack conversation https://kubernetes.slack.com/archives/CEKK1KTN2/p1570036710217000 ; maybe you can bring up this conversation in our slack channel.

The main problem with using a custom bridge in docker is that it modifies the DNS behavior, using an embedded DNS server: https://docs.docker.com/v17.09/engine/userguide/networking/configure-dns/
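
To see the behavior difference aojea describes, compare resolv.conf on the default bridge versus a user-defined network; on the latter, Docker injects its embedded DNS server at 127.0.0.11 (demo-net is a placeholder name):

$ docker run --rm alpine cat /etc/resolv.conf                       # default bridge: host DNS copied in
$ docker network create demo-net
$ docker run --rm --network demo-net alpine cat /etc/resolv.conf    # nameserver 127.0.0.11
$ docker network rm demo-net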

zephinzer commented 4 years ago

this would be nice - currently facing an issue of not being able to resolve an internal image registry which is behind my org's VPN. what work is left regarding this? maybe i can help!

BenTheElder commented 4 years ago

kind uses a specific network now at HEAD (named "kind") as part of some unrelated work.

as it currently stands kind will not delete any networks, so you can just pre-create the kind network with your desired settings.

we need to revisit how that works a bit though WRT IPv6 in a follow-up PR before moving forward.

BenTheElder commented 4 years ago

https://github.com/kubernetes-sigs/kind/pull/1538 will make it possible to do this.

you shouldn't actually need this in nearly all cases though; kind is switching to ensure and use a "kind" network with all of the features of a user-defined network.

if you pre-create this network, kind will use it as you configured it; kind does not delete networks.
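
A sketch of that pre-create approach (the subnet and gateway here are illustrative; the network kind auto-creates also carries an IPv6 ULA subnet, as noted below):

$ docker network create kind --driver=bridge --subnet=172.25.0.0/16 --gateway=172.25.0.1
$ kind create cluster    # finds the existing "kind" network and uses it as configured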

BenTheElder commented 4 years ago

most of the problem initially was just that user-defined networks are a breaking change in docker vs the default bridge; they have different DNS in ways that don't trivially work with kind.

we've fixed that and always use one now.

the remaining issues are that completely arbitrary networks can be ... very strange. for now we're provisioning our own under a fixed name unless it already exists.

this network is a standard bridge, with an IPv6 subnet picked out of ULA.
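
For illustration, you can confirm this by inspecting the network kind creates (exact subnets vary per host):

$ docker network inspect kind | jq '.[0].IPAM.Config'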

redbrick9 commented 1 year ago

Hi @BenTheElder, I created a bridge similar to kind's, just with different IPAM, and I also set the env variable KIND_EXPERIMENTAL_DOCKER_NETWORK to that bridge. When issuing "kind create cluster ..." I saw the following on stdout:

WARNING: Overriding docker network due to KIND_EXPERIMENTAL_DOCKER_NETWORK
WARNING: Here be dragons! This is not supported currently.

After the kind cluster was created, it still used the "kind" network. I tried to identify why and didn't get any clues; do you know why? Thanks!

prabhakhar commented 1 year ago

@redbrick9 It works as specified.

export KIND_EXPERIMENTAL_DOCKER_NETWORK=wildlings
kind create cluster --config wildlings.yaml

A new network got created in docker.

$ docker network inspect wildlings | jq .[0].IPAM
{
  "Driver": "default",
  "Options": {},
  "Config": [
    {
      "Subnet": "172.19.0.0/16",
      "Gateway": "172.19.0.1"
    },
    {
      "Subnet": "fc00:c796:3cb7:e852::/64"
    }
  ]
}

boeboe commented 1 year ago

Any chance we get this functionality as a kind startup flag (--net) in the future?

BenTheElder commented 1 year ago

This is pretty bug-prone, and you can pre-create the kind network with your (unsupported, potentially broken) settings instead.

boeboe commented 1 year ago

@BenTheElder

I kindly (pun intended) disagree. I am using the --network and --subnet flags in minikube on a daily basis:

minikube start --help | grep net
    --network='':
        network to run minikube with. Now it is used by docker/podman and KVM drivers. If left empty, minikube will create a new network.
    --subnet='':
        Subnet to be used on kic cluster. If left empty, minikube will choose subnet address, beginning from 192.168.49.0. (docker and podman driver only)

This allows me to run my minikube based kubernetes clusters (plural) in any docker networks (plural) that I pre-configured, or even allows me to create a bridged docker network through minikube itself.

I am not asking here to start managing docker bridged networks, as the --subnet flag does for minikube, but being able to attach your kind cluster to a configurable (and assumed pre-existing and pre-configured) docker network is basic functionality that does not extend kind beyond its core responsibilities.

The main use cases for me to use minikube with configurable (and subnet-separated, to avoid metallb conflicts) docker networks are to simulate kubernetes multi-cluster demo, developer, and CI/CD environments. It would be awesome if I could also use kind for this purpose. Moving this experimental flag to a first-class, yet optional, command-line argument does not impact stability and increases usability and adoption reach.

BenTheElder commented 1 year ago

--subnet

subnets come from docker IPAM settings, which are already user-configurable, OR you can pre-create the kind network (or use KIND_EXPERIMENTAL_DOCKER_NETWORK)

but being able to attach your kind cluster to a configurable (and assumed pre-existing and pre-configured) docker network is basic functionality that does not extend kind beyond its core responsibilities.

https://kind.sigs.k8s.io/docs/contributing/project-scope/ https://kind.sigs.k8s.io/docs/design/principles/#target-cri-functionality

Anyhow you can connect to additional networks with: for node in $(kind get nodes); do docker network connect network-name "$node"; done

To change the default network in an experimental, unsupported way you can use KIND_EXPERIMENTAL_DOCKER_NETWORK.

The main use cases for me to use minikube with configurable (and subnet-separated, to avoid metallb conflicts) docker networks are to simulate kubernetes multi-cluster demo, developer, and CI/CD environments. It would be awesome if I could also use kind for this purpose.

There's demos of this sort of thing in the kubernetes project using KIND with the existing functionality https://github.com/kubernetes-sigs/mcs-api/blob/master/scripts/up.sh

Moving this experimental flag to a first-class, yet optional, command-line argument does not impact stability

This is not true. See for example https://github.com/kubernetes-sigs/kind/issues/2917

aojea commented 1 year ago

or people can create their own plugins https://github.com/aojea/kind-networking-plugins

boeboe commented 1 year ago

Regarding https://github.com/kubernetes-sigs/kind/issues/2917 ... I don't see how this is relevant.

The only reason people connect to a second network is that they were forced to by the arbitrary choice of hard-coding a bridged docker network named kind in the first place. The only place I've seen multi-network use cases is for multi-interface things in the 5G core spec (and CNIs like multus), which are all Service Provider use cases.

Anyway... the experimental flag works like a charm and covers my use case. It's a bit strange you refuse to make this a first-class command-line flag... minikube and k3d both support it out of the box... without a plugin system.

Hard-coding choices like the name and selection of a docker network is bad software design, but I'll leave it there.

FWIW... here is my attempt to create a single abstraction layer for my multi-cluster needs, with support for minikube/k3s/kind, where kind is the only one going "experimental": https://github.com/boeboe/k8s-local/blob/main/k8s-local.sh#L202

BenTheElder commented 1 year ago

Regarding https://github.com/kubernetes-sigs/kind/issues/2917 ... I don't see how this is relevant.

This is an example of the challenging bugs that crop up due to users with custom networking that we're not supporting.

We simply can't prioritize that. Which is why the existing feature is clearly named "EXPERIMENTAL" and will stay that way for now.

The only reason people connect to a second network is that they were forced to by the arbitrary choice of hard-coding a bridged docker network named kind in the first place. The only place I've seen multi-network use cases is for multi-interface things in the 5G core spec (and CNIs like multus), which are all Service Provider use cases.

Frankly, this approach is not helpful and I'm disinclined to spend further energy here.

The design and implementation are not "arbitrary" just because you have not looked into the history and context behind them. Every single change is carefully considered and implemented with reason. This is rude and willfully ignorant. All commits and discussions are public.

KIND used the default docker bridge for the first year, until we ran into serious limitations exploring proposed fixes for clusters surviving host reboots, which was NOT originally intended functionality we even tested for, because KIND was created to test Kubernetes, NOT to test applications.

But there was high user demand anyhow, and minikube hadn't adopted the kind image yet and k3d didn't exist, so we spent a lot of effort adapting to the demands for long-lived application development clusters. In the process we settled on a substitute for the standard docker bridge network that closely mimics it with a minimum of changes; since we have to configure it somewhat, it lives under the predictable "kind" name, so test containers can run alongside it, and it otherwise behaves very closely to how things worked before this change.

Anyway... the experimental flag works like a charm and covers my use case. It's a bit strange you refuse to make this a first-class command-line flag... minikube and k3d both support it out of the box... without a plugin system.

minikube is a sister project in the same organization, it is explicitly not a goal to attempt to create 100% overlap between them.

KIND is a lightweight tool focused on developing bleeding edge Kubernetes with a secondary focus on usage for other functionality, which you can find more about in our contributing guide / docs: https://kind.sigs.k8s.io/docs/contributing/project-scope/

It is important to our existing users and use cases that the tool remain small and well maintained and keep up with the latest changes in the container ecosystem, Linux, and Kubernetes, which is where most of our energy goes, e.g. #3223.

Hard-coding choices like the name and selection of a docker network is bad software design, but I'll leave it there.

Again, you haven't bothered to look at how we settled on the current approach and you're being rude.

FWIW... here is my attempt to create a single abstraction layer for my multi-cluster needs, with support for minikube/k3s/kind, where kind is the only one going "experimental": https://github.com/boeboe/k8s-local/blob/main/k8s-local.sh#L202

Again:

BenTheElder commented 1 year ago

This issue is closed.

If anyone would like to propose a new feature with a considered design proposal: https://kind.sigs.k8s.io/docs/contributing/getting-started/

To be considered for addition it will also first need concrete use cases that cannot be handled with existing functionality.

AFAIK there aren't really any; e.g. https://github.com/joshuaspence/homelab/commit/72b903880fe2c9e40697035c54c812e93876d9a1 references this, but that can entirely be accomplished on the standard network instead (will leave a comment). Multi-cluster testing is also referenced, but that works fine on a single bridge network.

A few seed questions for anyone that does choose to explore this:

  1. What does the network lifecycle look like? Is it coupled to the cluster? If so, how will we avoid breaking existing users and tools? If not, how will we clean up all these networks (as opposed to today, where the user has opted into an experimental feature and will have to work around this limitation).
  2. How flexible will configuring these be if we don't just leave it up to the power user to do externally, as today? How do we reconcile the wildly different networking versus podman, and in the future nerdctl and other tools? Will we only support CNI networks and their limitations?