kubernetes / minikube

Run Kubernetes locally
https://minikube.sigs.k8s.io/
Apache License 2.0

Minikube tries to delete the default libvirt network #18919

Closed. nirs closed this issue 3 months ago

nirs commented 4 months ago

What Happened?

When running minikube with --driver kvm2 --network default, it uses the libvirt default network, but it does not create it. In fact, it fails if the default network does not exist. However, when deleting a minikube profile, it tries to delete the default network it did not create.

Looking at the verbose logs, we see that minikube tries to delete the default network if the network is not used by any VMs:

I0518 02:51:47.824566 1219676 out.go:177] * Deleting "minikube" in kvm2 ...
I0518 02:51:47.824590 1219676 main.go:141] libmachine: (minikube) Calling .Remove
I0518 02:51:47.824675 1219676 main.go:141] libmachine: (minikube) DBG | Removing machine...
I0518 02:51:47.834997 1219676 main.go:141] libmachine: (minikube) DBG | Trying to delete the networks (if possible)
I0518 02:51:47.845017 1219676 main.go:141] libmachine: (minikube) DBG | Checking if network default exists...
I0518 02:51:47.845706 1219676 main.go:141] libmachine: (minikube) DBG | Network default exists
I0518 02:51:47.845718 1219676 main.go:141] libmachine: (minikube) DBG | Trying to list all domains...
I0518 02:51:47.845824 1219676 main.go:141] libmachine: (minikube) DBG | Listed all domains: total of 4 domains
I0518 02:51:47.845834 1219676 main.go:141] libmachine: (minikube) DBG | Trying to get name of domain...
I0518 02:51:47.845839 1219676 main.go:141] libmachine: (minikube) DBG | Got domain name: fedora39-base
I0518 02:51:47.845842 1219676 main.go:141] libmachine: (minikube) DBG | Getting XML for domain fedora39-base...
I0518 02:51:47.846001 1219676 main.go:141] libmachine: (minikube) DBG | Got XML for domain fedora39-base
I0518 02:51:47.846311 1219676 main.go:141] libmachine: (minikube) DBG | Unmarshaled XML for domain fedora39-base: kvm.result{Name:"fedora39-base", Interfaces:[]kvm.iface{kvm.iface{Source:kvm.source{Network:"default"}}}}
I0518 02:51:47.846351 1219676 main.go:141] libmachine: (minikube) Deleting of networks failed: network still in use at least by domain 'fedora39-base',

This is very wrong - minikube does not own the libvirt default network, so it must not try to remove it.

This is 100% reproducible when no other VM on the system is using the default network, and it fails randomly when deleting multiple profiles in parallel, since the check for whether the network is in use is racy (time of check vs. time of use).
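
To make the race concrete, here is a rough Go sketch of the check-then-delete shape (networkUsedByAnyDomain is a hypothetical helper name; in the real driver the check is done by walking the domain list in checkDomains). The point is the gap between the check and the destroy call:

// Rough sketch of the racy shape, not the actual minikube code.
// networkUsedByAnyDomain is a hypothetical helper; the real driver walks the
// domain list in checkDomains before destroying the network.
func deleteIfUnused(conn *libvirt.Connect, network *libvirt.Network, name string) error {
    inUse, err := networkUsedByAnyDomain(conn, name) // time of check
    if err != nil {
        return err
    }
    if inUse {
        return fmt.Errorf("network %s still in use", name)
    }
    // Time of use: a parallel profile deletion may have passed its own check
    // by now, or a new domain may have attached to the network since the
    // check above, so the decision can already be stale when we act on it.
    return network.Destroy()
}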

Looking at the relevant code, the intent is very clear: we do not want to delete the default network (d.Network), only the minikube private network (d.PrivateNetwork). So it seems that the root cause is that d.PrivateNetwork is set to "default" by mistake at some point.

func (d *Driver) deleteNetwork() error {
    conn, err := getConnection(d.ConnectionURI)
    if err != nil {
        return errors.Wrap(err, "getting libvirt connection")
    }
    defer conn.Close()

    // network: default
    // It is assumed that the OS manages this network

    // network: private
    log.Debugf("Checking if network %s exists...", d.PrivateNetwork)
    network, err := conn.LookupNetworkByName(d.PrivateNetwork)
    if err != nil {
        if lvErr(err).Code == libvirt.ERR_NO_NETWORK {
            log.Warnf("Network %s does not exist. Skipping deletion", d.PrivateNetwork)
            return nil
        }
        return errors.Wrapf(err, "failed looking up network %s", d.PrivateNetwork)
    }
    defer func() { _ = network.Free() }()
    log.Debugf("Network %s exists", d.PrivateNetwork)

    err = d.checkDomains(conn)
    if err != nil {
        return err
    }

Checking the domain XML, we see two interfaces created on the default network, instead of one interface on the default network and one on the "minikube-net" private network:

    <interface type='network'>
      <mac address='52:54:00:16:16:55'/>
      <source network='default' portid='48a95c83-922c-45b1-974e-52feada03103' bridge='virbr0'/>
      <target dev='vnet0'/>
      <model type='virtio'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </interface>
    <interface type='network'>
      <mac address='52:54:00:76:58:ad'/>
      <source network='default' portid='aa7bbe43-62bc-466d-8da6-dcae3dbcf5c3' bridge='virbr0'/>
      <target dev='vnet1'/>
      <model type='virtio'/>
      <alias name='net1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>

We can also see that minikube is using only the default network in the config:

$ grep KVMNetwork /data/tmp/.minikube/profiles/minikube/config.json 
    "KVMNetwork": "default",

Maybe this is the intended behavior: if you use --network default, both interfaces are created on the specified network.

Based on this, I think we should skip deletion if the private network is "default".
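
For illustration, a minimal sketch of that guard in deleteNetwork, using the field names from the snippet above (this is not the actual patch, just the shape of the check):

// Sketch of the proposed guard, not the actual patch: skip deletion when the
// configured private network is the libvirt default network, which minikube
// does not own (matching the "It is assumed that the OS manages this network"
// comment in the existing code).
func (d *Driver) deleteNetwork() error {
    if d.PrivateNetwork == "default" {
        log.Debugf("Network %s is not owned by minikube, skipping deletion", d.PrivateNetwork)
        return nil
    }
    // ... existing lookup, checkDomains, and destroy logic ...
    return nil
}

Comparing against d.Network instead of the literal "default" could also cover other user-supplied pre-existing networks, but the intent is the same either way: only delete networks that minikube created.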

Attach the log file

Logs in the description.

Operating System

Redhat/Fedora

Driver

KVM2

medyagh commented 4 months ago

Thanks @nirs for the PR, and that sounds reasonable. I am curious: what are some practical use cases for using the "default" network vs. letting minikube create its own custom network?

I am asking because many years ago, when we were sharing the network and not using a dedicated network, we were facing a lot of issues such as IP conflicts, stuck networks, or cleanup problems... I would just like to learn about your specific needs, if you don't mind sharing.

nirs commented 4 months ago

Thanks @nirs for the PR, and that sounds reasonable. I am curious: what are some practical use cases for using the "default" network vs. letting minikube create its own custom network?

We create a DR setup with 3 clusters (hub, dr1, dr2). The DR clusters run rook-ceph storage and use the host network. This makes it easier to set up storage replication between dr1 and dr2. With the storage replicated, when we create a workload on one cluster and enable DR protection, the ceph volume is replicated to the other cluster. Then we can simulate a disaster by suspending or destroying one of the clusters and starting the workload on the other cluster.

In a real setup, we connect the remote clusters using submariner. For the testing setup we keep things simple; we have enough trouble without submariner.

You can check this FOSDEM talk showing all this with virtual machine as workload: https://fosdem.org/2024/schedule/event/fosdem-2024-3256-instant-ramen-quick-and-easy-multi-cluster-kubernetes-development-on-your-laptop/

medyagh commented 4 months ago

Thank you for sharing, that's interesting. So in this case the other servers in the cluster already use the "default" network, and I assume you pass a flag to minikube to force it to use the default network as well, right?

I think as long as we make sure we have a good cleanup story (for when we delete minikube), that should be a good PR for minikube. Thanks for taking the time to contribute it.