mkubecek / vmware-host-modules

Patches needed to build VMware (Player and Workstation) host modules against recent kernels
GNU General Public License v2.0

vmnet-natd RTM_NEWADDR dropping connection #129

Open rakotomandimby opened 2 years ago

rakotomandimby commented 2 years ago

I face the problem described here: https://www.nikhef.nl/~janjust/vmnet/

vmnet-natd: RTM_NEWADDR: index:2, addr:192.168.1.72
kernel: userif-3: sent link down event.
kernel: userif-3: sent link up event.

This breaks the network connectivity in the VMs

Suggestion, might be good, might be bad, but I applied it:

  --- vmnet-only/userif.c  2017-12-21 17:02:28.555820933 +0100
  +++ vmnet-only.jjk/userif.c   2017-12-15 13:22:13.257724953 +0100
  @@ -973,6 +973,9 @@
      userIf = (VNetUserIF *)port->jack.private;
      hubJack = port->jack.peer;

  +   /* never send link down events */
  +   if (!linkUp) return 0;
  +
      if (port->jack.state == FALSE || hubJack == NULL) {
         return -EINVAL;
      }

mkubecek commented 2 years ago

This has already been suggested before, but if I understand correctly, it's a dirty (and functionally incorrect) hack to work around a problem in a different part of the VMware software. Am I wrong?

rakotomandimby commented 2 years ago

@mkubecek, I don't have the skills to say for sure, but after applying it and working with it for several days, I no longer have the described problem. The patch may well introduce other problems.

NathanaelA commented 1 year ago

Just joining in here. It might be dirty, but when you are at a hotel that renews DHCP every 5 minutes, you can guess how usable the internet connection in the VMs is: every connection or download is aborted every 5 minutes. Serious PITA.

I blogged about this back in 2019 (here: https://fluentreports.com/blog/?p=717). The bug still stands, and I still get people thanking me for re-posting the solution that Jan discovered in the original blog post linked above (which is now gone).

As a developer, I think a better approach would be to start a timer here: if no link up event has occurred after 5-10 seconds, then actually send the link down event. But I think all of us would be VERY happy if this could be controlled via a setting, so I wouldn't have to re-patch after every update. I have never seen a case where the "Link-Down" event was actually helpful...
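
The timer idea above can be sketched as a small debounce state machine. This is only a user-space illustration, not code from vmnet: the names `LinkDebounce`, `LinkDebounce_OnEvent` and `LinkDebounce_OnTick` are hypothetical, time is a plain seconds counter supplied by the caller, and a real kernel version would have to hook into `VNetUserIfSetUplinkState` and use a kernel timer instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical debounce state; none of these names exist in vmnet. */
typedef struct {
   bool     reported_up;   /* last link state actually forwarded to the guest */
   bool     down_pending;  /* a link down is waiting out its grace period */
   unsigned deadline;      /* time (seconds) at which the pending down fires */
   unsigned grace;         /* grace period in seconds, e.g. 5-10 */
} LinkDebounce;

typedef enum { LD_NONE, LD_SEND_UP, LD_SEND_DOWN } LinkAction;

void LinkDebounce_Init(LinkDebounce *d, unsigned grace)
{
   assert(d != NULL);
   d->reported_up = true;
   d->down_pending = false;
   d->deadline = 0;
   d->grace = grace;
}

/* Feed a raw link state change observed at time `now`. */
LinkAction LinkDebounce_OnEvent(LinkDebounce *d, bool linkUp, unsigned now)
{
   assert(d != NULL);
   if (linkUp) {
      /* A link up cancels any pending link down. */
      d->down_pending = false;
      if (!d->reported_up) {
         d->reported_up = true;
         return LD_SEND_UP;
      }
      return LD_NONE;   /* the guest never saw the flap */
   }
   /* Link down: start the grace period unless one is already running. */
   if (d->reported_up && !d->down_pending) {
      d->down_pending = true;
      d->deadline = now + d->grace;
   }
   return LD_NONE;
}

/* Call periodically; forwards the link down once the grace period expires. */
LinkAction LinkDebounce_OnTick(LinkDebounce *d, unsigned now)
{
   assert(d != NULL);
   if (d->down_pending && now >= d->deadline) {
      d->down_pending = false;
      d->reported_up = false;
      return LD_SEND_DOWN;
   }
   return LD_NONE;
}
```

With a grace period of 5 seconds, a down event followed by an up event one second later produces no action at all, while a down event that is not followed by an up event is forwarded by the first tick at or after the deadline.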

hmakmur commented 12 months ago

Is there another fix for this problem?
My new install of VMPlayer 17.0.2 build 211581411 also has the same issue on Ubuntu 22.04 with kernel 5.15.0-76. My DHCP server renews every hour, but my host is disconnecting every 5-10 seconds. It does not look like a DHCP issue.

JosefVohnout commented 11 months ago

I have been using this patch for many months and it works great. The only catch is that you need to reapply the patch after each VMware update. In version 17.0.2 the change has to be made at line 1029. It is sad that VMware does not include the patch in the standard release. Anyway, great work. Thank you.

munix9 commented 11 months ago

I need this configurable, so:

diff -ruNp a/vmnet-only/driver.c b/vmnet-only/driver.c
--- a/vmnet-only/driver.c
+++ b/vmnet-only/driver.c
@@ -155,6 +155,10 @@ module_param(vnet_max_qlen, uint, 0);
 MODULE_PARM_DESC(vnet_max_qlen, "Maximum queue length of the vmnet, default is"
                  " 1024, maximum is 1024");

+bool send_link_down_events = true;
+module_param(send_link_down_events, bool, 0644);
+MODULE_PARM_DESC(send_link_down_events, "Send link down events, default is 1 (on)");
+
 /*
  *----------------------------------------------------------------------
  *
diff -ruNp a/vmnet-only/userif.c b/vmnet-only/userif.c
--- a/vmnet-only/userif.c
+++ b/vmnet-only/userif.c
@@ -76,6 +76,7 @@ static void VNetUserIfUnsetupNotify(VNet
 static int  VNetUserIfSetupNotify(VNetUserIF *userIf, VNet_Notify *vn);
 static int  VNetUserIfSetUplinkState(VNetPort *port, uint8 linkUp);
 extern unsigned int  vnet_max_qlen;
+extern bool send_link_down_events;

 #if COMPAT_LINUX_VERSION_CHECK_LT(3, 2, 0)
 #   define skb_frag_page(frag) (frag)->page
@@ -1025,6 +1026,12 @@ VNetUserIfSetUplinkState(VNetPort *port,
    userIf = (VNetUserIF *)port->jack.private;
    hubJack = port->jack.peer;

+   if (!send_link_down_events && !linkUp) {
+      LOG(0, (KERN_NOTICE "userif-%d: link down event not sent.\n",
+              userIf->port.id));
+      return 0;
+   }
+
    if (port->jack.state == FALSE || hubJack == NULL) {
       return -EINVAL;
    }

$ sudo modinfo vmnet
...
parm: vnet_max_qlen:Maximum queue length of the vmnet, default is 1024, maximum is 1024 (uint)
parm: send_link_down_events:Send link down events, default is 1 (on) (bool)

The default is on. The following line, e.g. in a file /etc/modprobe.d/vmware.conf, switches it off when the module is loaded at boot time:

options vmnet send_link_down_events=0

It should also work in a running system:

$ sudo cat /sys/module/vmnet/parameters/send_link_down_events
Y

$ echo 0 | sudo tee /sys/module/vmnet/parameters/send_link_down_events

or

$ echo 1 | sudo tee /sys/module/vmnet/parameters/send_link_down_events

Search or watch the kernel log for "userif", "sent link up event" or "link down event not sent".

This is a proof of concept, tested on openSUSE Tumbleweed with kernel 6.4.9 and workstation-17.0.2 branch.

thedroidkid commented 7 months ago

I'm also having this issue. It seems like this has been going on for some time. Is there any official fix planned? Is there somewhere else we should post this?