zerotier / ZeroTierOne

Windows coexistence problems with HyperV and/or OpenVPN #357

Closed jdrews closed 7 years ago

jdrews commented 8 years ago

I am receiving the following message on a Windows 10 client running ZeroTier 1.1.12:

Error joining Network: Cannot connect to ZeroTier service

[screenshot: screenshot_071216_111300_pm]

I confirmed the ZeroTier service is running. Is there a log file somewhere for further information?

If I close the client and then open it again, I receive the following message after about 2 minutes:

Unable to connect to ZeroTier Service

[screenshot: screenshot_071216_111645_pm]

Stopping the ZeroTier service (via services.msc) then results in:

Windows could not stop the ZeroTier One service on Local Computer
Error 1053: The service did not respond to the start or control request in a timely fashion

[screenshot: screenshot_071216_111923_pm]

The service is then reported as stopped. Starting it again takes 1-2 seconds, but does not fix the issues above.
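
For anyone who wants to check the service state from a script rather than services.msc, here is a minimal Python sketch that parses the output of the built-in `sc query` command. The service name `ZeroTierOneService` is an assumption (it is not stated in this thread), and the parsing assumes English-language `sc` output.

```python
import re
import subprocess

# Assumption: the Windows service is registered as "ZeroTierOneService".
# Check the "Service name" field in services.msc if it differs on your install.
SERVICE_NAME = "ZeroTierOneService"

# Matches a line such as "        STATE              : 4  RUNNING".
_STATE_RE = re.compile(r"^\s*STATE\s*:\s*(\d+)\s+(\w+)", re.MULTILINE)


def parse_service_state(sc_output: str) -> str:
    """Return the state name (e.g. "RUNNING") found in `sc query` output."""
    match = _STATE_RE.search(sc_output)
    if match is None:
        return "UNKNOWN"
    return match.group(2)


def query_service_state(name: str = SERVICE_NAME) -> str:
    """Run `sc query <name>` (Windows only) and parse the reported state."""
    result = subprocess.run(
        ["sc", "query", name], capture_output=True, text=True, check=False
    )
    return parse_service_state(result.stdout)


if __name__ == "__main__":
    print(query_service_state())
```

A state that stays at `STOP_PENDING` or `START_PENDING` would be consistent with the Error 1053 timeout described above, though it does not by itself say why the service is stuck.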

adamierymenko commented 8 years ago

What version of Windows is this? 32 or 64 bit?

jdrews commented 8 years ago

64 bit.

Further information: I rebooted the client host (Windows) and it was able to join a ZeroTier network. I then joined and left the network twice. On the second leave, I experienced this behavior again. Unfortunately, it looks to be an intermittent issue.

adamierymenko commented 8 years ago

We'll try repeatedly adding and leaving networks on Windows 10 x64 and see if we can reproduce.

Is this host running other VPN, tunnel, or network virtualization software or VM software like HyperV? We've heard intermittent reports of issues when other things like that are installed but so far have not been able to reproduce.
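
The repeated join/leave test described above could be scripted roughly as follows. This is only a sketch: it assumes `zerotier-cli` is reachable under that name from an elevated prompt (on Windows the full path to the installed CLI wrapper may be needed), that a nonzero exit code signals failure, and the network ID shown is a hypothetical placeholder.

```python
import subprocess
import time
from typing import Callable, List

# Assumption: adjust to the full path of the CLI wrapper if it is not on PATH.
CLI = "zerotier-cli"

Runner = Callable[[List[str]], int]


def run_cli(args: List[str]) -> int:
    """Run the ZeroTier CLI with the given arguments and return its exit code."""
    return subprocess.run([CLI, *args], check=False).returncode


def cycle_join_leave(
    network_id: str,
    cycles: int,
    runner: Runner = run_cli,
    pause_s: float = 5.0,
) -> int:
    """Join and leave `network_id` up to `cycles` times.

    Returns the 1-based cycle in which a command first failed,
    or 0 if every join and leave succeeded.
    """
    for cycle in range(1, cycles + 1):
        for command in ("join", "leave"):
            if runner([command, network_id]) != 0:
                return cycle
            time.sleep(pause_s)  # give the service time to settle
    return 0


if __name__ == "__main__":
    # Hypothetical 16-digit network ID; replace with a real one.
    failed_at = cycle_join_leave("0123456789abcdef", cycles=20)
    print("no failure" if failed_at == 0 else f"failed in cycle {failed_at}")
```

The `runner` parameter exists so the loop can be exercised without a live service; since the issue is intermittent, a clean run does not prove the problem is absent.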

jdrews commented 8 years ago

Good call on that. The Windows 10 host did have OpenVPN installed (client was not running, service was running though) and Hyper-V. I'll look into those when I get some time.

adamierymenko commented 8 years ago

This may be related to or a duplicate of #308

adamierymenko commented 8 years ago

Absolutely no problem on CLEAN Windows 10 after many leave/joins... now installing OpenVPN.

adamierymenko commented 8 years ago

Another question: does your account have administrator rights?

adamierymenko commented 8 years ago

Hmm... just tried creating a normal user. It asks for an administrator user's password when you launch the app but otherwise it works fine. Left and rejoined network several times with OpenVPN installed and it's fine. Now trying HyperV.

adamierymenko commented 8 years ago

Added Hyper-V and am still unable to reproduce this problem.

Can you post some information about your Hyper-V settings? Are you running any VMs? Do you have any virtual switches configured?

jdrews commented 8 years ago

The account does have administrator rights.

It does have a virtual switch configured on the primary ethernet interface as well as the wireless interface. Pretty standard settings for the virtual switches. No VMs were running when this issue occurred.

[screenshot: screenshot_071416_052628_pm]

adamierymenko commented 8 years ago

Okay, going to try replicating a similar config and testing.

adamierymenko commented 7 years ago

I believe we have FINALLY been able to duplicate this on a bare metal Windows machine with HyperV. I think this is also a duplicate of #308 since that's part of the symptom.

Working on it!

adamierymenko commented 7 years ago

Reproduced with 1.1.14 -- almost certainly a driver issue related to coexistence with HyperV and possibly other things.

adamierymenko commented 7 years ago

We've narrowed it down: when ZT starts on a bare metal Hyper-V host, a single core gets maxed out, but the CPU thrashing is happening in bridge.sys, which is part of Windows. Not sure yet if it's a Hyper-V component or a Windows component, but that's where the evil is happening.

My guess is that our virtual network port driver (which is a pretty thin fork of OpenVPN's open source tap-win32 NDIS6 version) is not responding to something or is responding in a way bridge.sys doesn't like and is triggering a bug in MS's code. Unfortunately we are going to have to eat it because MS is not going to fix this for us.
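
The "single core maxed out" symptom can be spotted from per-core CPU samples. The sketch below is a hypothetical helper, not anything from ZeroTier: it takes a list of per-core utilization samples (which could be collected with, for example, the third-party `psutil.cpu_percent(interval=1, percpu=True)`) and reports cores that stay above a threshold in every sample. The 95% threshold is an arbitrary assumption.

```python
from typing import List, Sequence


def pegged_cores(
    samples: Sequence[Sequence[float]], threshold: float = 95.0
) -> List[int]:
    """Return indices of cores whose utilization is >= threshold in every sample.

    Each element of `samples` is one reading with one percentage per core.
    """
    if not samples:
        return []
    core_count = len(samples[0])
    return [
        core
        for core in range(core_count)
        if all(sample[core] >= threshold for sample in samples)
    ]


if __name__ == "__main__":
    # Made-up readings for a 4-core machine, for illustration only.
    readings = [
        [100.0, 3.0, 5.0, 2.0],
        [99.0, 10.0, 4.0, 1.0],
    ]
    print(pegged_cores(readings))
```

This only shows that a core is saturated. Attributing the time to a kernel module such as bridge.sys needs a profiler or kernel tracing tool, which is outside what this sketch does.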

adamierymenko commented 7 years ago

Googling shows a lot of issues with bridge.sys. :(

glimberg commented 7 years ago

@jdrews Is your Hyper-V virtual switch bridged to a wireless network card?

jdrews commented 7 years ago

@glimberg Yes I had two Hyper-V virtual switches bridged to physical devices. One to my external ethernet device, and one to my wireless device. See screenshot earlier in this thread.

glimberg commented 7 years ago

OK. That mirrors what we're seeing here in the office. It only happens when bridged with a wireless network card.

We have a ticket open with Microsoft and their networking group is looking into the issue. Hopefully they can either issue a fix for Windows, or let us know what we can change in our driver to prevent the issue from happening.

glimberg commented 7 years ago

@jdrews Good news! Based on feedback we got from Microsoft this morning, it looks like we have this issue fixed internally, and the fix should make it into the 1.2.0 release.

adamierymenko commented 7 years ago

https://www.youtube.com/watch?v=PHQLQ1Rc_Js

jdrews commented 7 years ago

Great news! Thanks!

gbraad commented 7 years ago

This issue is still happening. It seems to prevent the wireless adapter from receiving a DHCP lease when a Hyper-V virtual switch is configured.

adamierymenko commented 7 years ago

Is this still happening with the 1.2.0 preview beta release?

https://download.zerotier.com/RELEASES/1.1.17-pre1.2.0/dist/

This should be fixed in dev.

gbraad commented 7 years ago

OK, will check with the preview release...

gbraad commented 7 years ago

@adamierymenko It took a while to download, but I have been able to confirm that it seems to work with this build. The wireless adapter behaves as expected. I am now pretty sure this was also the problem on another system. What is the ETA for a release with this fix?

adamierymenko commented 7 years ago

We are in final testing and documentation for 1.2.0. We wanted it out a month or so ago, so we're heads down on it!

adamierymenko commented 7 years ago

1.2.0 contains a lot of improvements and new capabilities.

adamierymenko commented 7 years ago

It remains backward compatible with 1.1.14 (and all the way back to 1.0.1).

gbraad commented 7 years ago

I will keep the current version installed, but I am looking forward to the 1.2.0 release.

layer4down commented 6 years ago

Experienced this same issue with the FortiClient VPN client (Error: Cannot connect to ZeroTier service). I uninstalled both FortiClient and ZeroTier, then reinstalled the ZT client. ZT immediately began working correctly as advertised (I didn't even need to reboot). HTH!

Windows 10 Home v1709 Build 16299.431