yggdrasil-network / yggdrasil-go

An experiment in scalable routing as an encrypted IPv6 overlay network
https://yggdrasil-network.github.io
Other
3.46k stars 235 forks source link

Latest dev version crashes #456

Closed Vort closed 5 years ago

Vort commented 5 years ago

Previous versions was crashing at start. So if program was successfully started, there was no more crashes. yggdrasil-tapfix-0.3.5-0148-windows-amd64.exe crashes too at start. But also it crashes during its operation.

These crashes can be related to "service" mode of operation. I can't (yet) reproduce them with "console" mode.

OS: Windows 7 SP1 x64

neilalexander commented 5 years ago

Without being able to see the panics this will be hard to diagnose. I’ll see if I can reproduce this in a Windows VM, but can you let us know what you’re doing typically when it crashes or how frequent it is?

Vort commented 5 years ago

I leaved it in console mode running for the night and here is the panic:

panic: More data is available.

goroutine 7 [running]:
github.com/yggdrasil-network/yggdrasil-go/src/tuntap.(*TunAdapter).reader(0xc0000afc50, 0x1, 0xc000194030)
        /home/circleci/project/src/tuntap/iface.go:122 +0xdd5
created by github.com/yggdrasil-network/yggdrasil-go/src/tuntap.(*TunAdapter).Start
        /home/circleci/project/src/tuntap/tun.go:160 +0x5ee
neilalexander commented 5 years ago

Out of curiosity what is your IfMTU option set to in your yggdrasil.conf?

Does this match the value shown in netsh interface ipv6 show interfaces for your TAP adapter?

Vort commented 5 years ago

I hope check with 0.3.5 will be fine:

IfMTU: 65535
Инд     Мет         MTU          Состояние               Имя
 14          30       65535  connected     TapAdapter
neilalexander commented 5 years ago

Hm okay, I have a theory about what's going on here - I think the MTU should be 65521 due to the Ethernet headers with it being in TAP mode. If you drop your IfMTU down to 65521, does the crash go away?

Vort commented 5 years ago

I have another crash. Right after start.

2019/07/16 12:53:33 netsh command: netsh interface ipv6 set subinterface interface= mtu=65521 store=active
2019/07/16 12:53:33 Interface name:
2019/07/16 12:53:33 Interface IPv6: 200:f001:9c61:be26:2e6e:cd55:a454:2ada/7
2019/07/16 12:53:33 Interface MTU: 65521
2019/07/16 12:53:33 netsh command: netsh interface ipv6 add address interface= addr=200:f001:9c61:be26:2e6e:cd55:a454:2ada/7 store=active
panic: The handle is invalid.

goroutine 33 [running]:
github.com/yggdrasil-network/yggdrasil-go/src/tuntap.(*TunAdapter).reader(0xc0000b1c50, 0x1, 0xc000160030)
        /home/circleci/project/src/tuntap/iface.go:122 +0xdd5
created by github.com/yggdrasil-network/yggdrasil-go/src/tuntap.(*TunAdapter).Start
        /home/circleci/project/src/tuntap/tun.go:160 +0x5ee

Second start is fine. Will watch it now.

neilalexander commented 5 years ago

Looks like the problem in your first crash was the blank Interface name:. We've had some problems with the Windows support in the Water library not being very good, although I've tried to fix it up somewhat. That's something we should be able to catch though.

Let me know how you get on with the second start.

Vort commented 5 years ago

Here is the crash:

panic: A device attached to the system is not functioning.

goroutine 81 [running]:
github.com/yggdrasil-network/yggdrasil-go/src/tuntap.(*TunAdapter).reader(0xc000097c50, 0x1, 0xc00000f470)
        /home/circleci/project/src/tuntap/iface.go:122 +0xdd5
created by github.com/yggdrasil-network/yggdrasil-go/src/tuntap.(*TunAdapter).Start
        /home/circleci/project/src/tuntap/tun.go:160 +0x5ee
neilalexander commented 5 years ago

Yikes, that sounds like a problem in the TAP driver itself.

Is this using the NDIS 5 driver or the NDIS 6 one?

Vort commented 5 years ago

Yikes, that sounds like a problem in the TAP driver itself.

0.3.5 was using the same driver.

Is this using the NDIS 5 driver or the NDIS 6 one?

tap-windows-9.9.2_3.exe

neilalexander commented 5 years ago

Normally we recommend the version you posted because it performs better but I have never come across that bug before.

Does the problem happen with the tap-windows-9.21.2.exe driver instead?

Vort commented 5 years ago

yggdrasil-develop-0.3.5-0153-windows-amd64.exe, tap-windows-9.21.2.exe, IfMTU: 65521:

panic: More data is available.

goroutine 59 [running]:
github.com/yggdrasil-network/yggdrasil-go/src/tuntap.(*TunAdapter).reader(0xc000097c50, 0x1, 0xc00000f470)
        /home/circleci/project/src/tuntap/iface.go:122 +0xdd5
created by github.com/yggdrasil-network/yggdrasil-go/src/tuntap.(*TunAdapter).Start
        /home/circleci/project/src/tuntap/tun.go:160 +0x5ee
neilalexander commented 5 years ago

Hm, okay, I'll need to look into what's going on there. That seems like a very Windows-specific problem as haven't encountered that on other platforms.

Vort commented 5 years ago

Maybe this can help with reproducing: I use DC++ 0.868, which is connected to adc://[202:9877:2815:cd91:336:2a16:bfd9:7258]:1511. Upload of my Internet channel very often used at 100%. And average CPU (4 core) usage is 30%.

neilalexander commented 5 years ago

Can you please see if the More data is available error is fixed in the following develop build?

64-bit: https://1257-115685026-gh.circle-artifacts.com/0/yggdrasil-develop-0.3.5-0179-windows-amd64.exe 32-bit: https://1257-115685026-gh.circle-artifacts.com/0/yggdrasil-develop-0.3.5-0179-windows-i386.exe

Vort commented 5 years ago
2019/07/20 10:11:07 Build name: yggdrasil-develop
2019/07/20 10:11:07 Build version: 0.3.5-0179
2019/07/20 10:11:07 Starting up...
2019/07/20 10:11:07 Starting switch
2019/07/20 10:11:07 Starting router
2019/07/20 10:11:07 Startup complete
2019/07/20 10:11:07 An error occurred starting TUN/TAP: Failed to find the tap d
evice in registry with specified ComponentId 'tap0901' and InterfaceName 'TapAda
pter', TAP driver may be not installed or you may have specified an interface na
me that doesn't exist
Vort commented 5 years ago

Why Water tries to find adapter name inside tapDriverKey? In my system it is located in different place, here: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Network\{4D36E972-E325-11CE-BFC1-08002BE10318}\{ABD27FD9-25F3-42A6-9BB4-218287A7D365}\Connection tap1 tap2

neilalexander commented 5 years ago

That's unusual - someone's obviously changed that in the upstream songgao/water repository and it's ended up being merged back. In my original commit https://github.com/yggdrasil-network/water/commit/86871951e720e14d0fcdd1241ae206327f4764db#diff-6be27a31703495803c6129f15a873036R142 we look in the key that you specified.

It looks like commit https://github.com/songgao/water/commit/d689e71bf730f1eb762ef17c7b62967a75cb0948. I'll raise a PR to fix it upstream and I'll fix it in our own repository.

neilalexander commented 5 years ago

Can you please test the following develop build to see if interface selection is fixed:

64-bit: https://1261-115685026-gh.circle-artifacts.com/0/yggdrasil-develop-0.3.5-0181-windows-amd64.exe 32-bit: https://1261-115685026-gh.circle-artifacts.com/0/yggdrasil-develop-0.3.5-0181-windows-i386.exe

Vort commented 5 years ago

:(

panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xc0000005 code=0x0 addr=0x18 pc=0x5e2416]

goroutine 1 [running]:
github.com/yggdrasil-network/water.(*Interface).Name(...)
        /go/pkg/mod/github.com/yggdrasil-network/water@v0.0.0-20190720101301-5db
94379a5eb/if.go:79
github.com/yggdrasil-network/yggdrasil-go/src/tuntap.(*TunAdapter).resetAdapter(
0xc00008fc50, 0x66b599, 0x7)
        /home/circleci/project/src/tuntap/tun_windows.go:61 +0x46
github.com/yggdrasil-network/yggdrasil-go/src/tuntap.(*TunAdapter).setup(0xc0000
8fc50, 0xc0000127a0, 0xa, 0x1, 0xc0000df710, 0x28, 0xfff1, 0x7dd9afb638da113a, 0
x6a77c23348bd90cd)
        /home/circleci/project/src/tuntap/tun_windows.go:37 +0x200
github.com/yggdrasil-network/yggdrasil-go/src/tuntap.(*TunAdapter).Start(0xc0000
8fc50, 0xc000006238, 0x0)
        /home/circleci/project/src/tuntap/tun.go:140 +0x3fc
main.main()
        /home/circleci/project/cmd/yggdrasil/main.go:224 +0xfc2
neilalexander commented 5 years ago

How about this one?

64-bit: https://1265-115685026-gh.circle-artifacts.com/0/yggdrasil-develop-0.3.5-0182-windows-amd64.exe 32-bit: https://1265-115685026-gh.circle-artifacts.com/0/yggdrasil-develop-0.3.5-0182-windows-i386.exe

Vort commented 5 years ago
panic: runtime error: index out of range

goroutine 43 [running]:
github.com/yggdrasil-network/yggdrasil-go/src/tuntap.(*TunAdapter).reader(0xc000
095c50, 0x1, 0xc0000df710)
        /home/circleci/project/src/tuntap/iface.go:134 +0xdc5
created by github.com/yggdrasil-network/yggdrasil-go/src/tuntap.(*TunAdapter).St
art
        /home/circleci/project/src/tuntap/tun.go:160 +0x5ee

But now at least I can see interface name, address and MTU.

neilalexander commented 5 years ago

And this one?

64-bit: https://1269-115685026-gh.circle-artifacts.com/0/yggdrasil-develop-0.3.5-0183-windows-amd64.exe 32-bit: https://1269-115685026-gh.circle-artifacts.com/0/yggdrasil-develop-0.3.5-0183-windows-i386.exe

Vort commented 5 years ago

O_o

panic: Insufficient quota to complete the requested service.

goroutine 7 [running]:
github.com/yggdrasil-network/yggdrasil-go/src/tuntap.(*TunAdapter).reader(0xc000
0b1c50, 0x1, 0xc00016e030)
        /home/circleci/project/src/tuntap/iface.go:122 +0xda6
created by github.com/yggdrasil-network/yggdrasil-go/src/tuntap.(*TunAdapter).St
art
        /home/circleci/project/src/tuntap/tun.go:160 +0x5ee
neilalexander commented 5 years ago

Jesus, Windows is trash. I have no idea what that error means and I can't find any obvious documentation about it either.

Vort commented 5 years ago
  1. Code was working before.
  2. There exists sequence of API calls, it can be repeated with the minimal test program.

So ways of fixing it exists. Just debugging of it inside Windows can help. Maybe corrupted parameter is passed to API call at some time.

neilalexander commented 5 years ago

Well at the moment all I can assume is that the problem is in Water, so I am looking into what's going on there and to see whether anything else has been changed upstream.

If I compile Yggdrasil v0.3.5 from master I get the same Insufficient quota to complete the requested service problem as in develop so it's nothing in the Yggdrasil codebase I guess.

neilalexander commented 5 years ago

OK, so there was a bug in the upstream Water library because of the weird way that I/O works on Windows. I've also improved the handling in the Yggdrasil codebase too which might have affected huge MTUs. Can you please test the following:

64-bit: https://1273-115685026-gh.circle-artifacts.com/0/yggdrasil-develop-0.3.5-0184-windows-amd64.exe 32-bit: https://1273-115685026-gh.circle-artifacts.com/0/yggdrasil-develop-0.3.5-0184-windows-i386.exe

Vort commented 5 years ago

It works now. I will say if something goes wrong.

Vort commented 5 years ago

I see these lines:

2019/07/20 18:34:35 conn=0xc000484070 TUN/TAP conn read error: session closed
2019/07/20 18:34:46 conn=0xc0004748c0 TUN/TAP conn read error: session closed

But don't know how it affects functionality of Yggdrasil. Maybe it is just informational.

neilalexander commented 5 years ago

Yes those are just informational and will be hidden before v0.3.6 release.

Vort commented 5 years ago

13 hours of uptime and no crash. This counts as fixed. Thanks.

neilalexander commented 5 years ago

Thanks for reporting!

Vort commented 5 years ago

I have captured this error with debugger for commit c9dc9507de56d08651ec85f15d6f5351af3cdc85. And looks like problem was not in driver, but in Yggdrasil code: https://github.com/yggdrasil-network/yggdrasil-go/blob/c9dc9507de56d08651ec85f15d6f5351af3cdc85/src/tuntap/iface.go#L113-L119 When debugger hit line 119 with ERROR_MORE_DATA, bs length was 85. That, of course, is not enough for packet receiving.

neilalexander commented 5 years ago

If bs is only length 85 when returned from tun.iface.Read(), that's because Water has only returned an 85-byte slice. We pass in a 65535-byte buffer as an input.

Vort commented 5 years ago

How can tun.iface.Read change bs if it is returning only n and err? n was 0 by the way. Note: c9dc9507de56d08651ec85f15d6f5351af3cdc85 uses original Read function.

neilalexander commented 5 years ago

It doesn't change the bs slice per-se, but we do use the returned length to trim the buffer down to the length n before sending to the session: https://github.com/yggdrasil-network/yggdrasil-go/blob/c9dc9507de56d08651ec85f15d6f5351af3cdc85/src/tuntap/iface.go#L276-L281