yggdrasil-network / yggstack

Yggdrasil + Netstack (instead of TUN)

Yggstack stops working when a SOCKS transfer is manually/intentionally interrupted #4

Closed: bjtftw closed this issue 2 months ago

bjtftw commented 5 months ago

Hello!

The story is short: when you interrupt a file transfer that uses the yggstack SOCKS proxy, the yggstack process stops responding.

Here's how to reproduce this:

  1. yggstack-linux-amd64 -genconf > yggstack-config.conf
  2. add some Peer to yggstack-config.conf
  3. run yggstack-linux-amd64 -useconffile yggstack-config.conf -socks 127.0.0.1:1080 -loglevel trace
  4. use yggstack socks to download something from Yggdrasil network, for example: curl --socks5 127.0.0.1:1080 http://[21d:73cc:25d9:1fe:5a4a:47df:eb39:7f]/m/debian/pool/main/l/linux/linux-headers-6.6.9-common_6.6.9-1_all.deb -o /dev/null
  5. now while downloading this file interrupt download by CTRL+C
  6. yggstack reports on stdout: server: readfrom tcp 127.0.0.1:1080->127.0.0.1:41500: write tcp 127.0.0.1:1080->127.0.0.1:41500: write: broken pipe
  7. yggstack hangs: it does not respond to pings, SOCKS no longer works, etc. It needs to be restarted to work again

What's interesting is that if you let the file from step 4 download completely, nothing goes wrong and yggstack keeps working fine; only an interrupted file transfer triggers the problem.

basilgello commented 5 months ago

Is this observed with the trunk build available on GitHub now? There was a bug caused by a single thread spawned per connection, but I fixed it several months ago: https://github.com/yggdrasil-network/yggstack/commit/c09e2fb534033c0274ec6ef65d3182983b2f229c

bjtftw commented 5 months ago

> Is this observed with the trunk build available on GitHub now?

Yes, I downloaded the yggstack binary today from the Releases page of this GitHub repo. I have just downloaded it again and tested one more time, with the same result: yggstack stops responding when a file transfer over SOCKS is interrupted.

I'm on Ubuntu 20.04 if that matters.

basilgello commented 5 months ago

I cannot reproduce the issue on my Termux and Debian boxes. I can try running 20.04 in a container.

bjtftw commented 5 months ago

> I cannot reproduce the issue on my Termux and Debian boxes. I can try running 20.04 in a container.

I just tested this on an Ubuntu MATE 24.04 daily-build live ISO (not installed to disk or modified in any way), which is almost identical to what the final 24.04 LTS ISO will be in a few weeks. The issue is still there, so it is definitely not specific to 20.04.

Dang, I'm on a metered connection and don't want to waste my data limit downloading another distro ISO just to check this, but maybe tomorrow I will try a minimal installation of Debian.

I've also compiled yggstack from git now and yes, the issue is there too, so it is not specific to the release binaries either.

As I reported at the beginning, the problem only occurs when you interrupt the file transfer; if you let it run to the end of the file there is no problem at all with yggstack connectivity. So have you interrupted the download process as in step 5 of my first post?

basilgello commented 5 months ago

I am not forcing you to waste your metered limits!

But I did the same thing: stopping curl via Ctrl-C and starting it again.

bjtftw commented 5 months ago

I've installed Debian sid and the problem is there too. The interesting thing is that here it does not always show up after only one interrupted curl download, so can you check this:

Start a download with curl and interrupt it as you did before, then start the download again and interrupt it too. Within 2-5 rounds of this, yggstack will hang. Even on Ubuntu I saw (only once) the first interruption fail to hang yggstack, but in 9 out of 10 tests it hung within the first round. On Debian, for reasons unknown to me, it took about 5 rounds; but when I killed the yggstack process and started testing again, it began hanging on the first try, just as on Ubuntu. So there is definitely something going on.

Please check this, and if that doesn't kill your yggstack then I'm going INSANE! ;) Just kidding. It's not that important to me that this gets resolved: the bug only shows up under an unusual condition (manual transfer interruption), so I can live with it. I don't use the -socks functionality and discovered this by accident, so I'm not going to spend more time on it if you, or someone else reading this comment, cannot confirm it within a few days.

basilgello commented 5 months ago

Gonna check :) In the meantime, once you make the process hang, kill it with SIGQUIT so it spews a Golang stack trace.

bjtftw commented 5 months ago

Trace (after the hang) in attachment: dump.txt

basilgello commented 5 months ago

Reproduced this one, too! It looks like the SOCKS proxy side either refuses to grant the connection or fails when dialing the downstream Ygg address.

basilgello commented 2 months ago

The issue is that deep inside ironwood, terminating a connection does not destroy the queue, so the combination of destination Ygg IP and port used by a new connection makes ironwood think it is still the 'old' one.

goroutine 28835 gp=0xc000197c00 m=nil [chan receive]:
runtime.gopark(0xc0004995c0?, 0xc0003a3ec0?, 0xe5?, 0xf8?, 0xe070a0?)
        /usr/lib/go-1.22/src/runtime/proc.go:402 +0xce fp=0xc0003a3e80 sp=0xc0003a3e60 pc=0x43e44e
runtime.chanrecv(0xc00041ea20, 0x0, 0x1)
        /usr/lib/go-1.22/src/runtime/chan.go:583 +0x3bf fp=0xc0003a3ef8 sp=0xc0003a3e80 pc=0x408b7f
runtime.chanrecv1(0xe07060?, 0x48?)
        /usr/lib/go-1.22/src/runtime/chan.go:442 +0x12 fp=0xc0003a3f20 sp=0xc0003a3ef8 pc=0x408792
github.com/Arceliar/phony.Block({0xab18e8, 0xc000154500}, 0xc00074a3c0)
        /root/go/pkg/mod/github.com/!arceliar/phony@v0.0.0-20220903101357-530938a4b13d/actor.go:89 +0xd0 fp=0xc0003a3f50 sp=0xc0003a3f20 pc=0x5a8db0
github.com/Arceliar/ironwood/encrypted.(*sessionManager).writeTo(...)
        /root/go/pkg/mod/github.com/!arceliar/ironwood@v0.0.0-20240115190409-ddd1fa67c018/encrypted/session.go:144
github.com/Arceliar/ironwood/encrypted.(*PacketConn).WriteTo(0xc000154480, {0xc0002a0000, 0x51, 0xc000101808?}, {0xaac640, 0xc0003a4100})
        /root/go/pkg/mod/github.com/!arceliar/ironwood@v0.0.0-20240115190409-ddd1fa67c018/encrypted/packetconn.go:70 +0x228 fp=0xc0003a3ff0 sp=0xc0003a3f50 pc=0x6989c8
github.com/yggdrasil-network/yggdrasil-go/src/core.(*Core).WriteTo(0xc000178000, {0xc000222500, 0x50, 0x28?}, {0xaac640, 0xc0003a4100})
        /root/go/pkg/mod/github.com/yggdrasil-network/yggdrasil-go@v0.5.5/src/core/core.go:218 +0x178 fp=0xc0003a40a0 sp=0xc0003a3ff0 pc=0x7b43f8
github.com/yggdrasil-network/yggdrasil-go/src/ipv6rwc.(*keyStore).sendToAddress(0xc000114a80, {0x2, 0x1d, 0x73, 0xcc, 0x25, 0xd9, 0x1, 0xfe, 0x5a, ...}, ...)
        /root/go/pkg/mod/github.com/yggdrasil-network/yggdrasil-go@v0.5.5/src/ipv6rwc/ipv6rwc.go:82 +0x13e fp=0xc0003a4130 sp=0xc0003a40a0 pc=0x892f7e
github.com/yggdrasil-network/yggdrasil-go/src/ipv6rwc.(*keyStore).writePC(0xc000114a80, {0xc000222500, 0x50, 0x500})
        /root/go/pkg/mod/github.com/yggdrasil-network/yggdrasil-go@v0.5.5/src/ipv6rwc/ipv6rwc.go:304 +0x47e fp=0xc0003a4200 sp=0xc0003a4130 pc=0x894afe
github.com/yggdrasil-network/yggdrasil-go/src/ipv6rwc.(*ReadWriteCloser).Write(...)
        /root/go/pkg/mod/github.com/yggdrasil-network/yggdrasil-go@v0.5.5/src/ipv6rwc/ipv6rwc.go:361
github.com/yggdrasil-network/yggstack/src/netstack.(*YggdrasilNIC).WritePackets(0xc00011e7d0, {{0xc000112288, 0x1, 0x1}})
        /tmp/yggstack/src/netstack/yggdrasil.go:107 +0xca fp=0xc0003a4288 sp=0xc0003a4200 pc=0x89634a
gvisor.dev/gvisor/pkg/tcpip/stack.(*delegatingQueueingDiscipline).WritePacket(0xc00010ea60, 0xc000741040)
        /root/go/pkg/mod/gvisor.dev/gvisor@v0.0.0-20240103195848-a9a6a6819b00/pkg/tcpip/stack/nic.go:146 +0xa2 fp=0xc0003a42f0 sp=0xc0003a4288 pc=0x811402
gvisor.dev/gvisor/pkg/tcpip/stack.(*nic).writeRawPacket(0xc000224008, 0xc000741040)
        /root/go/pkg/mod/gvisor.dev/gvisor@v0.0.0-20240103195848-a9a6a6819b00/pkg/tcpip/stack/nic.go:399 +0x3a fp=0xc0003a4310 sp=0xc0003a42f0 pc=0x81283a
gvisor.dev/gvisor/pkg/tcpip/stack.(*nic).writePacket(0xc000224008, 0xc000741040)
        /root/go/pkg/mod/gvisor.dev/gvisor@v0.0.0-20240103195848-a9a6a6819b00/pkg/tcpip/stack/nic.go:386 +0x37 fp=0xc0003a4330 sp=0xc0003a4310 pc=0x812757
gvisor.dev/gvisor/pkg/tcpip/stack.(*nic).WritePacket(0xc000224008, 0xc0005ba900, 0xc000741040)
        /root/go/pkg/mod/gvisor.dev/gvisor@v0.0.0-20240103195848-a9a6a6819b00/pkg/tcpip/stack/nic.go:347 +0x14e fp=0xc0003a4438 sp=0xc0003a4330 pc=0x81258e
gvisor.dev/gvisor/pkg/tcpip/network/ipv6.(*endpoint).writePacket(0xc000230008, 0xc0005ba900, 0xc000741040, 0x6, 0x0?)
        /root/go/pkg/mod/gvisor.dev/gvisor@v0.0.0-20240103195848-a9a6a6819b00/pkg/tcpip/network/ipv6/ipv6.go:875 +0x2ac fp=0xc0003a46a0 sp=0xc0003a4438 pc=0x87730c
gvisor.dev/gvisor/pkg/tcpip/network/ipv6.(*endpoint).WritePacket(0xc000230008, 0xc0005ba900, {0x86dd?, 0x0?, 0x0?}, 0xc000741040)
        /root/go/pkg/mod/gvisor.dev/gvisor@v0.0.0-20240103195848-a9a6a6819b00/pkg/tcpip/network/ipv6/ipv6.go:826 +0x28d fp=0xc0003a47c0 sp=0xc0003a46a0 pc=0x876f8d
gvisor.dev/gvisor/pkg/tcpip/stack.(*Route).WritePacket(0xc0005ba900, {0x542d0002?, 0x84?, 0x2a?}, 0xc000741040)
        /root/go/pkg/mod/gvisor.dev/gvisor@v0.0.0-20240103195848-a9a6a6819b00/pkg/tcpip/stack/route.go:500 +0x79 fp=0xc0003a47f8 sp=0xc0003a47c0 pc=0x81b1f9
gvisor.dev/gvisor/pkg/tcpip/transport/tcp.sendTCP(0xc0005ba900, {{0x87cf, {{0x2, 0x0, 0x2d, 0x54, 0x84, 0x2a, 0x68, 0xc2, ...}, ...}, ...}, ...}, ...)
        /root/go/pkg/mod/gvisor.dev/gvisor@v0.0.0-20240103195848-a9a6a6819b00/pkg/tcpip/transport/tcp/connect.go:938 +0x229 fp=0xc0003a68c0 sp=0xc0003a47f8 pc=0x83c669
gvisor.dev/gvisor/pkg/tcpip/transport/tcp.(*endpoint).sendTCP(0xc00025ee08, 0x2474473f?, {{0x87cf, {{0x2, 0x0, 0x2d, 0x54, 0x84, 0x2a, 0x68, ...}, ...}, ...}, ...}, ...)
        /root/go/pkg/mod/gvisor.dev/gvisor@v0.0.0-20240103195848-a9a6a6819b00/pkg/tcpip/transport/tcp/connect.go:836 +0xde fp=0xc0003a6978 sp=0xc0003a68c0 pc=0x83b8be
gvisor.dev/gvisor/pkg/tcpip/transport/tcp.(*endpoint).sendSynTCP(0xc00025ee08, 0xc0005ba900, {{0x87cf, {{0x2, 0x0, 0x2d, 0x54, 0x84, 0x2a, 0x68, ...}, ...}, ...}, ...}, ...)
        /root/go/pkg/mod/gvisor.dev/gvisor@v0.0.0-20240103195848-a9a6a6819b00/pkg/tcpip/transport/tcp/connect.go:826 +0x1e5 fp=0xc0003a6aa0 sp=0xc0003a6978 pc=0x83b585
gvisor.dev/gvisor/pkg/tcpip/transport/tcp.(*handshake).start(0xc000753050)
        /root/go/pkg/mod/gvisor.dev/gvisor@v0.0.0-20240103195848-a9a6a6819b00/pkg/tcpip/transport/tcp/connect.go:601 +0x2c9 fp=0xc0003a6be8 sp=0xc0003a6aa0 pc=0x83a909
gvisor.dev/gvisor/pkg/tcpip/transport/tcp.(*endpoint).connect(0xc00025ee08, {0x1, {{0x2, 0x1d, 0x73, 0xcc, 0x25, 0xd9, 0x1, 0xfe, ...}, ...}, ...}, ...)
        /root/go/pkg/mod/gvisor.dev/gvisor@v0.0.0-20240103195848-a9a6a6819b00/pkg/tcpip/transport/tcp/endpoint.go:2471 +0x6b9 fp=0xc0003a7300 sp=0xc0003a6be8 pc=0x848cd9
gvisor.dev/gvisor/pkg/tcpip/transport/tcp.(*endpoint).Connect(0xc00025ee08, {0x1, {{0x2, 0x1d, 0x73, 0xcc, 0x25, 0xd9, 0x1, 0xfe, ...}, ...}, ...})
        /root/go/pkg/mod/gvisor.dev/gvisor@v0.0.0-20240103195848-a9a6a6819b00/pkg/tcpip/transport/tcp/endpoint.go:2217 +0xbf fp=0xc0003a78f0 sp=0xc0003a7300 pc=0x84773f
gvisor.dev/gvisor/pkg/tcpip/adapters/gonet.DialTCPWithBind({0xab1300, 0xe74a20}, 0xc000100808, {0x0, {{0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...}, ...}, ...}, ...)
        /root/go/pkg/mod/gvisor.dev/gvisor@v0.0.0-20240103195848-a9a6a6819b00/pkg/tcpip/adapters/gonet/gonet.go:514 +0x37f fp=0xc0003a7ad0 sp=0xc0003a78f0 pc=0x86451f
gvisor.dev/gvisor/pkg/tcpip/adapters/gonet.DialContextTCP(...)
        /root/go/pkg/mod/gvisor.dev/gvisor@v0.0.0-20240103195848-a9a6a6819b00/pkg/tcpip/adapters/gonet/gonet.go:541
github.com/yggdrasil-network/yggstack/src/netstack.(*YggdrasilNetstack).DialContext(0xc0001121a0, {0xab1300, 0xe74a20}, {0x9cebc8, 0x3}, {0xc0000454d0?, 0xc0000454a0?})
        /tmp/yggstack/src/netstack/netstack.go:77 +0x253 fp=0xc0003a7c28 sp=0xc0003a7ad0 pc=0x8956d3
github.com/yggdrasil-network/yggstack/src/netstack.(*YggdrasilNetstack).DialContext-fm({0xab1300?, 0xe74a20?}, {0x9cebc8?, 0xc00065a3f0?}, {0xc0000454d0?, 0xc0000960c0?})
        <autogenerated>:1 +0x49 fp=0xc0003a7c70 sp=0xc0003a7c28 pc=0x89ac49
github.com/things-go/go-socks5.(*Server).handleConnect(0xc00023a000, {0xab1300, 0xe74a20}, {0x7f76df4ca218, 0xc000112270}, 0xc000752fc0)
        /root/go/pkg/mod/github.com/things-go/go-socks5@v0.0.4/handle.go:119 +0x18d fp=0xc0003a7d90 sp=0xc0003a7c70 pc=0x59f58d
github.com/things-go/go-socks5.(*Server).handleRequest(0xc00023a000, {0x7f76df4ca218, 0xc000112270}, 0xc000752fc0)
        /root/go/pkg/mod/github.com/things-go/go-socks5@v0.0.4/handle.go:89 +0x1e5 fp=0xc0003a7e18 sp=0xc0003a7d90 pc=0x59f0c5
github.com/things-go/go-socks5.(*Server).ServeConn(0xc00023a000, {0xab4038, 0xc000112270})
        /root/go/pkg/mod/github.com/things-go/go-socks5@v0.0.4/server.go:170 +0x805 fp=0xc0003a7f78 sp=0xc0003a7e18 pc=0x5a3005
github.com/things-go/go-socks5.(*Server).Serve.func1()
        /root/go/pkg/mod/github.com/things-go/go-socks5@v0.0.4/server.go:106 +0x32 fp=0xc0003a7fe0 sp=0xc0003a7f78 pc=0x5a26f2
runtime.goexit({})
        /usr/lib/go-1.22/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0003a7fe8 sp=0xc0003a7fe0 pc=0x475ba1
created by github.com/things-go/go-socks5.(*Server).goFunc in goroutine 33
        /root/go/pkg/mod/github.com/things-go/go-socks5@v0.0.4/server.go:191 +0x3e

@Arceliar

Arceliar commented 2 months ago

> The issue is that deep inside ironwood, terminating a connection does not destroy the queue, so the combination of destination Ygg IP and port used by a new connection makes ironwood think it is still the 'old' one.

It's definitely possible that the issue is in ironwood, but FWIW I doubt it's the queues. Those work with encrypted blobs of arbitrary data; they have no concept of a port or connection to track (and their whole job is to give us a place to drop traffic if things get backed up).

It looks like things hang when trying to hand a packet off to the session manager (which handles crypto state between nodes), so something may be getting blocked there. I'll try to look into it further.

basilgello commented 2 months ago

Here is a containerized reproducer that can be run without Internet access: yggstack-issue4.tar.xz.txt

And the traces: traces.txt

basilgello commented 2 months ago

Another variation of the containerized reproducer: it stress-tests interrupting 3 parallel cURL connections after 15 seconds of downloading a 1 GB file.

yggstack-issue4-stresser.tar.xz.txt