mholt / caddy-l4

Layer 4 (TCP/UDP) app for Caddy
Apache License 2.0
980 stars 75 forks source link

Proxy handler goroutines for UDP never terminate #140

Open jtackaberry opened 1 year ago

jtackaberry commented 1 year ago

While investigating possible performance improvements mentioned in #132, I've run into what might be a design problem with UDP proxying.

Let's walk through what happens for UDP servers:

  1. First, in layer4/server.go, Server.servePacket() runs and the loop begins, waiting for a packet from the socket, blocking in ReadFrom()
  2. Client sends a datagram and ReadFrom() returns
  3. A new goroutine is spun up for that packet, a new packetConn is created, modeling a "connection" for the 4-tuple of (srcip, srcport, dstip, dstport), and that packetConn is passed to Server.handle().
    • But of course this isn't technically a connection, it's just a datagram, but we use the term UDP connection to refer to that 4-tuple, combined with some idle timeout to imply "connection" termination. I'll stop using the scare quotes around "connection" now, although therein lies the problem. :)
  4. Server.handle() passes the connection to Handler.Handle() over in modules/l4proxy/proxy.go
  5. Handler.Handle() picks an upstream, dials all peers from that upstream, and passes both the client connection (downstream) and all upstream connections to Handler.proxy()
  6. Handler.proxy() creates: a. 1 goroutine to tee data from the downstream to all upstreams, and b. N goroutines, one for each upstream peer which uses io.Copy() to send data from that upstream back to the downstream client
  7. The tee reader goroutine (6a) terminates, but the goroutine(s) from 6b never terminate. This is because io.Copy() assumes a stream, and this works fine for TCP where there is an analogue to EOF, but not for UDP where there isn't, and so io.Copy() blocks forever. Handler.proxy() waits for all goroutines from step 6 to terminate, but because 6b doesn't, Handler.proxy() doesn't return
  8. Each subsequent packet from the client goes through this entire flow again, even if the connection 4-tuple is the same as before, creating the packet handler goroutine in step 3, and separate goroutines for each upstream peer, none of which ever terminate.

So for each packet in a UDP stream we accumulate at least 2 new goroutines which never terminate. (It's more than 2 goroutines if there's more than one peer defined in the upstream.)

In terms of how I think this should work, a long-running goroutine per UDP connection (by which I mean the 4-tuple) makes sense. But the server loop should set this up once per client IP/port and dispatch subsequent packets to this long-running per-connection goroutine via a channel, where the UDP connection goroutine subsequently terminates after some (perhaps configurable) idle timeout. AFAICT io.Copy() would need to be avoided in any case.

I'll noodle with this a bit more and try to prototype a solution, but figured I shouldn't go too far down the rabbit hole before seeing if you had any obvious thoughts or guidance. Otherwise I'll take a stab at it and we can hash it out in a PR.

jtackaberry commented 1 year ago

Thinking about this a little more, I feel like the packetConn abstraction can do a lot of work for us here. Certainly it's how the server loop could get new packets into the mix without meddling with the proxy logic, and now I wonder if it could also provide the appropriate behavior for io.Copy() to work. I'll experiment with this over the weekend.

mholt commented 1 year ago

That makes a lot of sense. (You can tell I never used the proxy with UDP myself.)

I'd LOVE to see a packetConn that could deal with this properly! Excited to look over your experiment when you have a chance :smiley:

jtackaberry commented 1 year ago

I'd LOVE to see a packetConn that could deal with this properly! Excited to look over your experiment when you have a chance

I opened PR #141 to demonstrate the idea and start a discussion about not-yet-solved problems.