mholt / caddy-l4

Layer 4 (TCP/UDP) app for Caddy
Apache License 2.0
929 stars 71 forks source link

UDP gets blocked after any unsuccessful matching #247

Open vnxme opened 1 week ago

vnxme commented 1 week ago

I'm currently debugging an OpenVPN matcher, and I've noticed a problem with how UDP is handled by caddy-l4.

Symptoms:

Sample config:

{
    layer4 {
        udp/:53 {
            @test dns {
                allow apple.com. * *
                default_deny
            }
            route @test {
                proxy udp/1.1.1.1:53
            }
        }
    }
}

Replication:

Side problem:

vnxme commented 1 week ago

I think the issue described above can be traced to https://github.com/mholt/caddy-l4/commit/6a8be7c4b8acb0c531b6151c94a9cd80894acce1.

@jtackaberry Is there any chance you could look at it?

vnxme commented 5 days ago

If we replace https://github.com/mholt/caddy-l4/blob/master/layer4/server.go#L98 with

        var i int
        for {
            i++
            fmt.Printf("iteration %d\n", i)

we can see in console, that after the first unsuccessful match this loop becomes endless due to an i/o timeout error.

One more thing to note: UDP becomes inaccessible in about 3 seconds after the last unsuccessful match. This fact has 2 implications:

vnxme commented 5 days ago

Actually it turns out to be the matchingTimeout set by https://github.com/mholt/caddy-l4/blob/master/layer4/routes.go#L123 that breaks UDP.

As a workaround (and a proof that the timeout is at least related to the problem described above), we can put _ = conn.SetReadDeadline(time.Time{}) as a simple neutralisation command after https://github.com/mholt/caddy-l4/blob/master/layer4/server.go#L147, and it seems to effectively prevent UDP from dying after unsuccessful matching.

This read deadline was originally introduced by @ydylla (in https://github.com/mholt/caddy-l4/commit/f04916554cbe94a116aaef90368fd0ac99bce513), and @WeidiDeng was the last one who reworked RouteList.Compile (https://github.com/mholt/caddy-l4/commit/afa78d72257b949486b24fa6f0351381c786a4b3 and https://github.com/mholt/caddy-l4/commit/22a03931ae46d0decc068e8d1ffe83e6187ed812).

I believe there is a straight logic behind this timeout, but I don't think it was supposed to break UDP. So could you guys, please, take a look as well? Is it possible we can solve the problem by adjusting RouteList.Compile rather than Server.servePacket?

mholt commented 5 days ago

Thanks for reporting and discussing @vnxme ! @WeidiDeng has also reached out to me on Slack about this, and I suppose a patch will be coming in sooner or later :)

vnxme commented 5 days ago

@mholt We have discussed another UDP-related issue with him (how to make listener_wrappers concept work with QUIC inside Caddy mainline), which is more like a feature request. This bug is a different story.

mholt commented 5 days ago

Oh, I see. Nevermind me then :sweat_smile:

WeidiDeng commented 4 days ago

Matching timeout is set to 3 seconds by default here. Maybe it's too small for udp. As udp datagrams are much smaller and less frequent than tcp data stream.

The problem comes from two places, one is the small timeout, the other is that packet handler ignore timeout error as it's temporary.

There is also the idle timeout of 30 seconds for udp. I think it's reasonable to extend deadline within packet handling unless idle timeout fires and that association should be removed anyway and io.EOF is returned.

WeidiDeng commented 4 days ago

@vnxme Can you try 249?

vnxme commented 4 days ago

Maybe it's too small for udp.

I think it's reasonable to extend deadline within packet handling

Well, I'm not sure about that. I can confirm that extending the matching timeout set here to 5, 10 or 15 seconds just postpones the moment when UDP gets broken.

Can you try 249?

Thank you for proposing this fix. It works, however I wouldn't do that since it completely prevents any matcher or handler from setting a read deadline which might be of some use anyway.

I suggest we consider an alternative: add defer func() { _ = cx.Conn.SetReadDeadline(time.Time{}) }() after https://github.com/mholt/caddy-l4/blob/master/layer4/routes.go#L123. This way the read readline will reset whenever HandlerFunc returns, and it will be effective for both UDP and TCP (I think we don't need a read deadline anyway after we have checked all matchers and none of them has matched).

WeidiDeng commented 4 days ago

Thank you for proposing this fix. It works, however I wouldn't do that since it completely prevents any matcher or handler from setting a read deadline which might be of some use anyway.

Completely is a wrong word. It doesn't prevent a deadline for tcp sockets. If you really want to set a deadline, you should set it for a particular mapping instead of the global deadline as I mentioned in the todo.

If you set the deadline for the udp socket, those matched udp associations will be affected by the deadline as well. Say you match wireguard protocol data, and one of the address pairs look like it. When copying the data between and from, another new associations appear. The deadline will break the wireguard association causing it to rematch.

The reset won't help in this case, or if you want to deal with it during the copying part. But why would anyone expect a deadline error could occur if no deadline is set?

vnxme commented 4 days ago

Completely is a wrong word. It doesn't prevent a deadline for tcp sockets.

Agree, I mean setting a read deadline for UDP.

The reset won't help in this case

I've tested it, and it fixes this bug as well as the solution you proposed. The difference is that resetting the read deadline on return affects both TCP and UDP.

The read deadline (a.k.a matching timeout) corresponds to "Maximum time connections have to complete the matching phase (the first terminal handler is matched)". So my understanding is, when HandlerFunc returns, we don't need it, so it's safe to reset it.

WeidiDeng commented 4 days ago

I've tested it, and it fixes this bug as well as the solution you proposed. The difference is that resetting the read deadline on return affects both TCP and UDP.

Did you test that the sending a small udp packet in this case programmatically in this case? For dns matcher just send one byte at a time every second for 10 seconds? And see the cpu usage for the next 7 seconds assuming the default timeout. I mean breaking a valid dns message into parts.

The udp deadline won't be received by the wrapped association because it is handled here. There is no reason to set it and the server pass the packet to the associated handling.

WeidiDeng commented 4 days ago

If you really want to fix the deadline, set the deadline on the association not the underlying socket. The deadline error won't be passed to the handler. For clients that don't send enough data, it's always taking 30 seconds.

vnxme commented 4 days ago

OK, my intention is just to fix this annoying UDP bug and not to break anything else. If you say #249 is safe enough and we don't need a read deadline reset on HandlerFunc return, then I second your fix and hope it could be merged soon.

WeidiDeng commented 4 days ago

My solution still needs some works, but it should suffice in this case. The matching timeout is used to control how long the matching goroutine run in this case, by default it's 3 seconds. It is a whopping 30 seconds for udp though. I thought for a while how to fix it, but one step at a time.

ydylla commented 3 days ago

I have never used udp in layer4, but looking at the code a bit, I currently would go with @WeidiDeng's fix. Since it is closer to the udp code. Properly supporting the matching timeout for udp would likely require a bigger rewrite.
I am for example not sure why the ignoring of timeout errors here even exists. I can not find anything that sets a timeout for udp (the idle timeout is implemented via select & time.After), so before the addition of the matching timeout this branch was probably never executed?
And while @vnxme is right after compiledRoute.Handle returns it gives back control of the connection to the caller and thus should probably not leave a deadline set on it. Sadly there does not seam to be an API to restore a previously set deadline. If we always reset it we may also reset a deadline that was set by a future udp rewrite.