mholt / caddy-l4

Layer 4 (TCP/UDP) app for Caddy
Apache License 2.0
968 stars 74 forks source link

Incorrect Handling of SNI: No Certificate Available (Regression) #234

Closed elcore closed 2 months ago

elcore commented 2 months ago

After updating the Caddy Layer4 module to https://github.com/mholt/caddy-l4/commit/e491c44895fe3f11e24ad4d8c4f6a668144c0ef9, Caddy erroneously attempts to handle certificates for domains when it should instead proxy and forward connections based on SNI.

Logs:

{"level":"error","ts":1723754927.6938345,"logger":"layer4","msg":"handling connection","remote":"127.0.0.1:35770","error":"no certificate available for 'ud-chat.signal.org'"}
{"level":"error","ts":1723754927.7430856,"logger":"layer4","msg":"handling connection","remote":"127.0.0.1:35772","error":"no certificate available for 'chat.signal.org'"}
{"level":"error","ts":1723754927.9744139,"logger":"layer4","msg":"handling connection","remote":"127.0.0.1:35788","error":"no certificate available for 'ud-chat.signal.org'"}
{"level":"error","ts":1723754928.1224709,"logger":"layer4","msg":"handling connection","remote":"127.0.0.1:35802","error":"no certificate available for 'chat.signal.org'"}
{"level":"error","ts":1723754929.6092706,"logger":"layer4","msg":"handling connection","remote":"127.0.0.1:60324","error":"no certificate available for 'ud-chat.signal.org'"}
...

Configuration:

{
  "apps": {
    "tls": {
      "certificates": {
        "automate": [
          "sub.example.com"
        ]
      }
    },
    "layer4": {
      "servers": {
        "signal_proxy": {
          "listen": [
            ":443"
          ],
          "routes": [
            {
              "handle": [
                {
                  "handler": "tls"
                }
              ]
            },
            {
              "match": [
                {
                  "tls": {
                    "sni": [
                      "chat.signal.org",
                      "ud-chat.signal.org",
                      "storage.signal.org",
                      "cdn.signal.org",
                      "cdn2.signal.org",
                      "cdn3.signal.org",
                      "api.directory.signal.org",
                      "cdsi.signal.org",
                      "sfu.voip.signal.org",
                      "svr2.signal.org",
                      "updates.signal.org",
                      "updates2.signal.org",
                      "backend1.svr3.signal.org",
                      "backend2.svr3.signal.org",
                      "backend3.svr3.signal.org"
                    ]
                  }
                }
              ],
              "handle": [
                {
                  "handler": "proxy",
                  "upstreams": [
                    {
                      "dial": [
                        "{l4.tls.server_name}:443"
                      ]
                    }
                  ]
                }
              ]
            }
          ]
        }
      }
    }
  }
}
  1. Reverting to https://github.com/mholt/caddy-l4/commit/94cd39994f7dcd4f5ea10a6581aa624db7e08195 fixes the issue
mohammed90 commented 2 months ago

What happens if you upgrade to afa78d72257b949486b24fa6f0351381c786a4b3? I'm confused by how commit https://github.com/mholt/caddy-l4/commit/94cd39994f7dcd4f5ea10a6581aa624db7e08195 causes it.

elcore commented 2 months ago

No, upgrading to https://github.com/mholt/caddy-l4/commit/afa78d72257b949486b24fa6f0351381c786a4b3 does not fix it, I cherry-picked the regression to https://github.com/mholt/caddy-l4/commit/e491c44895fe3f11e24ad4d8c4f6a668144c0ef9. https://github.com/mholt/caddy-l4/commit/94cd39994f7dcd4f5ea10a6581aa624db7e08195 is the last working commit (it did not cause it, that was https://github.com/mholt/caddy-l4/commit/e491c44895fe3f11e24ad4d8c4f6a668144c0ef9)

mohammed90 commented 2 months ago

The upstream address is being replaced aggressively too early here with a fresh replacer (not from context) that doesn't have the matched SNI, causing it to be empty because it's ReplaceAll. It used to being replaced at the dialPeer call which happens inside the Handle method of the proxy handler.

@vnxme, any chance you can take a look?

mohammed90 commented 2 months ago

@elcore, I have come up with a workaround. but it requires duplication. You can make a match/proxy-handle block for each upstream address.

elcore commented 2 months ago

Hmm, interesting 🤔. I think for now, pinning to the working commit is temporarily fine (after all it’s inside a Docker container), unless you’d say that not updating would be not wise.

mohammed90 commented 2 months ago

Keep the pin. I like that you have a valid use case and a simple reproducer 🙂

vnxme commented 2 months ago

@mohammed90 I'll look into the issue and propose a solution. @elcore, thank you for a valid use case example.

vnxme commented 2 months ago

@elcore First of all, you have no certificate available for 'ud-chat.signal.org'-like errors because you have a route with a single tls handler before everything else in your list of routes. I think you don't need this route at all if you would like to proxy https (and assuming you don't have valid certificates for these domain names). Alternatively, it should be placed after the route where you match the third-party domains and proxy them.

vnxme commented 2 months ago

with a fresh replacer (not from context) that doesn't have the matched SNI

@mohammed90 We've already discussed that, and I mentioned that the context is empty at provision. That's why there is no point in trying to extract the replacer from the context - it won't work, I've checked.

The upstream address is being replaced aggressively too early here

causing it to be empty because it's ReplaceAll.

But you are right stating that the upstream address gets empty in this issue because {l4.*} placeholders don't exist at provision. What do you think if we use ReplaceKnown there instead of ReplaceAll?

elcore commented 2 months ago

It’s a layered approach with TLS in TLS hence the above configuration. I currently cannot test your idea.

vnxme commented 2 months ago

a layered approach with TLS in TLS

This way you have to use subroute. Like this:

{
    layer4 {
        :443 {
            route {
                tls
                subroute {
                    @cf tls sni one.one.one.one
                    route @cf {
                        proxy 1.1.1.1:443
                    }
                }
            }
        }
    }
}
elcore commented 2 months ago

I will certainly try this ASAP. Nonetheless I am surprised that the previous config is now broken with the latest commit(s). Sidenote: This seems like a Caddyfile configuration, right?

mohammed90 commented 2 months ago

What do you think if we use ReplaceKnown there instead of ReplaceAll?

This would fix it.

This way you have to use subroute. Like this:

This approach is the workaround I suggested earlier, which requires duplication. Elcore's existing config takes advantage of the matched value being known at runtime without having to make a separate route/branch for every domain name.

vnxme commented 2 months ago

Nonetheless I am surprised that the previous config is now broken with the latest commit(s).

I'm surprised your config actually used to work because it's conceptually wrong (missing subroute, as mentioned above). But you are right that {l4.tls.server_name}:443 should be an acceptable dial address, and we will fix that shortly.

This seems like a Caddyfile configuration, right?

Exactly. It's a part of the latest commits as well.

Elcore's existing config takes advantage of the matched value being known at runtime without having to make a separate route/branch for every domain name.

Yes, I can see that. I suppose you haven't got my idea. The final working (after a fix, and given Elcore needs TLS-in-TLS) config will be as follows:

{
    layer4 {
        :443 {
            route {
                tls
                subroute {
                    @signal tls sni chat.signal.org ud-chat.signal.org storage.signal.org cdn.signal.org cdn2.signal.org cdn3.signal.org api.directory.signal.org cdsi.signal.org sfu.voip.signal.org svr2.signal.org updates.signal.org updates2.signal.org backend1.svr3.signal.org backend2.svr3.signal.org backend3.svr3.signal.org
                    route @signal {
                        proxy {l4.tls.server_name}:443
                    }
                }
            }
        }
    }
}
mohammed90 commented 2 months ago

I suppose you haven't got my idea

I get your idea, and saying the route is unnecessary.

WeidiDeng commented 2 months ago

@elcore Can you try xcaddy build --with github.com/mholt/caddy-l4=github.com/WeidiDeng/caddy-l4@prefetch-fix?

elcore commented 2 months ago

@WeidiDeng Your fix does not seem to work - Same issue as above @vnxme Your fix does work.