memorysafety / river

This repository is the home of the River reverse proxy application, based on the pingora library from Cloudflare.
https://www.memorysafety.org/initiative/reverse-proxy/
Apache License 2.0
1.72k stars 101 forks source link

Add support for KDL configuration #42

Closed jamesmunns closed 3 months ago

jamesmunns commented 3 months ago

Experimenting with KDL, as #39 will introduce quite a bit of nested configuration that may be difficult to manage in TOML.

jamesmunns commented 3 months ago

Here's my current state, that I think I'm happy with. Here's the test_config in kdl (38 lines, 1038 characters):

system {
    threads-per-service 8
}

services {
    Example1 {
        listeners {
            "0.0.0.0:8080"
            "0.0.0.0:4443" cert-path="./assets/test.crt" key-path="./assets/test.key"
            "0.0.0.0:8443" cert-path="./assets/test.crt" key-path="./assets/test.key"
        }

        connectors {
            "91.107.223.4:443" tls-sni="onevariable.com"
        }

        path-control {
            upstream-request {
                filter kind="remove-header-key-regex" pattern=".*SECRET.*"
                filter kind="remove-header-key-regex" pattern=".*secret.*"
                filter kind="upsert-header" key="x-proxy-friend" value="river"
            }
            upstream-response {
                filter kind="remove-header-key-regex" pattern=".*ETag.*"
                filter kind="upsert-header" key="x-with-love-from" value="river"
            }
        }
    }

    Example2 {
        listeners {
            "0.0.0.0:8000"
        }
        connectors {
            "91.107.223.4:80"
        }
    }
}

And what it looks like in TOML (42 lines, 1226 characters):

[system]
threads-per-service = 8

[[basic-proxy]]
name = "Example1"
    [[basic-proxy.listeners]]
        [basic-proxy.listeners.source]
        kind = "Tcp"
            [basic-proxy.listeners.source.value]
            addr = "0.0.0.0:8080"

    [[basic-proxy.listeners]]
        [basic-proxy.listeners.source]
        kind = "Tcp"
        [basic-proxy.listeners.source.value]
        addr = "0.0.0.0:4443"
            [basic-proxy.listeners.source.value.tls]
            cert_path = "./assets/test.crt"
            key_path = "./assets/test.key"

    [basic-proxy.connector]
    proxy_addr = "91.107.223.4:443"
    tls_sni = "onevariable.com"

    [basic-proxy.path-control]
    upstream-request-filters = [
        { kind = "remove-header-key-regex", pattern = ".*(secret|SECRET).*" },
        { kind = "upsert-header", key = "x-proxy-friend", value = "river" },
    ]

    upstream-response-filters = [
        { kind = "remove-header-key-regex", pattern = ".*ETag.*" },
        { kind = "upsert-header", key = "x-with-love-from", value = "river" },
    ]

[[basic-proxy]]
name = "Example2"
listeners = [
    { source = { kind = "Tcp", value = { addr = "0.0.0.0:8000" } } }
]
connector = { proxy_addr = "91.107.223.4:80" }

It's not necessarily as short as I expected it to be, but I think there's a lot of lines that are trailing braces (}), but there's also a LOT less repeated verbosity for nested items.

jamesmunns commented 3 months ago

Interestingly, this brings our configuration syntax MUCH closer to something like NGINX:

http {
    upstream backend {
        server backend1.example.com;
        server backend2.example.com;
        server 192.0.0.1 backup;
    }

    server {
        location / {
            proxy_pass http://backend;
        }
    }
}

HOWEVER, it is challenging in KDL to express a pattern like server 192.0.0.1 backup - essentially KEYWORD KEYWORD KEYWORD, where it is much easier to express it in a way like server addr="192.0.0.1" option=backup or so.

I don't think this is a blocker, but it might end up being a little "uncanny valley" for former NGINX users.

bebehei commented 3 months ago

As a heavy nginx user, this looks much better. Visibly, there is more configuration data in the file than configuration markup

HOWEVER, it is challenging in KDL to express a pattern like server 192.0.0.1 backup - essentially KEYWORD KEYWORD KEYWORD, where it is much easier to express it in a way like server addr="192.0.0.1" option=backup or so.

This is a much smaller problem than you think. The server addr="1.2.3.4" might be redundant. You could just drop the server tag completely.

Stating some options explicitly is somewhat okay. I guess, being able to just throw backup in the upstream's server line requires a self-built configuration parser, I guess.


On the other hand: Is there something like configuration inheritance? Specifying a vHost will most likely still give all three listening sockets the same certificate. So, explicitly stating the same certificate for every endpoint somewhat redundant.

services {
    Example1 {
        listeners {
            "0.0.0.0:8080"
            "0.0.0.0:4443" cert-path="./assets/test.crt" key-path="./assets/test.key"
            "0.0.0.0:8443" cert-path="./assets/test.crt" key-path="./assets/test.key"
        }
    }
}

If KDL would not do this, you could also use YAML for this and offload the problem to the configuration language itself:

services:
  Example:
    listeners:
      "0.0.0.0:8443": &Example-default-certs
        tls: true
        crt: ./assets/test.crt
        key: ./assets/test.key
      "0.0.0.0:4443":
        <<: *Example-default-certs
jamesmunns commented 3 months ago

This is a much smaller problem than you think. The server addr="1.2.3.4" might be redundant. You could just drop the server tag completely.

That's fair! I don't think this is a blocker, just something I was noting.

On the other hand: Is there something like configuration inheritance?

Built in to KDL: not specifically. We could invent something, but for as long as we can get away with it, I'd prefer to keep the configuration file parsing as simple as possible, to avoid accidentally making it turing complete :)

We will likely also support JSON configuration in the future, and recommend that for anyone that needs to generate larger or dynamic configuration, or would like to build their own templating solution for configuration.

bebehei commented 3 months ago

We will likely also support JSON configuration in the future

On the one hand, this seems nice. But for configuration, I'd like to see for YAML. Main benefit: It allows comments and does not need another external (mostly self-written tool) to generate configuration.

YAML allows 90% of the use users to use the default configuration format in raw.

jamesmunns commented 3 months ago

@bebehei I'm not totally against YAML, but support is likely not planned in the near future. Previous discussions here:

The plan for now is to keep the core of the software flexible to allow us to support multiple configuration languages for now, as we focus on other core development tasks.

johnpyp commented 2 months ago

I'm a very big fan of this syntax using KDL! Looks great - far more readable than the current toml config.