pomerium / pomerium

Pomerium is an identity and context-aware access proxy.
https://www.pomerium.com
Apache License 2.0
4.03k stars 285 forks

Support Regex based host/path rewrites #5081

Open 0anton opened 5 months ago

0anton commented 5 months ago

Is your feature request related to a problem? Please describe.

I'd like to create an automatic CI/CD-based app deployment for each opened PR using Pomerium and CloudRun.

The proxy should forward requests like https://app-pr-123.internal.host to the CloudRun URLs https://app-pr-123-projecthash.run.app, where 123 can be any number.

The number of such deployments is high and highly dynamic. I'm hesitant to introduce dynamic configuration management, as it may not be worth it.

I cannot find a way to configure those rules with the current feature set of Pomerium.

Describe the solution you'd like

I'd like Pomerium to support a dynamic target rewrite syntax, ideally based on generic regex rewrite logic.

The feature should allow the user to rewrite a request URI https://host/path to an upstream URI https://host/path using a regular expression, without restricting where capture groups are defined or where they are consumed (in the host or in the path).

For example, a rewrite rule for a GitHub pull request preview stage implemented on CloudRun could look like:

routes:
  - from: https://app-pr-(\d{1,5}).internal.host
    to: https://app-pr-\1-rkr95gieug-gs.run.app
    match_type: hostPathRegex
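To make the intended semantics concrete, here is a minimal Python sketch of how such a rule could resolve an upstream. It only illustrates the proposal: `match_type: hostPathRegex` is not an existing Pomerium option, and the function name `resolve_upstream` is hypothetical. The sketch also escapes the dots in the host pattern, which the example rule above leaves unescaped.

```python
import re
from typing import Optional

# Hypothetical rule mirroring the proposed route; not an existing Pomerium feature.
FROM_PATTERN = r"https://app-pr-(\d{1,5})\.internal\.host"
TO_TEMPLATE = r"https://app-pr-\1-rkr95gieug-gs.run.app"


def resolve_upstream(request_origin: str) -> Optional[str]:
    """Return the upstream origin for a request origin, or None if the rule does not apply."""
    # fullmatch ensures the whole origin matches, not just a prefix of it.
    match = re.fullmatch(FROM_PATTERN, request_origin)
    if match is None:
        return None
    # expand() substitutes \1 with the captured PR number.
    return match.expand(TO_TEMPLATE)
```

With this rule, `resolve_upstream("https://app-pr-123.internal.host")` yields `https://app-pr-123-rkr95gieug-gs.run.app`, while hosts that do not match the pattern yield `None`, so a single static route would cover every PR deployment.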

Describe alternatives you've considered

1. Dynamic reconfiguration

I can dynamically update the proxy configuration, e.g. the creation of a PR would trigger the creation of a route.

This option is not ideal for me, because it would allow the CI/CD pipeline to change sensitive proxy configuration.

2. Host rewrite, sub-path mounting and some DNS engineering

I could combine the following two tricks to achieve dynamic routing with a static configuration.

  1. I could (mis)use the fact that the Google Frontend relies only on the Host header to route a request to the CloudRun service (SNI is ignored). I can configure the so-called Private Service Access, which is a CNAME mapping from *.run.app to restricted.googleapis.com. This gives me a static destination for the Pomerium to setting.

This trick has a limitation: it only works for CloudRun services that are attached to a controllable network (i.e. a private VPC), where we can manipulate DNS resolution and routing.

For a normal public CloudRun service I can remap *.run.app to, e.g., storage.googleapis.com, but that is risky: if Google ever stops routing CloudRun traffic through its shared Google Frontend (e.g. by introducing separate frontends for separate APIs), the solution will stop working.

  2. I can deploy the stage app on a sub-path instead of a subdomain, e.g. internal.host/app-pr-123 instead of app-pr-123.internal.host.

This has a serious disadvantage: the app "thinks" it is alone on the domain and sets cookies accordingly. A user who opens the same app on different sub-paths will get its cookies and storage overwritten in an uncontrollable way by the various versions of the app.

The configuration will look like:

routes:
  - from: https://internal.host
    prefix: /app-pr-
    to: https://restricted.googleapis.com
    host_path_regex_rewrite_pattern: ^/(app-pr-\d{1,4})$
    host_path_regex_rewrite_substitution: \1-jrioueoru-ew.a.run.app
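As a rough illustration of what the pattern/substitution pair above computes, the following Python sketch applies the same regex to a request path and returns the resulting Host value. It is a simplified model rather than Pomerium's or Envoy's actual implementation; the function name `rewritten_host` is hypothetical, and what the proxy does when the pattern does not match is deliberately not modeled (the sketch returns `None`).

```python
import re
from typing import Optional

# Values copied from the route configuration above.
PATTERN = r"^/(app-pr-\d{1,4})$"
SUBSTITUTION = r"\1-jrioueoru-ew.a.run.app"


def rewritten_host(path: str) -> Optional[str]:
    """Return the Host value derived from the path, or None if the pattern does not match."""
    new_host, count = re.subn(PATTERN, SUBSTITUTION, path)
    # count == 0 means nothing was substituted; that case is not modeled here.
    return new_host if count else None
```

In this model `/app-pr-123` maps to `app-pr-123-jrioueoru-ew.a.run.app`. Because the pattern is anchored with `$`, a deeper path such as `/app-pr-123/login` does not match, so a real configuration would probably need a pattern that also tolerates a trailing sub-path.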

Documentation: https://www.pomerium.com/docs/reference/routes/headers#4-host-path-regex-rewrite-patternsubstitution

Additional context

To solve the same problem, Google offers so-called URL masks for its Application Load Balancer.

A URL mask allows the creation of a network endpoint group (NEG) that points to a CloudRun service whose name is dynamically extracted from the request URL.

This NEG will send requests for /aaa to the CloudRun service aaa-$(PROJECT_HASH).run.app (aaa can be anything):

resource "google_compute_region_network_endpoint_group" "path_to_service" {
  name    = "path-to-service"
  region  = "europe-west4"
  project = local.project
  cloud_run {
    url_mask = "/<service>"
  }
}
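The extraction that a mask like `/<service>` performs can be approximated in a few lines of Python. This is only a simplified model under the assumption that each `<name>` placeholder captures one path segment; it is not Google's implementation, and the helper names `mask_to_regex` and `extract_service` are hypothetical.

```python
import re
from typing import Optional


def mask_to_regex(url_mask: str) -> "re.Pattern[str]":
    """Turn a mask such as '/<service>' into a regex with one named group per placeholder."""
    pieces = []
    # The capturing group in split() keeps the placeholders in the result list.
    for part in re.split(r"(<\w+>)", url_mask):
        if re.fullmatch(r"<\w+>", part):
            # Assumption: a placeholder matches one path segment (no '/' or '.').
            pieces.append(f"(?P<{part[1:-1]}>[^/.]+)")
        else:
            pieces.append(re.escape(part))
    return re.compile("".join(pieces))


def extract_service(url_mask: str, path: str) -> Optional[str]:
    """Return the service name captured from the path, or None if the mask does not match."""
    match = mask_to_regex(url_mask).match(path)
    if match is None:
        return None
    return match.groupdict().get("service")
```

Under these assumptions, `extract_service("/<service>", "/aaa")` returns `aaa`, and a path like `/app-pr-123/login` resolves to `app-pr-123`.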

Then, at the route matcher level, we can define a matcher that forwards certain requests to that wildcard backend. In this case, requests starting with /app-pr will be forwarded to the CloudRun services named app-pr*:

dynamic "route_rules" {
  content {
    priority = 8
    match_rules {
      prefix_match = "/app-pr"
    }
    route_action {
      weighted_backend_services {
        backend_service = google_compute_region_backend_service.path_to_service.self_link
        weight          = 100
      }
    }
  }
}

Since we control the namespace of CloudRuns (which CloudRuns we deploy and under which names), this is fine from a practical point of view, even if it looks too loose.

This helps to implement sub-path app mounting, as explained in the second workaround.

Link to Google Documentation

Interestingly, the feature set of Google's Application Load Balancer in this regard is no better than what I see in Pomerium. This is likely related to the fact that Google uses Envoy to implement its managed L7 load balancers.

desimone commented 5 months ago

@0anton, thanks, this is a very helpful feature request. Could we talk about a specific implementation synchronously? We've had versions of this ask a few different times, and I'd like to dive deeper into your specific use case if possible. Feel free to email me at bdd {at symbol} pomerium.com

0anton commented 5 months ago

Thanks for being attentive to this FR, Bob! 🙂 Sure, let's discuss! I've emailed you my contact details for a synchronous call.

langered commented 5 months ago

This would be really helpful!

Additionally, it would be good to have path-to-domain regex functionality, e.g.:

routes:
  - from: https://internal.host
    prefix: /app-pr-(\d{1,5})
    to: https://app-pr-$1-rkr95gieug-gs.a.run.app

When I use host_path_regex_rewrite_pattern in combination with to: restricted.googleapis.com, I always get redirected with a 307 to the internal Cloud Run domain, which is not public and therefore does not help at all.

desimone commented 5 months ago

Pending a deep dive with @0anton. @langered, we would love to have you in the discussion if you could chat synchronously.