hashicorp / consul

Consul is a distributed, highly available, and data center aware solution to connect and configure applications across dynamic, distributed infrastructure.
https://www.consul.io
Other
28.24k stars 4.41k forks source link

service-router: PrefixRewrite cannot be empty string (and docs are unclear/the example produces confusing behavior) #11000

Open chrisjohnson opened 3 years ago

chrisjohnson commented 3 years ago

Overview of the Issue

The docs for using PrefixRewrite are not great and it seems like there's a bug that won't allow me to use empty string as a value

1: I don't actually see where in the docs PrefixRewrite is actually clearly explained. I found the doc entry for it but it doesn't explain how it's actually used. This blog post is the only example for PrefixRewrite that I can find, and it does not have the structure that is currently in use (I'm assuming the structure changed since then?)

I did reverse engineer the logic by looking at the golang tests that involve PrefixRewrite and I was able to deduce that the PrefixRewrite value is used as the direct replacement value for whatever PathPrefix matched

2: That said, the example on that blog post as well as the golang tests all seem to use an example that works like this:

        {
            "Match": {
                "HTTP": {
                    "PathPrefix": "/sp"
                }
            },
            "Destination": {
                "Service": "sp-api",
                "Namespace": "testing-rx8",
                "PrefixRewrite": "/"
            }
        },

However this results in a request like this: curl localhost:21001/sp/bar -> //bar

This is confusing. I understand that technically it is valid from consul's POV but it produces strange behavior where the request ends up with an extra slash. The tests should be changed and an example should be added to the documentation page

3: Finally, I think there may actually be a bug, because when I try to produce the expected result by using the following configuration:

        {
            "Match": {
                "HTTP": {
                    "PathPrefix": "/sp"
                }
            },
            "Destination": {
                "Service": "sp-api",
                "Namespace": "testing-rx8",
                "PrefixRewrite": ""
            }
        },

It won't let me use an empty string for PrefixRewrite -- when I read the config back it shows that the PrefixRewrite was simply eliminated:

cjohnson@atpplatformconsul2:PRODUCTION:~> cat config.json
{
    "Kind": "service-router",
    "Name": "api2",
    "Namespace": "testing-rx8",
    "Routes": [
        {
            "Match": {
                "HTTP": {
                    "PathPrefix": "/forms"
                }
            },
            "Destination": {
                "Service": "forms-api",
                "Namespace": "testing-rx8",
                "PrefixRewrite": ""
            }
        }
    ]
}
cjohnson@atpplatformconsul2:PRODUCTION:~> consul config write config.json
Config entry written: service-router/api2
cjohnson@atpplatformconsul2:PRODUCTION:~> consul config read -namespace=testing-rx8 -kind=service-router -name=api2
{
    "Kind": "service-router",
    "Name": "api2",
    "Namespace": "testing-rx8",
    "Routes": [
        {
            "Match": {
                "HTTP": {
                    "PathPrefix": "/forms"
                }
            },
            "Destination": {
                "Service": "forms-api",
                "Namespace": "testing-rx8"
            }
        }
    ],
    "CreateIndex": 646564,
    "ModifyIndex": 646564
}

Consul info for both Client and Server

Client info ``` agent: check_monitors = 0 check_ttls = 0 checks = 87 services = 58 build: prerelease = revision = cf701c09 version = 1.9.7+ent consul: acl = enabled known_servers = 10 server = false license: customer = 480e18da-6001-8ce2-7b8e-8642e1cb3225 expiration_time = 2023-03-25 00:00:00 +0000 UTC features = Automated Backups, Automated Upgrades, Enhanced Read Scalability, Network Segments, Redundancy Zone, Advanced Network Federation, Namespaces, SSO, Audit Logging id = eda50725-0f08-ccaf-f3e3-b26cae0812d4 install_id = * issue_time = 2021-03-25 04:56:39.925162515 +0000 UTC modules = Global Visibility, Routing and Scale, Governance and Policy product = consul start_time = 2021-03-24 00:00:00 +0000 UTC runtime: arch = amd64 cpu_count = 4 goroutines = 1017 max_procs = 4 os = linux version = go1.15.13 serf_lan: coordinate_resets = 0 encrypted = true event_queue = 0 event_time = 108 failed = 0 health_score = 0 intent_queue = 0 left = 0 member_time = 34097 members = 12 query_queue = 0 query_time = 862 ```
Server info ``` agent: check_monitors = 0 check_ttls = 1 checks = 1 services = 1 build: prerelease = revision = cf701c09 version = 1.9.7+ent consul: acl = enabled bootstrap = false known_datacenters = 4 leader = false leader_addr = 10.3.52.53:8300 server = true license: customer = 480e18da-6001-8ce2-7b8e-8642e1cb3225 expiration_time = 2023-03-25 00:00:00 +0000 UTC features = Automated Backups, Automated Upgrades, Enhanced Read Scalability, Network Segments, Redundancy Zone, Advanced Network Federation, Namespaces, SSO, Audit Logging id = eda50725-0f08-ccaf-f3e3-b26cae0812d4 install_id = * issue_time = 2021-03-25 04:56:39.925162515 +0000 UTC modules = Global Visibility, Routing and Scale, Governance and Policy product = consul start_time = 2021-03-24 00:00:00 +0000 UTC raft: applied_index = 646355 commit_index = 646355 fsm_pending = 0 last_contact = 41.753905ms last_log_index = 646355 last_log_term = 7804 last_snapshot_index = 629975 last_snapshot_term = 7796 latest_configuration = [{Suffrage:Nonvoter ID:53fca35b-cd1a-0800-b6ee-c60527ca7313 Address:10.3.52.41:8300} {Suffrage:Voter ID:0ac1c505-c157-220a-8d8a-4af254d3d4f0 Address:10.3.52.42:8300} {Suffrage:Voter ID:c2498167-5c9c-18e0-be21-4595f7a20ae4 Address:10.3.52.55:8300} {Suffrage:Nonvoter ID:4a1ae97c-4e4f-be48-19b7-bc92b3742e20 Address:10.3.52.56:8300} {Suffrage:Voter ID:64e0f24d-a51f-c82e-17aa-2eb20c165617 Address:10.3.52.14:8300} {Suffrage:Nonvoter ID:c3809b2c-e13a-662f-03da-7f7eccd04edb Address:10.3.52.52:8300} {Suffrage:Nonvoter ID:769747f8-e741-25e1-f5f3-4714706db717 Address:10.3.52.35:8300} {Suffrage:Voter ID:107fc5ce-dbf6-1b5f-2cfa-263608ec691b Address:10.3.52.53:8300} {Suffrage:Nonvoter ID:f5ef8ba0-a971-2aa6-1eb8-d0e0e034f5ac Address:10.3.52.54:8300} {Suffrage:Voter ID:0ee8fe98-540f-b67b-878c-1b50f91a9d09 Address:10.3.52.36:8300}] latest_configuration_index = 0 num_peers = 0 protocol_version = 3 protocol_version_max = 3 protocol_version_min = 0 snapshot_version_max = 1 snapshot_version_min = 0 state = Follower term = 7804 runtime: arch = amd64 cpu_count = 2 goroutines = 285 max_procs = 2 os = linux version = go1.15.13 serf_lan: coordinate_resets = 0 encrypted = true event_queue = 0 event_time = 24 failed = 0 health_score = 0 intent_queue = 0 left = 0 member_time = 1098 members = 10 query_queue = 0 query_time = 1 serf_wan: coordinate_resets = 0 encrypted = true event_queue = 0 event_time = 1 failed = 0 health_score = 0 intent_queue = 0 left = 0 member_time = 3766 members = 33 query_queue = 0 query_time = 862 ```
jkirschner-hashicorp commented 3 years ago

Hi @chrisjohnson,

Thank you for the detailed, thoughtful feedback on this and many other Github issues - we really appreciate it.

We'll look into this more and consider what should be improved (e.g., docs/examples, handling of PrefixRewrite).


In the meantime, as a workaround, are you able to achieve what you want by appending a / to the end of your PathPrefix and setting PrefixRewrite to /? Example:

        {
            "Match": {
                "HTTP": {
                    "PathPrefix": "/sp/"
                }
            },
            "Destination": {
                "Service": "sp-api",
                "Namespace": "testing-rx8",
                "PrefixRewrite": "/"
            }
        },

This won't work if you need the exact path /sp to map to /, but it seems like it would map any path of the form /sp/* to /*.


Regarding your points above:

  1. I'm also not sure why the blog post you linked includes PrefixRewrite in Routes.Match.HTTP. A docs PR from a month after that blog entry was posted shows PrefixRewrite in the same location is it now: Routes.Destination. I may try to see if that was ever accurate. Regardless, a corrected version of that example seems like a useful addition to the service-router page's sample config entries section.

  2. My initial expectation would have also been that /sp => / would work as expected, though I can also see why that results in /sp/* => //*. Going forward, we can think about it there's a better way to handle this case (e.g., perhaps ignore the leading / for rewrite purposes, so there's no // results if PrefixRewrite = '/').

  3. I haven't dug into this, but my guess is: because PrefixRewrite has a default value of "", the code doesn't make a distinction between an explicitly set value of "" and an unset value. Thus, the service-router config is seen as valid with an "unset" PrefixRewrite, which is consistent with what you observed.

chrisjohnson commented 3 years ago

That workaround is what we are currently using. You are right that it won't match the naked form (and afaik I cannot find a workaround that will support the naked form) but it helps us at least make forward progress for now. We will need the naked form supported in the near future but for today we can make progress.

If the blog post structure was never correct, I might suggest simply retro-actively editing the blog post, since currently when you google for PrefixRewrite it's one of the top results and really the only firm example of how to use it that people will find. Adding another example to the config entry page would also be great.

I'm down to approach the rewrite either way, either we allow you to rewrite to empty string, or we ignore the preceding slash. Just let me know where we land please :)

  1. That line of thinking seems valid to me.

Tangent: We tried changing our structure to match the blog post to see if that was the cause of our problems. We manage this config entry via the consul terraform provider. When we did a terraform plan (the TFE speculative plan) it did not complain about the structure, but after merging the change and going to apply, it failed because the structure of the config entry was wrong.

Should the plan step be validating the incoming structure like how the apply step does? And if so, is this a fault of the consul provider, or consul itself? Is there a place I could log a followup issue around this behavior? It's not a show-stopper but it's a pain to have already merged the PR because the speculative plan passed only to find out the apply doesn't work.

jkirschner-hashicorp commented 3 years ago

Should the plan step be validating the incoming structure like how the apply step does? And if so, is this a fault of the consul provider, or consul itself? Is there a place I could log a followup issue around this behavior?

I'd suggest posting an issue to https://github.com/hashicorp/terraform-provider-consul. From some similar posts on that provider (https://github.com/hashicorp/terraform-provider-consul/issues/281#issuecomment-904062585, https://github.com/hashicorp/terraform-provider-consul/issues/248), it seems like the plan vs apply issue may be due to the interaction of the TF provider with the flexibility of Consul's configurations, though the maintainers of that provider can give a better answer.

jkirschner-hashicorp commented 3 years ago

@chrisjohnson: after looking into (3) a bit more, I have a solution that better satisfies your use case (but requires two routes) and an understanding of why it can't currently be done with one route.

Why two routes are currently needed This seems to be a limitation imposed by Envoy. According to their docs, "stripping a prefix from a path requires multiple Routes to handle all cases".

image

Because the prefix rewrite behavior configured in Consul is ultimately performed by Envoy, Consul has the same limitation (see Envoy xDS code): https://github.com/hashicorp/consul/blob/584faec6e3968b4d099fe6b038f50436831a1c97/agent/xds/routes.go#L301-L307

There may be a way for Consul to provide a better UX on top of this Envoy constraint - we'd have to think about it more. Regardless, this should be better documented in our description of PrefixRewrite and ideally shown in an example.

A currently available solution for your use case It's not pretty, but the above constraint also shows the solution: define two routes. The first route handles the /admin/* paths. The second handles only the "naked path" (/admin), because all other /admin/* paths will have matched to the first route.

...
    "Routes": [
        {
            "Match": {
                "HTTP": {
                    "PathPrefix": "/sp/"
                }
            },
            "Destination": {
                "Service": "sp-api",
                "Namespace": "testing-rx8",
                "PrefixRewrite": "/"
            }
        },
        {
            "Match": {
                "HTTP": {
                    "PathPrefix": "/sp"
                }
            },
            "Destination": {
                "Service": "sp-api",
                "Namespace": "testing-rx8",
                "PrefixRewrite": "/"
            }
        }
    ],
...
jkirschner-hashicorp commented 2 years ago

@chrisjohnson : I've got a proposal for how the configuration UX could be simplified here. Let me know your thoughts on whether this would be a relevant part of the solution (in addition to docs/example improvements) and if you have any suggestions for improvements.

If you think such a change would be useful, I'll run it by engineering to double-check whether they have any technical feasibility or backwards-compatibility concerns.

Proposal

If the user has a service-router configuration like below (prefix /sp -> /), then ensure we rewrite both:

Example service-router configuration this would apply to

        {
            "Match": {
                "HTTP": {
                    "PathPrefix": "/sp"
                }
            },
            "Destination": {
                "Service": "sp-api",
                "Namespace": "testing-rx8",
                "PrefixRewrite": "/"
            }
        },

Underlying Implementation Thoughts

Normally, Consul does a direct translation (1:1) of Consul service-router rule matches to Envoy route matches. However, if the user wants to replace all /sp and /sp/* path prefixes with /, that currently requires 2 (unintuitive) rules to be configured.

We can potentially make this behavior more intuitive by automatically creating the multiple Envoy route matches required when the user wants the above behavior. We can assume that user wants this behavior when all these criteria are met by the service-router configuration:

If these conditions are met, the rule matches passed to Envoy will actually be:

        {
            "Match": {
                "HTTP": {
                    "PathPrefix": "/sp/"
                }
            },
            "Destination": {
                "Service": "sp-api",
                "Namespace": "testing-rx8",
                "PrefixRewrite": "/"
            }
        },
        {
            "Match": {
                "HTTP": {
                    "PathExact": "/sp"
                }
            },
            "Destination": {
                "Service": "sp-api",
                "Namespace": "testing-rx8",
                "PrefixRewrite": "/"
            }
        }

To-do

chrisjohnson commented 2 years ago

Overall it definitely makes sense to me, I understand the balance between "consul via envoy" and "envoy managed by consul" is something to be paid attention to but in this case it really feels like an envoy nuance that I think consul is right to abstract away from the user.

Worst case if it does create backward compatibility issues, a flag to opt out (or in) for the behavior could be an easy way to control that. Maybe even "rewrite-mode=direct" or "rewrite-mode=auto" to allow for other operating modes in the future.