labstack / echo

High performance, minimalist Go web framework
https://echo.labstack.com
MIT License
29.7k stars 2.22k forks source link

RFE: add ability to register routes with parameters containing unescaped slashes #2034

Open Snaipe opened 2 years ago

Snaipe commented 2 years ago

Issue Description

I've had to implement a small service implementing parts of the docker registry API; the issue is that multiple endpoints look like /v2/<repo>/manifests/<reference>, where <repo> may contain unescaped slashes. /v2/library/ubuntu/manifests/latest is thus a valid endpoint to get the manifests of the latest tag for the library/ubuntu repository.

The problem is that naively, Echo.GET("/v2/:repo/manifests/:reference") doesn't work (which I expected), and Echo.GET("/v2/*/manifests/:reference") does not work either, because reference does not get recognized. Worse, if you go with the second form, and register another endpoint like "/v2/*/blobs/:digest", the second one starts matching paths like "/v2/repo/manifests/foobar".

It'd be great if this worked seamlessly. It doesn't seem crazy to have a greedy matching syntax for parameters that may contain slashes, like /v2/:*repo/manifests/:reference -- regular expressions are able to do just that, in fact.

I've worked around this issue with a .Pre handler that rewrites the URL with path-escaped components based on a set of route-specific regular expressions:

// HACK: This tidbit of code here has the purpose of URL-escaping components
// that are allowed to contain slashes, based on the allowed set of routes.
// This is done to support URLs like /v2/library/ubuntu/manifests/latest
// without needing to URL-escape repo names client-side.
var escapes []*regexp.Regexp

escapes = append(escapes, regexp.MustCompile("^/v2/(.*)/manifests/(.*)$"))
escapes = append(escapes, regexp.MustCompile("^/v2/(.*)/blobs/(.*)$"))

e.Pre(func(handler echo.HandlerFunc) echo.HandlerFunc {
    return func(c echo.Context) error {
        reqpath := c.Request().URL.Path

        for _, re := range escapes {
            match := re.FindStringSubmatchIndex(reqpath)
            if match == nil {
                continue
            }
            indices := match[2:]

            var rewritten strings.Builder
            var prev int
            for i := 0; i < len(indices); i += 2 {
                lo, hi := indices[i], indices[i+1]
                rewritten.WriteString(reqpath[prev:lo])
                rewritten.WriteString(url.PathEscape(reqpath[lo:hi]))
                prev = hi
            }
            rewritten.WriteString(reqpath[prev:])
            c.Request().URL.Path = rewritten.String()
            break
        }

        return handler(c)
    }
})

Still, this is fairly half-baked, since the parameter values then need to be path-unescaped in the handler. It'd be nice if this was just plainly supported.

Checklist

Expected behaviour

Have a way to register routes with parameters containing unescaped slashes

Actual behaviour

Feature is missing

Steps to reproduce

(none, this is a feature request)

Working code to debug

(none, this is a feature request)

Version/commit

v4.6.1 (via go get)

aldas commented 2 years ago

I'll take look into it. Currently router path matching relies heavily on slashes when it traverses routes tree looking for matching node. maybe there is something we can do but echo.Router.Find() is most complex and performance sensitive part of whole repository so we need to be careful here.

lammel commented 2 years ago

I've had to implement a small service implementing parts of the docker registry API; the issue is that multiple endpoints look like /v2/<repo>/manifests/<reference>, where <repo> may contain unescaped slashes. /v2/library/ubuntu/manifests/latest is thus a valid endpoint to get the manifests of the latest tag for the library/ubuntu repository.

This looks like is a very specific use case that requires "file system" parts in the middle of the URI. As a slash is a very important part in an URI and the whole router is trimmed for that for higher performance. So this is problematic.

The problem is that naively, Echo.GET("/v2/:repo/manifests/:reference") doesn't work (which I expected), and Echo.GET("/v2/*/manifests/:reference") does not work either, because reference does not get recognized. Worse, if you go with the second form, and register another endpoint like "/v2/*/blobs/:digest", the second one starts matching paths like "/v2/repo/manifests/foobar".

A path like "/v2/*/blobs/:digest" should be an error for the current router actually, as it will never match the parts after the asterisk.

It'd be great if this worked seamlessly. It doesn't seem crazy to have a greedy matching syntax for parameters that may contain slashes, like /v2/:*repo/manifests/:reference -- regular expressions are able to do just that, in fact.

For the current router I do not see too much room for implementing that. This might change with the next major release v5 though.

I've worked around this issue with a .Pre handler that rewrites the URL with path-escaped components based on a set of route-specific regular expressions:

From the rest perspective this is a rather akward resource pattern to route. It seems to simply match static files in a filesystem. For your use case using a path like "/v2/blobs/:digest/*" would be a better match and working out of the box. But as your requirement is different and probably cannot be changed simply using I only see two options:

To summarize, I'm not sure this needs to be solved in the router. It would add quite some complexity there and probably reduce performance, whereas a workaround using path rewriting is quite simple to implement and trim to your use case. The ability to reverse a route might be affected too.

Guess I need to get a picture on other benefits a special operator like :* to allow matching slashes would bring.

aldas commented 2 years ago

v5 would help only that much that you can replace router implementation as it has interface now.

I have trouble imaging at the moment rules for this case (priority and backtracking and their relation to slash being one of the router node separator). "greedy" path param (:<name> or any (*) param has lower priority than their regular counterpart.

These new "operators" for paths are slippery slope. You could up with your own regexp syntax eventually.

Probably it would best to say that core/default functionality ends at some point and features beyond that should be supported in sense that you should be able to replace core components with extended/different versions of them.

This is going to be unpopular opinion but we should keep core of Echo simple (at least to certain degree). There are alternatives ala Fiber that strives to please everyone as a newcomer. Or Gin which has at least 2x the Echo userbase and larger API with methods for specific features - which all you can do in Echo but do not have dedicated methods.

I think I maybe should investigate how it would be to wrap existing (more flexile) Go http routers to v5 interface and provide some examples.

Snaipe commented 2 years ago

I agree that having a separate v5 Router implementation seems best. That being said, non-backtracing expression matching is a fairly well-understood problem in the world of regular expressions, that's what nondeterministic finite automata are for.

I got curious and I whipped up a terrible, unoptimized router using a finite automaton. Here's what I get running the benchmark suite in router_test.go:

goos: linux
goarch: amd64
pkg: github.com/labstack/echo/v4/routerfsa
cpu: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz
BenchmarkRouterStaticRoutes-8              48820         26048 ns/op        7522 B/op        627 allocs/op
BenchmarkRouterStaticRoutesMisses-8       260211          4574 ns/op         768 B/op         64 allocs/op
BenchmarkRouterGitHubAPI-8                 19454         60121 ns/op       11331 B/op        944 allocs/op
BenchmarkRouterGitHubAPIMisses-8          275337          4264 ns/op         768 B/op         64 allocs/op
BenchmarkRouterParseAPI-8                 167941          6747 ns/op        1248 B/op        104 allocs/op
BenchmarkRouterParseAPIMisses-8           282285          4006 ns/op         768 B/op         64 allocs/op
BenchmarkRouterGooglePlusAPI-8            390236          3001 ns/op         624 B/op         52 allocs/op
BenchmarkRouterGooglePlusAPIMisses-8      273093          4237 ns/op         768 B/op         64 allocs/op
BenchmarkRouterParamsAndAnyAPI-8          166310          7903 ns/op        1152 B/op         96 allocs/op

Compared to the vanilla router:

goos: linux
goarch: amd64
pkg: github.com/labstack/echo/v4
cpu: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz
BenchmarkRouterStaticRoutes-8              70563         16831 ns/op           0 B/op       0 allocs/op
BenchmarkRouterStaticRoutesMisses-8      2103000           535.5 ns/op         0 B/op       0 allocs/op
BenchmarkRouterGitHubAPI-8                 41346         28990 ns/op           0 B/op       0 allocs/op
BenchmarkRouterGitHubAPIMisses-8         1664457           734.7 ns/op         0 B/op       0 allocs/op
BenchmarkRouterParseAPI-8                 717120          1544 ns/op           0 B/op       0 allocs/op
BenchmarkRouterParseAPIMisses-8          3287810           378.0 ns/op         0 B/op       0 allocs/op
BenchmarkRouterGooglePlusAPI-8           1000000          1041 ns/op           0 B/op       0 allocs/op
BenchmarkRouterGooglePlusAPIMisses-8     2035212           583.3 ns/op         0 B/op          0 allocs/op
BenchmarkRouterParamsAndAnyAPI-8          505057          2624 ns/op           0 B/op          0 allocs/op

So it's about 3x-10x worse than the normal router, and of course adds a ton of allocations due to my lazy implementation, but it looks like this has a chance to have similar performances as the vanilla router if it gets properly implemented and optimized. I don't have the time to investigate this further right now, but if this seems sound, I can look at providing a real implementation using the v5 router interface if nobody beats me to it.

aldas commented 2 years ago

This looks cool. But I imagine it gets way harder when addition to Github API route tests all those other edge case tests need to pass. I do not think Github routes even touch any of backtracking requirements. Zero allocations is also quite good brainteaser.

For history sake I'll link last backtracking related threads here #1770 and #1754 as those discussions illustrate some of the hidden complexities.

offtopic: one "wild" idea that I had is that we could actually inspect routes at the start of the server and depending of "characteristics" different/specific router implementation could be chosen. For example if you have only static routes you probably do not need backtracking etc at all. But I admit - this is over-engineering. Ability to compose Echo with Router of your own choosing opens most doors and at the same time does not introduce much complexity.