pallets / werkzeug

The comprehensive WSGI web application library.
https://werkzeug.palletsprojects.com
BSD 3-Clause "New" or "Revised" License
6.66k stars 1.74k forks source link

Expose new path used to compute new url in RequestRedirect #3000

Open tobias-urdin opened 1 week ago

tobias-urdin commented 1 week ago

The RequestRedirect exception is raised with new_url set to the computed URL that should be redirected to, but we don't expose the path that was used to compute the URL in the exception so depending code needs to urlparse() the URL even though werkzeug already has the path and can pass it along in the exception.

Referring to this example code:

    def resolve_endpoint(self, environ, path, method):
        """Resolve endpoint from the url map.

        :environ Environment: The environment.
        :path str: The request path.
        :method str: The request method.
        :returns str: The endpoint from the url map.
        """

        urls = url_map.bind_to_environ(environ)
        try:
            endpoint, _ = urls.match(path, method=method)
            return endpoint
        except routing.RequestRedirect as e:
            r = urlparse(e.new_url)
            return resolve_endpoint(environ, r.path, method)
            # FIXME: this could be e.new_path if RequestRedirect passed path
            # return resolve_endpoint(environ, e.new_path, method)
        except (wexc.NotFound, wexc.MethodNotAllowed):
            pass

        return None
davidism commented 1 week ago

In what situation do you need this?

tobias-urdin commented 1 week ago

In this case it's a WSGI middleware that uses werkzeg's routing module for matching routes to know when to apply a specific logic. The module does a great job to match URLs but we need to handle cases where URLs contain a trailing slash or double slashes in the middle of the URL.

Example: Lets say we want to match /some/cool/url we also need to match //some/cool/url, /some/cool/url/ and /some//cool/url – works great in the current form but we could optimize away the extra urlparse() per request if werkzeug exposed the URL path it wants to redirect to and not only giving us the full URL in the RequestRedirect exception.

davidism commented 1 week ago

You could disable merge_slashes, strict_slashes, and redirect_defaults if you don't want to cause redirects and just want them to match the endpoint. You'll presumably end up with the redirects anyway once you route in the app, which would result in this logic running multiple times (the first when you match without redirects and apply it, the the second after you redirect then match and apply again). Also note that redirects can happen for other reasons, such as the rule itself being a redirect, and redirects can go to more places than the current host or subdomain. So it's not clear this would be usable in all cases.

If you're calling match in middleware, then calling it again in the actual app to do routing, then presumably the performance of a single urlsplit call is insignificant. But if it is significant, you can also do much simpler fast string operations to get the path: path = url.partition("/")[2].partition("?")[0].

tobias-urdin commented 1 week ago

This is how the Map and Rule is setup, with this and the resolve_endpoint() above we can have it return cool as endpoint for all the URLs in my previous comment, and if it returns None we just pass the request directly to the application without intercepting it.

url_rules = [
    routing.Rule('/some/cool/url', methods=['POST'], endpoint='cool'),
]
url_map = routing.Map(
    rules=url_rules,
    strict_slashes=False,
    merge_slashes=True,
    redirect_defaults=False,
)