Open LiberalArtist opened 5 years ago
I think the right thing to do is for you not use the result of dispatch-rules
as your start
, but instead use another function that does this URL rewriting on the request object and then calls the dispatch-rules
function. Is there a reason you can't do that?
That is what I am doing now. I think for some reason I had originally thought that function on the right-hand side of a dispatch-rules
clause was required to return a response?
, but I see now that I can just return #f
, so I don't need the additional complexity of dispatch-rules+applies
.
As I've started diving into the weeds, I think I agree that dispatch-rules
should not bake in this URL rewriting functionality, because the specifics of what rewriting you want to do (let alone how you want to respond to acceptable but non-canonical URLs) seem likely to be application-specific. There might be a few sensible choices common enough to be reusable, but working out what they are and a good API would take some experimentation before I'd want them in the standard library.
I'm still working on this for my own application: writing this issue revealed ways my URL handling still wasn't working as expected even after my first attempt at a work-around. My current plan is to:
simplify-url
, analogous to simplify-path
. It would combine the unexported function remove-dot-segments
from net/url-string
(which implements RFC 3986 §5.2.4) with a pass to eliminate redundant path separators. (I think, by analogy with simplify-path
, that should be "except for a single trailing separator," which an application might or might not want to throw away.)dispatch-rules
family with respect to these subtleties and the nature of request URLs supplied by the web server library, so that others don't get surprised.
I think, on further consideration, I would rather note in the documentation that ()
is not ("")
, rather than making it a syntax error or warning, since it could match a manually-created request?
object.Related to my proposed simplify-url
, I noticed the following match
cases in web-server/dispatchers/filesystem-map
, and I'm not sure that I understand why it treats "."
differently than ".."
:
https://github.com/racket/web-server/blob/894613d7c52a7b25ec1d05ac9414bb5bd94dd17a/web-server-lib/web-server/dispatchers/filesystem-map.rkt#L25-L28
As illustrated in the program below, ""
, "."
, 'same
, ".."
, and 'up
are all distinct URL path elements, though apparently url->path
treats ""
as 'same
(should it?). On the other hand, url->path
raises an exception for URLs containing "."
or ".."
elements. (In that case, I think it is probably correct, but it shouldn't report the error in terms of bytes->path-element
.)
I'm not sure if this code should treat "."
and ".."
as equivalent to 'same
and 'up
or not, but I would think it should do the same in both cases.
I guess I would have expected url->path
to be consistent with url->string
, other than the inherent differences (e.g. that filesystem paths can't percent-encode arbitrary path elements). If there is a good reason for them to behave differently, maybe my proposed simplify-url
should have a mode to select which of the two behaviors to use.
(Apparently I neglected to paste the example above ...)
#lang racket
(require net/url-string
rackunit)
(define (path-element->url elem)
(url "https" #f "example.com" #f #t
(list (path/param "a" null)
(path/param elem null)
(path/param "b" null))
null #f))
(for ([pr '(["" . "https://example.com/a//b"]
["." . "https://example.com/a/%2e/b"]
[same . "https://example.com/a/./b"]
[".." . "https://example.com/a/%2e%2e/b"]
[up . "https://example.com/a/../b"])])
(match-define (cons elem rendered) pr)
(check-equal? (url->string (path-element->url elem))
rendered))
(check-equal? (url->path (path-element->url ""))
(url->path (path-element->url 'same)))
(for ([special '("." "..")])
(check-exn #rx"bytes->path-element.*names a special element"
(λ () (url->path (path-element->url special)))))
I fixed this by adding an extra string-arg
element to the dispatch rule and then treating it as a dummy argument in the handler function. Pretty hacky and you may need multiple dispatch rules (my case is always going to have a trailing slash) but it may be simpler to implement than other suggestions.
A note in the documentation on dispatch-rules
might be a good idea.
Just a quick note to say, that in many situations it is simpler to redirect to the canonical url.
With these dispatch rules
[("blog" "" (string-arg) ...) do-blog] ; /blog/
[("blog") redirect-to-blog] ; /blog -> /blog/
(define (redirect-to-blog req)
(redirect-to "/blog/"))
Any requests to /blog
will be redirected to /blog/
.
If the blog contains references to stylesheets using relative urls then the canonical url works best.
I recently discovered that
dispatch-rules
and related forms make it hard to handle trailing/
s on URLs sensibly.The URL
https://example.com/foo/
(with a trailing/
) is treated differently thanhttps://example.com/foo
(without a trailing/
). Strictly speaking, this is correct: those are two distinct URLs, and in principle one could serve completely unrelated content at each. In practice, though, users don't expect such URLs to be distinct. Serving different non-error content at one than the other is a terrible idea, and even returning a successful response from one variant but, e.g., a 404 error from the other (as my sight was doing) is likely to cause confusion. I expect most programs will want to either redirect the less-preferred variant to the canonical variant or simply treat both variants as equivalent.Unfortunately,
web-server/dispatch
doesn't provide a great way to implement such behavior. Here's an example in code of the current state of affairs:This illustrates a few things:
/
and has a single, empty path element (""
).()
is allowed as a pattern, but (due to the above) will never match anything. I think it might be better to make this a syntax error (or a warning that will become an error in a future release)./
adds an empty path element (""
) to the end of the URL.cleanse-path
-like case for multiple adjacent/
separators.I have changed my application to handle trailing
/
separators by usingdispatch-rules+applies
instead ofdispatch-rules
(which involved removing myelse
clause) and, if the original request does not satisfy the predicate fromdispatch-rules+applies
but a version with a normalized path would, responding with a redirect to the normalized path.I think it would be better to extend
web-server/dispatch
to support this sort of thing, but I'm not sure yet what the best way would be to do so. A few things I've been thinking about so far:dispatch-rules
patterns doesn't have a notion of "splicing" patterns: each string literal orbidi-match-expander
use applies to a single path element.301 Moved Permanently
has issues for methods other thanGET
andHEAD
and308 Permanent Redirect
doesn't have a straightforward fallback for older browsers. I would want to do a301 Moved Permanently
forGET
andHEAD
requests and307 Temporary Redirect
for other methods, which RFC 7231 suggests is reasonable. The higher-level question is the trade-off between a simple API that "does the right thing" and configurability./
support where a similar ability to handle variants on canonical URLs would be nice: for example, case-insensitivity. I can imagine possible extensions to thedispatch-rules
API to add a general concept of non-canonical URLs, and it is appealing from one perspective to avoid making trailing/
support a baked-in special feature. On the other hand, though, there seems a risk of getting beyond the scope of this library.