tailscale / golink

A private shortlink service for tailnets
BSD 3-Clause "New" or "Revised" License
1.25k stars 79 forks source link

golink collapses double slashes in URL path #89

Closed willnorris closed 1 year ago

willnorris commented 1 year ago

Start with the go link go/ladder => http://ladder/. When appending a full URL to the extra path, the // gets collapsed down to a single /:

% curl -i http://go/ladder/http://bbc.com/
Location: /ladder/http:/bbc.com/

This isn't technically wrong, but for a project like ladder this is undesirable.

For imageproxy, I dealt with a similar issue by switching to gorilla's Mux, and disabling path cleaning. I'd rather not do that here if we can keep from it. But if I remember correctly, the default http.ServeMux always cleans paths and doesn't provide the ability to disable that. There's work happening on http.ServeMux in go1.22, so I can look and see if any of that would be helpful here.

creachadair commented 1 year ago

Ideally if one wants to put something with path reserved characters into the path, they ought to be escaped before cleaning, I think? That is, we don't really want the components of the embedded URL to behave like part of the enclosing path, but to be a single element, right?

I imagine we could do that escaping proactively at creation time without breaking anything, right?

willnorris commented 1 year ago

Maybe we expose a template func that could enforce this behavior when desired? go/ladder => http://ladder/{{ raw .Path }} or whatever. Just in case we're not convinced this is safe to do as the default behavior. I'm still not 100% sure that would solve the problem if the cleaning is happening at a later step, which I think is what you're alluding to also.

willnorris commented 1 year ago

oh, it's not happening later... it's happening earlier. We're using http.DefaultServeMux in our main func, and I believe that's doing the URL cleaning before it even gets passed off to our handler.

creachadair commented 1 year ago

Well for this example I'm assuming we'd encode the string the user entered as the label, so http://blah/blee becomes http:%2F%2Fblah%2Fblee and will subsequently be unaffected by path cleaning. It does mean we'd need to unescape during the resolve step, but I think that would be transparent to the user?

willnorris commented 1 year ago

in this case, the URL that's getting munged is not part of the defined go link... it's the extra path that is being added by the user at resolution time. I think r.URL.RawPath would likely still have the original path, so we could just expose that to templates. Then you could define a golink of go/ladder => http://ladder/{{ .RawPath }}

creachadair commented 1 year ago

oh, it's not happening later... it's happening earlier. We're using http.DefaultServeMux in our main func, and I believe that's doing the URL cleaning before it even gets passed off to our handler.

Hm. I guess that means the browser isn't encoding it, which I suppose isn't entirely crazy since it could have been a path but for the scheme label. 🤔

willnorris commented 1 year ago

I mean, it is just a path, and there's nothing for the browser to encode. http://go/ladder/http://bbc.com/ is a perfectly cromulent URL.. the byte sequence http: might be part of a scheme label if it were at the beginning of the URL, but here it's just all valid path characters. And http.ServeMux is perfectly within standard operating procedure to collapse the slashes. We just don't want it to in this case.

willnorris commented 1 year ago

The new go1.22 changes for ServeMux have certainly made the handling logic more complicated (in order to support matching on method), but it still always cleans URLs. We either have to use a different mux implementation, or none at all (which wouldn't be awful, I guess... we don't have that many endpoints).