Open tec27 opened 3 hours ago
I'll note that I do see the issues with non-ASCII characters in URLs, but I think trying to auto-decode these in this library is a mistake (as is developers relying on the browser to auto-encode them). For example, if we rely on the browser to auto-encode the non-ASCII string from wouter's tests, but add a %
at the front:
history.pushState(null, '', '/%пользователи/показать все/#101/げんきです?')
// location is now /%%D0%BF%D0%BE%D0%BB%D1%8C%D0%B7%D0%BE%D0%B2%D0%B0%D1%82%D0%B5%D0%BB%D0%B8/%D0%BF%D0%BE%D0%BA%D0%B0%D0%B7%D0%B0%D1%82%D1%8C%20%D0%B2%D1%81%D0%B5/
decodeURI(location.pathname) // throws "URIError: URI malformed", so wouter returns the string undecoded
This seems like an unfortunate quirk of the browser APIs, and I don't see a way to not have some aspect feel "broken" here. I think the approaches that maximize the capabilities of the library are either:
1) Don't decode anything, present all route parameters, etc. to library clients undecoded, or 2) Only decode parameters/search values (not keys)/etc. after parsing + routing (and this decoding may fail if the library client is not properly encoding their URL contents)
The current approach makes it impossible to ever accept %
s in parameters/values, even if you encode them properly, which feels worse.
As of wouter 3.x, the library now calls
decodeURI
on locations during route matching, meaning route parameters have had this called on them as well. I'm not quite sure why this was added, as it doesn't match how routing would work in a typical HTTP server. That is, if I hit:GET /path/escaped%2Fslashes
the intent is that this %2F is not treated like a slash, so if we were serving e.g. directories of files, this would look for a file named
escaped%2Fslashes
in the directorypath
, not a directory namedescaped
containing a file namedslashes
.This presents a problem that is not solvable for clients of this library if any of their route components can contain a certain collection of characters. You can end up with a parameter value that can no longer be decoded unambiguously.
For instance, if you allow parameters that contain a
%
and#
, you must useencodeURIComponent
on this parameter or the resulting location will not work properly, butdecodeURI
will partially decode this string and you can no longer determine what the original string was without parsing (and effectively doing the routing) yourself:Since the original string could also contain e.g.
%25
at the end, the string atafterWouter
is now ambiguous as well: we cannot tell whether the original string contained%
or%25
unless we go back and look at the location value, and decoding multiple times may actually "work" without throwing an error but give you a value that is not the same as the original.Calling
decodeURI
on the entire URL string multiple times makes issues like this impossible to fix, and wouter at this point seems to do it quite often.