Closed ArthurKnaus closed 5 years ago
Alright, so just to be clear, the reasoning behind decoding in createLocation is that it allows us to have location objects whose pathname is decoded. That means that React Router users can use decoded strings in their route paths.
// with decoding
<Route path='/역사' />
// without decoding
<Route path='/%EC%97%AD%EC%82%AC' />
Another effect of this is that params will be automatically decoded.
<Route path='/greeting/:word' />
// user visits /greeting/안녕하세요
match === {
...,
params: { word: '안녕하세요' }
// without decoding this would be %EC%95%88%EB%85%95%ED%95%98%EC%84%B8%EC%9A%94
}
History then relies on the browser to automatically encode the pathname when we push to the browser. This appears to work for all characters except for the percent sign.
There are a few thoughts I have on possible ways to deal with this:
I don't think that this is desirable, but I'll at least throw it out there. We could leave the decoding to the program that is using history's locations. location.pathname would just be what the user provides. Of course, this would lead to inconsistencies between the locations created from window.location and locations created from values passed by the user.
// URI = '/역사'
// window.location.pathname = '/%EC%97%AD%EC%82%AC'
{ pathname: '/%EC%97%AD%EC%82%AC', ...}
history.push('/역사')
{ pathname: '/역사', ...}
Add a raw property
Before we decode the pathname, we can store the original value as location.raw (or maybe rawPathname, name isn't important here). Then, when we create a URI, we would concatenate using location.raw instead of location.pathname.
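A rough sketch of how the raw property option might look. The names createLocationWithRaw and createHref below are hypothetical stand-ins for illustration, not history's actual implementation; only the property name raw comes from the proposal.

```javascript
// Hypothetical helper: keep the original (encoded) pathname next to the decoded one.
function createLocationWithRaw(pathname) {
  return {
    raw: pathname,                 // exactly what was provided, e.g. '/test%25ing'
    pathname: decodeURI(pathname), // decoded copy, for matching and display
  };
}

// Simplified stand-in for createHref: build the URI from `raw`, not `pathname`.
function createHref(location) {
  return location.raw;
}

const loc = createLocationWithRaw('/test%25ing');
console.log(loc.pathname);    // '/test%ing'
console.log(createHref(loc)); // '/test%25ing'
```

Because the URI is rebuilt from raw, the encoded percent sign never has to survive a decode and re-encode round trip.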
%25
Instead of just calling location.pathname = decodeURI(location.pathname), we could modify that to skip decoding the character string %25, which is % encoded.
const decodeButSkipPercent = pathname => (
pathname.split('%25')
.map(decodeURI)
.join('%25')
)
That would work, but we would also end up with locations whose pathname is not necessarily fully decoded. The exception should only apply to the encoded percent sign, but it is still an exception.
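For illustration, here is how the helper behaves on a couple of inputs. This is a quick console check, not a test from the history repository, and the sample paths are made up.

```javascript
const decodeButSkipPercent = pathname =>
  pathname.split('%25').map(decodeURI).join('%25');

// Non-ASCII escape sequences are decoded as usual.
console.log(decodeButSkipPercent('/%EC%97%AD%EC%82%AC')); // '/역사'

// An encoded percent sign is left alone, so the result is still a valid URL path.
console.log(decodeButSkipPercent('/sale%2050%25%20off')); // '/sale 50%25 off'
```

The second result shows the exception mentioned above: the space is decoded, but %25 stays encoded.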
I'm not sure which of these is the best choice (or maybe there is a better alternative that I did not think of?) :woman_shrugging:
Of course, this would lead to inconsistencies between the locations created from window.location and locations created from values passed by the user.
I'm sorry, I'm not familiar with the code, but is it not possible to call decodeURI only when creating a location from window.location? I'd assume the apps/devs would never read the location directly from there, because they access it via this lib? This is the place where history does know the context and can behave accordingly?
What is the recommended workaround for this? In components, the route params are only partially decoded, see referenced issue. We have file names that are coming from users, and sometimes they have ? in the name.
I have the same problem.
I try to push this to the history:
history.push('/data/Amazing%20Example')
and history converts this to the url: http://localhost:3000/data/Amazing Example
History removes my empty space! But why :)
I guess I have a similar issue as @janhoeck
I want to encode part of my link because it contains stringified object
to={ /123/${encodeURIComponent(JSON.stringify({ encode: "me" }))} }
But when I now click on the link, it ends up being
#/123/{"encode"%3A"me"}
this on the url...it kinda works but what dies for me is when someone pastes such URL to slack or some other location. Clicking on that does another set of encoding magic and then the url does not work anymore.
How can I work around it? ( 2x encodeURIComponent is not really a nice fix...but would kinda help right now :D )
@pshrmn
History then relies on the browser to automatically encode the pathname when we push to the browser. This appears to work for all characters except for the percent sign.
Do you know why browsers don't encode the % character, but encode everything else? Is it a bug or intentional for some reason? Is it applicable in all browsers?
@OliverJAsh Browsers use percent-encoding, so if they encoded the %, that could break pre-encoded paths. From https://tools.ietf.org/html/rfc3986#section-2.4 (emphasis mine).
Because the percent ("%") character serves as the indicator for percent-encoded octets, it must be percent-encoded as "%25" for that octet to be used as data within a URI. Implementations must not percent-encode or decode the same string more than once, as decoding an already decoded string might lead to misinterpreting a percent data octet as the beginning of a percent-encoding, or vice versa in the case of percent-encoding an already percent-encoded string.
Ah. I've noticed browsers also do this when you enter a URL in the address bar, e.g. https://google.com/?search=sale 50% off redirects to https://google.com/?search=sale%2050%%20off instead of the correct https://google.com/?search=sale%2050%25%20off.
IIUC: as a user, when entering the URL in the address bar, we know the query params are not encoded, but the computer does not!
Perhaps one argument in favour of stopping all decoding in this module is for consistency with HTML5's native history. Currently we have this inconsistency:
history.push(`/foo/${encodeURIComponent('foo % bar')}`)
// => URL is "/foo/foo%20%%20bar"
// => `history.location.pathname` is `/foo/foo % bar`
window.history.pushState(null, '', `/foo/${encodeURIComponent('foo % bar')}`)
// => URL is "/foo/foo%20%25%20bar"
// => `window.location.pathname` is `/foo/foo%20%25%20bar`
Both the generated URL and pathname are different in these two cases.
I think my vote is to make our push as consistent with pushState as we can. That just makes sense.
@pshrmn Isn't there some other way we can provide decoded parameters to RR users? Instead of decoding the pathnames in history, can we do it later in the router (i.e. just before we need to match with path-to-regexp)?
RR could decode in matchPath (or maybe higher up the chain/memoized so it isn't called as often). Off the top of my head I can't recall if there was a reason that was avoided before, but I want to say that the idea was that doing it in history would make it available to all history users, not just RR users.
If decoding is removed, there will be an inconsistency in location objects unless history expects to be given encoded pathnames. Whether or not there are real consequences to this, I am not sure (pushing the same location and whether that pushes/replaces is the one that comes to mind).
I haven't thought through the implications, but if you wanted to enforce that pathnames are encoded for consistency, that could be done using an anchor. That would leave encoded ones alone, but ensure anything that should be encoded is.
function encode(pathname) {
const a = document.createElement('a');
a.setAttribute('href', pathname);
return a.pathname;
}
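The same idea can be tried outside the browser with the WHATWG URL class (available in Node), which, as far as I can tell, applies the same path encoding rules as an anchor element. This is only a sketch: encodePathname and the http://localhost base are placeholders, and a ? or # in the input would be parsed as the start of the search or hash rather than as part of the path.

```javascript
// Placeholder helper: let the URL parser encode whatever still needs encoding.
function encodePathname(pathname) {
  return new URL(pathname, 'http://localhost').pathname;
}

console.log(encodePathname('/역사'));               // '/%EC%97%AD%EC%82%AC'
console.log(encodePathname('/%EC%97%AD%EC%82%AC')); // '/%EC%97%AD%EC%82%AC' (left alone)
console.log(encodePathname('/foo bar'));            // '/foo%20bar'
console.log(encodePathname('/test%ing'));           // '/test%ing' (a bare % is not encoded)
```

The last line is the catch: a bare percent sign passes through untouched, so this approach only normalizes pathnames that are otherwise valid.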
@pshrmn
This appears to work for all characters except for the percent sign.
It also appears this doesn't work for & characters. To test this:
<a href="https://google.com/foo & bar">go</a>
When clicked, the browser navigates to https://google.com/foo%20&%20bar (& is not URI encoded), instead of https://google.com/foo%20%26%20bar (& is URI encoded).
I don't know whether this needs to be considered as part of this or not—just pointing it out.
@OliverJAsh that is the search segment of a URI, which the history package doesn't touch.
@pshrmn I've updated my example to show the same problem for pathnames.
@pshrmn In fact it would seem the browser only URI encodes spaces, unlike encodeURIComponent. Example: http://output.jsbin.com/ceqerer/1.
I presume it's using encodeURI under the hood, which unlike encodeURIComponent doesn't encode reserved characters (&, …).
(My example uses query parameters for easy testing, but the same browser mechanism is used for pathnames.)
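The difference between the two pairs of functions is easy to check in a console. The sample string is just the one used earlier in this thread.

```javascript
const value = 'foo & bar';

// encodeURI leaves reserved characters such as & alone.
console.log(encodeURI(value));          // 'foo%20&%20bar'
// encodeURIComponent encodes them.
console.log(encodeURIComponent(value)); // 'foo%20%26%20bar'

// The decoders mirror that: decodeURI will not turn %26 back into &.
const encoded = 'foo%20%26%20bar';
console.log(decodeURI(encoded));          // 'foo %26 bar'
console.log(decodeURIComponent(encoded)); // 'foo & bar'
```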
I'm trying to remember exactly what I was talking about in that comment (it has been six months since I wrote it). I believe that what I meant by "this appears to work for all characters except for the percent sign" is that the percent sign is the only character that we cannot safely decode and expect the browser to properly encode.
From MDN, decodeURI (which is what history is using to decode pathnames) does the following:
Replaces each escape sequence in the encoded URI with the character that it represents, but does not decode escape sequences that could not have been introduced by encodeURI. The character “#” is not decoded from escape sequences.
Also from MDN, the following characters (which do not include the percent sign) are not encoded by encodeURI (so the encoded sequences for these characters will not be decoded by decodeURI):
A-Z a-z 0-9 ; , / ? : @ & = + $ - _ . ! ~ * ' ( ) #
history.push('/test&ing');
const decoded = decodeURI('/test&ing'); // /test&ing
history.pushState(null, '', decoded)
// window.location.pathname = '/test&ing'
decodeURI won't change the escape sequence %26 because it could not have been introduced by encodeURI
history.push('/test%26ing');
const decoded = decodeURI('/test%26ing'); // /test%26ing
// window.location.pathname = '/test%26ing'
encodeURI can encode a percent sign (encodeURI('%') === '%25'), so decodeURI will decode %25. However, the browser will not encode the percent sign (because it would not want to encode %26 into %2526), which is the source of this issue.
history.push('/test%25ing'); // %25 = encodeURI('%')
const decoded = decodeURI('/test%25ing'); // /test%ing
// window.location.pathname = '/test%ing'
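The three cases can be compared side by side with a small round-trip check. This is a sketch that assumes the WHATWG URL parser (as shipped in Node) is a fair stand-in for what the browser does to the string handed to pushState; roundTrip and the http://localhost base are made up for the example.

```javascript
// Decode the way history does, then let the URL parser re-encode the result.
function roundTrip(encodedPathname) {
  const decoded = decodeURI(encodedPathname);
  return new URL(decoded, 'http://localhost').pathname;
}

const samples = ['/%EC%97%AD%EC%82%AC', '/foo%20bar', '/test%26ing', '/test%25ing'];
for (const p of samples) {
  const result = roundTrip(p);
  console.log(p, '->', result, result === p ? '(same)' : '(changed)');
}
// Only '/test%25ing' changes: it comes back as '/test%ing'.
```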
@pshrmn That makes sense!
IIUC, this issue only concerns the percent symbol and not other characters that will be decoded by decodeURI because the percent symbol is particularly problematic as the browser will not re-encode it, unlike other characters.
However, it is quite surprising to find that some characters in location.pathname (and location params in React Router) are decoded whilst others are not. As you have explained, this is because decodeURI is only permitted to decode some characters. However, I would expect consistency with window.location.pathname:
const myLocation = history.createLocation({ pathname: `/search/${encodeURIComponent('foo & bar')}` });
historyInstance.push(myLocation);
myLocation.pathname
// => "/search/foo %26 bar"
window.location.pathname
// => "/search/foo%20%26%20bar"
Even if most of the decoded characters are correctly re-encoded by the browser, this is still an issue when working with location.pathname (or location params in React Router) directly.
I think this is also called out in https://github.com/ReactTraining/react-router/issues/5607.
For anyone who runs into this issue and is looking for workarounds…
On unsplash.com we are working around this by double URI encoding:
history.push({
pathname: `randomlocation/${ encodeURIComponent(encodeURIComponent("AB%C")) }/furthermore`
});
This results in randomlocation/AB%25C/furthermore being pushed to the HTML5 history, which is as expected.
However, this has unintended side effects.
If a user navigates directly to randomlocation/AB%25C/furthermore, location.pathname will be randomlocation/AB%C/furthermore (decoded).
If we push randomlocation/AB%25C/furthermore to the user's history using the above workaround (as randomlocation/AB%2525C/furthermore), location.pathname will be randomlocation/AB%25C/furthermore (encoded).
(The same applies to location params.)
To work around this side effect, we run a "safe decode":
const safeDecode = (pathname) => {
try {
return decodeURIComponent(pathname);
} catch (_error) {
return pathname;
}
}
I'm sure this workaround has further implications…
I'd be very happy to help getting this issue fixed. Is there enough consensus for us to go with one of the proposed solutions?
@OliverJAsh Double encoding doesn't work because if the user hits the back button then the forward button, the error will throw again.
@kevinliang What exactly would throw again? I tried what you said and I didn't get any exception.
It'll throw something like URIError: URI malformed. The reason is that when you transition via back and then forward, the "forward" decodeURI will try to decode a "%" instead of %25.
For example, if the URL contains a parameter like https://domain.com/some%name and you want to call decodeURIComponent on some%name, the first time you will be sending over some%2525name. ReactHistory will decode it to some%25name. Then you hit back and forward. ReactHistory will then decode that to some%name, and if you try to call decodeURIComponent on that because you wanted to use a decoded version of the parameter, an error will be thrown. At least that's what's currently happening on my end =/
I guess if you don't get any exceptions, then it's all good!
@kevinliang I believe we mitigate that with our safeDecode function, as I demonstrated in https://github.com/ReactTraining/history/issues/505#issuecomment-360897304.
Ahh, my bad, I missed that part. What would happen if (I know this is very unlikely) a user hit back and then forward again a 2nd time?
@OliverJAsh also with safeDecode, if the param ended up being something like %3AHello%, how would we handle that case?
Perhaps safeDecode can "re-encode" the percent on the catch and then return decodeURIComponent on that?
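One possible shape for that suggestion, as a sketch only: on failure, escape every % that is not followed by two hex digits and try again. The name safeDecodeWithRepair and the regular expression are my own assumptions, not something from the thread's original safeDecode.

```javascript
// Hypothetical variant of safeDecode that repairs bare percent signs before retrying.
const safeDecodeWithRepair = (value) => {
  try {
    return decodeURIComponent(value);
  } catch (_error) {
    // Re-encode any % that does not start a valid escape sequence.
    const repaired = value.replace(/%(?![0-9A-Fa-f]{2})/g, '%25');
    try {
      return decodeURIComponent(repaired);
    } catch (_secondError) {
      // Still malformed (e.g. invalid UTF-8 bytes): give the input back untouched.
      return value;
    }
  }
};

console.log(safeDecodeWithRepair('%3AHello%')); // ':Hello%'
console.log(safeDecodeWithRepair('some%name')); // 'some%name'
console.log(safeDecodeWithRepair('AB%25C'));    // 'AB%C'
```

This still cannot tell whether a value such as AB%25C has already been decoded once, so the ambiguity discussed above remains; it only avoids the URIError.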
Hi all. As already discussed, even if I encode the URL, it gets decoded again. Is there any way to keep the URL encoded in the address bar? (I want /mypage/My%20Article, but it is decoded and %20 is converted to a space.)
Apologies, I found a workaround by double encoding the URI: {encodeURIComponent(encodeURIComponent(item.url))}. I don't currently know the side effects.
It does have a side effect, though: it shows perfectly in the address bar, but if anybody copies the link address by right-clicking, it is encoded twice. I don't know how to fix this.
I patched our local copy of history to remove all decoding of location.pathname. It fixes all my issues and I'm yet to see any side effects. Is there anything I could watch out for?
@pshrmn @mjackson How do you think we should move forward? I would be happy to help implement the solution, whatever we think is best. This is quite a common issue so I'm keen to get it solved.
Is perhaps a reasonable solution to do the re-encoding manually instead of relying on the browser? The browser can't encode % because it doesn't know if the provided value is already encoded or not, but we know our value is not because we've already decoded it. Seems like making the result of createHref here a URL with an encoded path should work:
https://github.com/ReactTraining/history/blob/master/modules/createBrowserHistory.js#L175-L179
What we ended up doing is:
Imagine a scenario where we have links like #dashboard/:name.
We add full real links to these a elements after manually encoding strings like:
<a href={`#dashboard/${encodeURIComponent(name)}`}>{name}</a>
A sample name is:
'~`!1@2#% 3$45^6&7*8()-_+={}[]\;"\'<,>.?'
The link turns out to be:
This means we are not relying on a manual history.push(...) action.
When a user clicks on the link, the router detects this, since this is the native History API action.
The underlying react-router match returns us an incorrectly, partially decoded name like:
'~`!1%402%23% 3%2445^6%267*8()-_%2B%3D{}[]\%3B"\'<%2C>.%3F'
So, we ignore this value and instead read the real value directly from the URL like:
// The underlying History module has an unexpected behavior where it incorrectly handles URLs with a percent sign (%).
// At the time of writing this is still an open issue https://github.com/ReactTraining/history/issues/505.
// Therefore, we fall back to reading the properly encoded URL directly from the browser URL.
// The forward slash is excluded from the scaling plan names, so the below is safe for our use case.
const nameEncoded = location.hash.match(/[^\/]+\/([^\/]+)(\/.*)?/)[1];
const name = decodeURIComponent(nameEncoded);
This became our workaround.
I'd like to see a permanent resolution on this that would not require the hacks listed above to safe decode values. The difference in behavior between in-app routing and sharing a URL is a deterrent to my team upgrading to RR4.
@pshrmn your option of "Add a raw property" looks to be the best option so far IMO. It doesn't interfere with anyone's application and is handled completely by RR. It allows teams using RR3 and below to migrate to RR4 without another breaking change.
We heavily rely on URL encoded values in our routes because we have lots of special characters and spaces in our target resources. I don't want to base64 encode these names because the routes then become much less readable and not as "share-friendly"
+1 for the vote of raw properties -- but honestly, any solution to this, which is a real pain to deal with for anyone using special characters in identifiers, would be better than nothing.
I've run into this issue for the second time, this time for a different reason.
We have server side rendering, and the location.pathname created by React Router's StaticRouter is different from the location.pathname created by history and used by React Router's Router.
This creates checksum errors such as:
(client) eightMedium" href="/search/photos/sydney opera house" data-reactid="215"><span d
(server) eightMedium" href="/search/photos/sydney%20opera%20house" data-reactid="215"><sp
Edit: this has been fixed in React Router by https://github.com/ReactTraining/react-router/pull/5722 (not released at time of writing).
IIUC, here's a summary of the problems caused by history's decoding of the pathname:
1. pathname (and thereby also params) is decoded using decodeURI, and browsers do not re-encode percent (%) characters, resulting in invalid URLs: see the original post in this issue.
2. pathname (and thereby also params) is not fully decoded (characters such as space are decoded, but characters such as & are not) because decoding uses decodeURI, which does not decode all characters, instead of decodeURIComponent, which does decode all characters. This makes it difficult (impossible?) to access the fully decoded params. See https://github.com/ReactTraining/history/issues/505#issuecomment-359550278
I would like to fix 1 by stopping all decoding of pathname. This brings the behaviour in line with the native push API and other frameworks such as Express, which will always return the pathname provided by the user; it is never decoded.
We can then build on that to fix 2 by decoding params using decodeURIComponent instead of the current decodeURI. This brings the behaviour in line with other frameworks such as Express, which automatically decodes the values in req.params using decodeURIComponent.
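To make the proposal concrete, here is a minimal sketch of "keep the pathname encoded, decode only the params". The hand-written regular expression stands in for path-to-regexp, and matchSearch is a hypothetical name; this is not how React Router actually implements matching.

```javascript
// Hypothetical matcher for a route like /search/:term.
// The pathname stays encoded; each param is decoded with decodeURIComponent.
function matchSearch(pathname) {
  const match = /^\/search\/([^/]+)$/.exec(pathname);
  if (!match) return null;
  return { term: decodeURIComponent(match[1]) };
}

const pathname = `/search/${encodeURIComponent('foo & bar 50%')}`;
console.log(pathname);              // '/search/foo%20%26%20bar%2050%25'
console.log(matchSearch(pathname)); // { term: 'foo & bar 50%' }
```

Since the encoded pathname is never modified, it can be handed to pushState as-is, and the param comes out fully decoded, including & and %.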
I'd like to understand the implications of the suggested fix, to stop all decoding.
Quoting @pshrmn in https://github.com/ReactTraining/history/issues/505#issuecomment-318119540
the reasoning behind decoding in createLocation is that it allows us to have location object's whose pathname is decoded. That means that React Router users can use decoded strings in their route paths.
As an alternative, why can't React Router decode the pathname internally as it matches route paths?
Another effect of this is that params will be automatically decoded.
They are automatically decoded, but only partially so. See bug 2 mentioned at top of this comment. Because of this, I would consider the current behaviour as a bug.
Of course, this would lead to inconsistencies between the locations created from window.location and locations created from values passed by the user.
@pshrmn Could you elaborate on this point to help me understand what the problem is?
// 1. start on root
{ pathname: '/' }
// 2. navigate to /역사
{ pathname: '/역사' }
// 3. navigate to /some-other-page
{ pathname: '/some-other-page' }
// 4. click browser's back button
{ pathname: '/%EC%97%AD%EC%82%AC' }
Locations 2 and 4 are the same, but have different pathname values.
history cannot just encode the pathname because it doesn't want to double encode. This means that it would be the onus of the library using history to provide encoded pathnames to create consistent locations.
Locations 2 and 4 are the same, but have different pathname values.
Forgive me but I still don't fully follow you.
If we take the native API as an example, when navigating to /역사, window.location.pathname matches that of 4 (after navigating to some other page and then back). So I don't see the problem?
// 2. navigate to /역사
window.history.pushState(null, '', '/역사')
// => window.location.pathname === "/%EC%97%AD%EC%82%AC"
history only generates the pathname from window.location.pathname for the initial location and when popping.
I see.
This means that it would be the onus of the library using history to provide encoded pathnames to create consistent locations.
That sounds reasonable to me, but it sounds like you think it's unreasonable?
I like "bytes on the outside, unicode on the inside", so having location.pathname be encoded seems wrong to me. As far as what should happen, I don't really know. I don't think there is a "right" answer. However, pushing the encoding responsibility to React Router and other packages that use history just spreads the pain instead of taking care of it here.
I don't actually use history/React Router (I have my own history/router), so I don't really want to advocate strongly one way or the other for something that doesn't actually affect me. My implementation uses location.rawPathname, which I think works well, but I also don't have a million+ weekly downloads, so I can afford to experiment more.
I think having location.pathname be encoded seems right to me because it mirrors the behaviour of the native window.location.pathname.
However, pushing the encoding responsibility to React Router and other packages that use history just spreads the pain instead of taking care of it here.
IIUC, if location.pathname was encoded, React Router would have to decode it for matching paths of Route components. Can you think of any other cases where it would need to be decoded?
If it's encoded, at least libraries have the choice of using the encoded version or decoding it to use the decoded version. If it's already decoded, they have no way of accessing the encoded version (unless we add another property like rawPathname).
If we were to add a rawPathname to represent the encoded pathname, keeping pathname decoded, which property would then be used to derive params? I'm curious how/if this would help to solve the bug whereby params are not fully decoded (see 2 in https://github.com/ReactTraining/history/issues/505#issuecomment-375646353).
just to add into this, here's what i've been using as a workaround if I feel like this may affect a certain route:
export function cleanParams(params) {
// cleans the params and returns a new params object
const newParams = {}
Object.keys(params).forEach(param => {
let newParamValue = params[param]
newParamValue = newParamValue.replace(/%26/gi, '&')
newParamValue = newParamValue.replace(/%3F/gi, '?')
newParamValue = newParamValue.replace(/%23/gi, '#')
newParamValue = newParamValue.replace(/%2B/gi, '+')
newParams[param] = newParamValue
})
return newParams
}
and I pass it this.props.match.params
seems to be working for my current use cases.
slightly cleaner version with more characters handled:
export function cleanParams (params) {
// cleans the params and returns a new params object
return Object.keys(params).reduce((newParams, param) => {
let newParamValue = params[param]
newParamValue = newParamValue.replace(/%26/gi, '&')
newParamValue = newParamValue.replace(/%3F/gi, '?')
newParamValue = newParamValue.replace(/%23/gi, '#')
newParamValue = newParamValue.replace(/%2B/gi, '+')
newParamValue = newParamValue.replace(/%3B/gi, ';')
newParamValue = newParamValue.replace(/%2C/gi, ',')
newParamValue = newParamValue.replace(/%2F/gi, '/')
newParamValue = newParamValue.replace(/%3A/gi, ':')
newParamValue = newParamValue.replace(/%40/gi, '@')
newParamValue = newParamValue.replace(/%3D/gi, '=')
newParamValue = newParamValue.replace(/%24/gi, '$')
newParams[param] = newParamValue
return newParams
}, {})
}
I think you're right, @OliverJAsh. We should have left the pathname encoded and then only decoded it when we needed to make the match in React Router. This is the way previous versions of the router worked.
Now, the question is how to get there from here w/out breaking everyone's apps. This would be a breaking change, so we would need to push a new major version.
Came here to file an issue for this but I see the discussion is already quite ripe here. It seems objectively wrong to decode when creating location objects: it produces behavior which is divergent with native window.history.pushState (and with other libraries, apparently), and it is pretty easy to encounter scenarios where this leads to breakage.
For example, in one app, I have URLs with paths like /search/term. If "term" is something innocuous like "foo", then all is well. But if it is something like "\bfoo\b" (that is, a regex pattern that says to match "foo" with word boundaries on either side), then it must be encoded to "/search/%5Cbfoo%5Cb". Once this is push-ed via the history package, it gets decoded back to "/search/\bfoo\b", and from there it eventually hits the browser's window.history.pushState, at which point it gets mangled to "/search//bfoo/b" because the browser normalizes the backslashes to path separators, producing a completely wrong result. If instead window.history.pushState gets the encoded path, no mangling occurs.
The workaround? Double encode to compensate for the unwanted decoding by the history package:
history.push('/search/' + encodeURIComponent(encodeURIComponent(searchTerm)), state);
Not very elegant, but it works. Still, I'd love to see the unwanted decoding removed from the history package so I could get rid of hacks like this.
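The backslash mangling can be reproduced without a browser. The sketch below assumes the WHATWG URL parser in Node treats the path the same way pushState does; toBrowserPath and the http://localhost base are placeholders for the example.

```javascript
// Placeholder for "what the browser makes of the string given to pushState".
const toBrowserPath = (path) => new URL(path, 'http://localhost').pathname;

const searchTerm = '\\bfoo\\b'; // the regex pattern \bfoo\b
const encoded = '/search/' + encodeURIComponent(searchTerm);
console.log(encoded);                // '/search/%5Cbfoo%5Cb'

// Passing the encoded path through leaves it intact.
console.log(toBrowserPath(encoded)); // '/search/%5Cbfoo%5Cb'

// Decoding first (as the history package does) exposes the backslashes,
// which the URL parser then treats as path separators.
const decoded = decodeURI(encoded);
console.log(toBrowserPath(decoded)); // '/search//bfoo/b'
```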
Is it only when using history.push that this happens? Or when using the link component as well? I guess I’m wondering if there is something that can be written on what to do now while the major version solution is being worked on.
@kellyrmilligan: I'd say it can definitely affect the link component if the link component ends up calling history.push under the covers (depends on where you are getting the link component from). In the example scenario I gave above, I am using a custom <Link /> component, so I had to hack it to make things work nicely with the double-encoding workaround (I want the double-encoded URL to be passed to history.push, but I want the browser to see and render a single-encoded URL in case the user inspects it); I made it take two URLs, one for display purposes and one for actually pushing to the history. Not pretty.
Why do you think it is a good idea to change the location irreversibly in the first place? Leave the display for the browser please. I would suggest using a pathnameToDecode on it instead of changing the behavior of pathname, which is the default everyone looks for.
We are approaching a year of this issue being open. Are we at a point where a pull request can be started or are we still deciding on a solution?
This results in an invalid URL being pushed to the HTML5 history element when there are encoded percent signs in the pathname.
E.g.:
results in
"randomlocation/AB%C/furthermore"
being pushed to the HTML5 history.