remix-run / history

Manage session history with JavaScript
MIT License

Path is decoded in createLocation #505

Closed ArthurKnaus closed 5 years ago

ArthurKnaus commented 7 years ago

This results in an invalid URL being pushed to the HTML5 history element when there are encoded percent signs in the pathname.

E.g.:

history.push({
  pathname: `randomlocation/${ encodeURIComponent("AB%C") }/furthermore`
});

results in "randomlocation/AB%C/furthermore" being pushed to the HTML5 history.
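
To make the failure concrete, here is a minimal sketch of the round trip. It assumes, as discussed later in this thread, that history decodes the pathname with decodeURI before handing it to the browser; the variable names are illustrative only.

```javascript
// Encode a segment that contains a literal percent sign.
const pushed = `randomlocation/${encodeURIComponent("AB%C")}/furthermore`;
// pushed is "randomlocation/AB%25C/furthermore"

// Assumption: this mirrors the decodeURI call made in createLocation.
const decoded = decodeURI(pushed);
// decoded is "randomlocation/AB%C/furthermore"

// "%C/" is not a valid escape sequence, so decoding the result again throws.
let isMalformed = false;
try {
  decodeURIComponent(decoded);
} catch (error) {
  isMalformed = error instanceof URIError;
}

console.log(pushed, decoded, isMalformed);
```

The decoded string is what reaches pushState, and because it contains a bare percent sign, anything that later tries to decode it fails.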

pshrmn commented 7 years ago

Alright, so just to be clear, the reasoning behind decoding in createLocation is that it allows us to have location objects whose pathname is decoded. That means that React Router users can use decoded strings in their route paths.

// with decoding
<Route path='/역사' />
// without decoding
<Route path='/%EC%97%AD%EC%82%AC' />

Another effect of this is that params will be automatically decoded.

<Route path='/greeting/:word' />
// user visits /greeting/안녕하세요
match === {
  ...,
  params: { word: '안녕하세요' }
  // without decoding this would be %EC%95%88%EB%85%95%ED%95%98%EC%84%B8%EC%9A%94
}

History then relies on the browser to automatically encode the pathname when we push to the browser. This appears to work for all characters except for the percent sign.

There are a few thoughts I have on possible ways to deal with this:

Stop all decoding

I don't think that this is desirable, but I'll at least throw it out there. We could leave the decoding to the program that is using history's locations. location.pathname would just be what the user provides. Of course, this would lead to inconsistencies between the locations created from window.location and locations created from values passed by the user.

// URI = '/역사'
// window.location.pathname = '/%EC%97%AD%EC%82%AC'
{ pathname: '/%EC%97%AD%EC%82%AC', ...}

history.push('/역사')
{ pathname: '/역사', ...}

Add a raw property

Before we decode the pathname, we can store the original value as location.raw (or maybe rawPathname, name isn't important here). Then, when we create a URI, we would concatenate using location.raw instead of location.pathname.
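
A rough sketch of what this could look like follows. The helper names createLocationWithRaw and createHrefFromRaw and the rawPathname property are hypothetical and only illustrate the idea; they are not part of history's API.

```javascript
// Hypothetical helper: keep the caller's pathname verbatim next to the decoded one.
function createLocationWithRaw(pathname) {
  return {
    rawPathname: pathname,         // used when building the URL
    pathname: decodeURI(pathname), // used for matching and display
  };
}

// Hypothetical counterpart to createHref that never re-encodes anything.
function createHrefFromRaw(location) {
  return location.rawPathname;
}

const loc = createLocationWithRaw("/randomlocation/AB%25C/furthermore");
console.log(loc.pathname);           // decoded form
console.log(createHrefFromRaw(loc)); // exactly what the caller passed in
```

Because the URL is built from the untouched value, the browser never sees a bare percent sign.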

Prevent decoding of %25

Instead of just calling location.pathname = decodeURI(location.pathname), we could modify that to skip decoding the character string %25, which is % encoded.

const decodeButSkipPercent = pathname => (
  pathname.split('%25')
    .map(decodeURI)
    .join('%25')
)

That would work, but we would also end up with locations whose pathname is not necessarily fully decoded. The exception should only apply to the encoded percent sign, but it is still an exception.
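
As a quick check of that trade-off, the snippet below runs the helper from above against a couple of inputs and compares it with plain decodeURI. The sample pathnames are made up for illustration.

```javascript
// Same helper as above: decode everything except the literal sequence "%25".
const decodeButSkipPercent = pathname => (
  pathname.split('%25')
    .map(segment => decodeURI(segment))
    .join('%25')
);

// Multi-byte characters are still decoded.
const korean = decodeButSkipPercent('/%EC%97%AD%EC%82%AC');

// The encoded percent sign is left alone...
const skipped = decodeButSkipPercent('/AB%25C');

// ...whereas plain decodeURI turns it into a bare "%".
const plain = decodeURI('/AB%25C');

console.log(korean, skipped, plain);
```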


I'm not sure which of these is the best choice (or maybe there is a better alternative that I did not think of?) :woman_shrugging:

dominykas commented 7 years ago

Of course, this would lead to inconsistencies between the locations created from window.location and locations created from values passed by the user.

I'm sorry, I'm not familiar with the code, but is it not possible to decodeURI only and only when creating a location from window.location? I'd assume the apps/devs would never read the location directly from there, because they access it via this lib? This is the place where history does know the context and can behave accordingly?

kellyrmilligan commented 7 years ago

what is the recommended workaround for this? in components, the route params are only partially decoded, see referenced issue. we have file names that are coming from users, and sometimes they have ? in the name.

janhoeck commented 7 years ago

I have the same problem. I try to push this to the history: history.push('/data/Amazing%20Example')

and history converts this to the url: http://localhost:3000/data/Amazing Example

History removes my empty space! But why :)

Kapaacius commented 7 years ago

I guess I have a similar issue to @janhoeck. I want to encode part of my link because it contains a stringified object: to={`/123/${encodeURIComponent(JSON.stringify({ encode: "me" }))}`}. But when I click on the link, the URL ends up as #/123/{"encode"%3A"me"}. It kind of works, but it breaks for me when someone pastes such a URL into Slack or some other location: clicking on it there does another round of encoding magic, and then the URL does not work anymore.

How can I work around it? ( 2x encodeURIComponent is not really a nice fix...but would kinda help right now :D )

OliverJAsh commented 6 years ago

@pshrmn

History then relies on the browser to automatically encode the pathname when we push to the browser. This appears to work for all characters except for the percent sign.

Do you know why browsers don't encode the % character, but encode everything else? Is it a bug or intentional for some reason? Is it applicable in all browsers?

pshrmn commented 6 years ago

@OliverJAsh Browsers use percent-encoding, so if they encoded the %, that could break pre-encoded paths. From https://tools.ietf.org/html/rfc3986#section-2.4 (emphasis mine).

Because the percent ("%") character serves as the indicator for percent-encoded octets, it must be percent-encoded as "%25" for that octet to be used as data within a URI. Implementations must not percent-encode or decode the same string more than once, as decoding an already decoded string might lead to misinterpreting a percent data octet as the beginning of a percent-encoding, or vice versa in the case of percent-encoding an already percent-encoded string.
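
The hazard described in that passage can be seen with the built-in encoding functions alone. This is a small illustration; the sample string is arbitrary.

```javascript
const raw = "sale 50% off";

// Encoding once is correct; encoding twice also encodes the "%" of each escape.
const once = encodeURIComponent(raw);
const twice = encodeURIComponent(once);

// Decoding the double-encoded string once only gets back to the single-encoded form.
const decodedOnce = decodeURIComponent(twice);

// Decoding an already decoded string misreads "% o" as the start of an escape.
let decodeOfRawThrows = false;
try {
  decodeURIComponent(raw);
} catch (error) {
  decodeOfRawThrows = error instanceof URIError;
}

console.log(once, twice, decodedOnce, decodeOfRawThrows);
```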

OliverJAsh commented 6 years ago

Ah. I've noticed browsers also do this when you enter a URL in the address bar, e.g. https://google.com/?search=sale 50% off redirects to https://google.com/?search=sale%2050%%20off instead of the correct https://google.com/?search=sale%2050%25%20off.

IIUC: as a user, when entering the URL in the address bar, we know the query params are not encoded, but the computer does not!

OliverJAsh commented 6 years ago

Perhaps one argument in favour of stopping all decoding in this module is for consistency with HTML5's native history. Currently we have this inconsistency:

history.push(`/foo/${encodeURIComponent('foo % bar')}`)
// => URL is "/foo/foo%20%%20bar"
// => `history.location.pathname` is `/foo/foo % bar`

window.history.pushState(null, '', `/foo/${encodeURIComponent('foo % bar')}`)
// => URL is "/foo/foo%20%25%20bar"
// => `window.location.pathname` is `/foo/foo%20%25%20bar`

Both the generated URL and pathname are different in these two cases.
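
The same difference can be reproduced outside a browser with the WHATWG URL class, which is a global in Node.js and in browsers. This sketch assumes the URL parser applies the same path-encoding rules that pushState relies on; the example.com base is a placeholder.

```javascript
const base = "https://example.com";
const encodedPath = `/foo/${encodeURIComponent("foo % bar")}`;

// Passing the encoded path straight through keeps it intact.
const nativeLike = new URL(encodedPath, base).pathname;

// Decoding first (as history is described to do) leaves a bare "%",
// which the parser does not re-encode, while the spaces are re-encoded.
const decodedFirst = new URL(decodeURI(encodedPath), base).pathname;

console.log(nativeLike);
console.log(decodedFirst);
```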

mjackson commented 6 years ago

I think my vote is to make our push as consistent with pushState as we can. That just makes sense.

@pshrmn Isn't there some other way we can provide decoded parameters to RR users? Instead of decoding the pathnames in history, can we do it later in the router (i.e. just before we need to match with path-to-regexp)?

pshrmn commented 6 years ago

RR could decode in matchPath (or maybe higher up the chain/memoized so it isn't called as often). Off the top of my head I can't recall if there was a reason that was avoided before, but I want to say that the idea was that doing it in history would make it available to all history users, not just RR users.

If decoding is removed, there will be an inconsistency in location objects unless history expects to be given encoded pathnames. Whether or not there are real consequences to this, I am not sure (pushing the same location and whether that pushes/replaces is the one that comes to mind).

pshrmn commented 6 years ago

I haven't thought through the implications, but if you wanted to enforce that pathnames are encoded for consistency, that could be done using an anchor. That would leave encoded ones alone, but ensure anything that should be encoded is.

function encode(pathname) {
  const a = document.createElement('a');
  a.setAttribute('href', pathname);
  return a.pathname;
}
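
Where no DOM is available (for example in Node.js), a similar normalization can be sketched with the URL class. The function name encodeWithURL and the localhost base are assumptions made for this example.

```javascript
// Hypothetical DOM-free variant of the anchor trick above.
function encodeWithURL(pathname) {
  return new URL(pathname, "http://localhost").pathname;
}

const fromUnicode = encodeWithURL("/역사");              // gets encoded
const fromEncoded = encodeWithURL("/%EC%97%AD%EC%82%AC"); // left alone
const withPercent = encodeWithURL("/test%ing");          // bare "%" is not touched

console.log(fromUnicode, fromEncoded, withPercent);
```

Note that, like the anchor approach, this treats "?" and "#" in the input as the start of the query and fragment, and it does not repair a bare percent sign.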

OliverJAsh commented 6 years ago

@pshrmn

This appears to work for all characters except for the percent sign.

It also appears this doesn't work for & characters. To test this:

<a href="https://google.com/foo & bar">go</a>

When clicked, the browser navigates to https://google.com/foo%20&%20bar (& is not URI encoded), instead of the https://google.com/foo%20%26%20bar (& is URI encoded).

I don't know whether this needs to be considered as part of this or not—just pointing it out.

pshrmn commented 6 years ago

@OliverJAsh that is the search segment of a URI, which the history package doesn't touch.

OliverJAsh commented 6 years ago

@pshrmn I've updated my example to show the same problem for pathnames.

OliverJAsh commented 6 years ago

@pshrmn In fact it would seem the browser only URI encodes spaces, unlike encodeURIComponent. Example: http://output.jsbin.com/ceqerer/1.

I presume it's using encodeURI under the hood, which unlike encodeURIComponent doesn't encode reserved characters (&, …).

(My example uses query parameters for easy testing, but the same browser mechanism is used for pathnames.)
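
For reference, the difference between the two built-in functions can be checked directly. This only illustrates the functions themselves; whether the browser uses encodeURI internally is the presumption stated above, not something this snippet proves.

```javascript
const input = "foo & bar?";

// encodeURI leaves reserved characters such as "&" and "?" untouched.
const viaEncodeURI = encodeURI(input);

// encodeURIComponent encodes them as well.
const viaEncodeURIComponent = encodeURIComponent(input);

// Both functions do encode a percent sign.
const percentBoth = [encodeURI("%"), encodeURIComponent("%")];

console.log(viaEncodeURI, viaEncodeURIComponent, percentBoth);
```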

pshrmn commented 6 years ago

I'm trying to remember exactly what I was talking about in that comment (it has been six months since I wrote it). I believe that what I meant by "this appears to work for all characters except for the percent sign" is that the percent sign is the only character that we cannot safely decode and expect the browser to properly encode.

From MDN, decodeURI (which is what history is using to decode pathnames) does the following:

Replaces each escape sequence in the encoded URI with the character that it represents, but does not decode escape sequences that could not have been introduced by encodeURI. The character “#” is not decoded from escape sequences.

Also from MDN, the following characters (which do not include the percent sign) are not encoded by encodeURI (so the encoded sequences for these characters will not be decoded by decodeURI):

A-Z a-z 0-9 ; , / ? : @ & = + $ - _ . ! ~ * ' ( ) #

history.push('/test&ing');
const decoded = decodeURI('/test&ing'); // /test&ing
window.history.pushState(null, '', decoded);
// window.location.pathname = '/test&ing'

decodeURI won't change the escape sequence %26 because it could not have been introduced by encodeURI:

history.push('/test%26ing');
const decoded = decodeURI('/test%26ing'); // /test%26ing
// window.location.pathname = '/test%26ing'

encodeURI can encode a percent sign (encodeURI('%') === '%25'), so decodeURI will decode %25. However, the browser will not encode the percent sign (because it would not want to encode %26 into %2526), which is the source of this issue.

history.push('/test%25ing'); // %25 = encodeURI('%')
const decoded = decodeURI('/test%25ing'); // /test%ing
// window.location.pathname = '/test%ing'
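
The three cases can be put side by side, together with decodeURIComponent for comparison. This is only a summary of the built-in behavior described above.

```javascript
const samples = ["/test%20ing", "/test%26ing", "/test%25ing"];

// For each sample, record what decodeURI and decodeURIComponent produce.
const table = samples.map(sample => ({
  input: sample,
  decodeURI: decodeURI(sample),
  decodeURIComponent: decodeURIComponent(sample),
}));

console.log(table);
```

Only the %26 row differs between the two functions, and only the %25 row produces a character that will not be re-encoded afterwards.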

OliverJAsh commented 6 years ago

@pshrmn That makes sense!

IIUC, this issue only concerns the percent symbol and not other characters that will be decoded by decodeURI because the percent symbol is particularly problematic as the browser will not re-encode it, unlike other characters.

However, it is quite surprising to find that some characters in location.pathname (and location params in React Router) are decoded whilst others are not. As you have explained, this is because decodeURI is only permitted to decode some characters. However, I would expect consistency with window.location.pathname:

const myLocation = history.createLocation({ pathname: `/search/${encodeURIComponent('foo & bar')}` });

historyInstance.push(myLocation);

myLocation.pathname
// => "/search/foo %26 bar"

window.location.pathname
// => "/search/foo%20%26%20bar"

Even if most of the decoded characters are correctly re-encoded by the browser, this is still an issue when working with location.pathname (or location params in React Router) directly.

I think this is also called out in https://github.com/ReactTraining/react-router/issues/5607.

OliverJAsh commented 6 years ago

For anyone who runs into this issue and is looking for workarounds…

On unsplash.com we are working around this by double URI encoding:

history.push({
  pathname: `randomlocation/${ encodeURIComponent(encodeURIComponent("AB%C")) }/furthermore`
});

This results in randomlocation/AB%25C/furthermore being pushed to the HTML5 history, which is as expected.

However, this has unintended side effects.

If a user navigates directly to randomlocation/AB%25C/furthermore, location.pathname will be randomlocation/AB%C/furthermore (decoded).

If we push randomlocation/AB%25C/furthermore to the user's history using the above workaround (as randomlocation/AB%2525C/furthermore), location.pathname will be randomlocation/AB%25C/furthermore (encoded).

(The same applies to location params.)

To work around this side effect, we run a "safe decode":

const safeDecode = (pathname) => {
  try {
    return decodeURIComponent(pathname);
  } catch (_error) {
    return pathname;
  }
}

I'm sure this workaround has further implications…

I'd be very happy to help getting this issue fixed. Is there enough consensus for us to go with one of the proposed solutions?

kevinliang commented 6 years ago

@OliverJAsh Double encoding doesn't work because if the user hits the back button then the forward button, the error will throw again.

OliverJAsh commented 6 years ago

@kevinliang What exactly would throw again? I tried what you said and I didn't get any exception.

kevinliang commented 6 years ago

it'll throw something like URIError: URI malformed. The reason why it would do that is because when you transition via back and then forward, the "forward" decodeURI will try to decode a "%" instead of %25.

For example, if the URL contains a parameter like https://domain.com/some%name and you tried to decodeURIComponent on some%name, your first time you will be sending over some%25%25name. ReactHistory will decode it to some%25name. Then you hit back and forward. ReactHistory will then decode that to some%name and if you try to call decodeURIComponent on that because you wanted to use a decoded version of the parameter, an error will throw. At least that's what's currently happening on my end =/ I guess if you don't get any exceptions, then it's all good!

OliverJAsh commented 6 years ago

@kevinliang I believe we mitigate that with our safeDecode function, as I demonstrated in https://github.com/ReactTraining/history/issues/505#issuecomment-360897304.

kevinliang commented 6 years ago

ahh my bad I missed that part. What would happen if (i know this is very unlikely) a user hit back and then forward again a 2nd time?

kevinliang commented 6 years ago

@OliverJAsh also with safeDecode, if the param ended up being something like %3AHello%, how would we handle that case?

Perhaps safeDecode can "re-encode" the percent on the catch and then return decodeURIComponent on that?
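
One way that suggestion could be sketched is shown below. The name safeDecodeWithRetry is made up for this example, and the approach assumes that any "%" not followed by two hex digits was meant as a literal percent sign, which cannot be known for certain.

```javascript
// Hypothetical variant of safeDecode: on failure, re-encode stray percent
// signs (those not followed by two hex digits) and try once more.
const safeDecodeWithRetry = (value) => {
  try {
    return decodeURIComponent(value);
  } catch (_error) {
    const repaired = value.replace(/%(?![0-9A-Fa-f]{2})/g, '%25');
    try {
      return decodeURIComponent(repaired);
    } catch (_secondError) {
      // Still malformed (e.g. invalid UTF-8 bytes): give the input back unchanged.
      return value;
    }
  }
};

console.log(safeDecodeWithRetry('%3AHello%'));
console.log(safeDecodeWithRetry('some%name'));
```

This remains ambiguous for inputs such as 50%20off, where the user may have meant the literal characters "%20"; the sketch decodes that as a space.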

AnshulM34 commented 6 years ago

Hi all. As already discussed, even if I encode the URL, it gets decoded again. Is there any way to keep the URL encoded in the address bar? I want /mypage/My%20Article, but it is decoded and %20 is converted to a space.

AnshulM34 commented 6 years ago

Apologies, I found a workaround by double encoding the URI: {encodeURIComponent(encodeURIComponent(item.url))}. I do not currently know the side effects.

AnshulM34 commented 6 years ago

It does have a side effect: it shows up perfectly in the address bar, but if anybody copies the link address by right-clicking, it is encoded twice. I don't know how to fix this.

OliverJAsh commented 6 years ago

I patched our local copy of history to remove all decoding of location.pathname. It fixes all my issues and I'm yet to see any side effects. Is there anything I could watch out for?

OliverJAsh commented 6 years ago

@pshrmn @mjackson How do you think we should move forward? I would be happy to help implement the solution, whatever we think is best. This is quite a common issue so I'm keen to get it solved.

oliver-stripe commented 6 years ago

Is perhaps a reasonable solution to do the re-encoding manually instead of relying on the browser? The browser can't encode % because it doesn't know if the provided value is already encoded or not, but we know our value is not because we've already decoded it. Seems like making the result of createHref here a URL with an encoded path should work:

https://github.com/ReactTraining/history/blob/master/modules/createBrowserHistory.js#L175-L179
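
A caveat worth checking before going this way: decodeURI is lossy, so re-encoding with encodeURI does not always restore the original pathname. The sketch below uses a hypothetical roundTrip helper to show one case that survives and one that does not.

```javascript
// Hypothetical helper: decode the way history is described to, then re-encode manually.
const roundTrip = pathname => encodeURI(decodeURI(pathname));

// An encoded percent sign survives the round trip.
const percentCase = roundTrip("/test%25ing");

// An encoded ampersand does not: decodeURI leaves %26 alone,
// and encodeURI then encodes its "%" a second time.
const ampersandCase = roundTrip("/test%26ing");

// The underlying ambiguity: two different inputs decode to the same string.
const collision = decodeURI("/x%2526") === decodeURI("/x%26");

console.log(percentCase, ampersandCase, collision);
```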

kadishmal commented 6 years ago

What we ended up doing is:

This became our workaround.

Prophet32j commented 6 years ago

I'd like to see a permanent resolution on this that would not require the hacks listed above to safe decode values. The difference in behavior between in-app routing and sharing a URL is a deterrent to my team upgrading to RR4.

@pshrmn your option of Add a raw property looks to be the best option so far IMO. It doesn't interfere with anyone's application and is handled completely by RR. It allows teams using RR3 and below to migrate to RR4 without another breaking change.

We heavily rely on URL encoded values in our routes because we have lots of special characters and spaces in our target resources. I don't want to base64 encode these names because the routes then become much less readable and not as "share-friendly"

jh3141 commented 6 years ago

+1 for the vote of raw properties -- but honestly, any solution to this, which is a real pain to deal with for anyone using special characters in identifiers, would be better than nothing.

OliverJAsh commented 6 years ago

I've run into this issue for the second time, this time for a different reason.

We have server side rendering, and the location.pathname created by React Router's StaticRouter is different from the location.pathname created by history and used by React Router's Router.

This creates checksum errors such as:

 (client) eightMedium" href="/search/photos/sydney opera house" data-reactid="215"><span d
 (server) eightMedium" href="/search/photos/sydney%20opera%20house" data-reactid="215"><sp

Edit: this has been fixed in React Router by https://github.com/ReactTraining/react-router/pull/5722 (not released at time of writing).

OliverJAsh commented 6 years ago

IIUC, here's a summary of the problems caused by history's decoding of the pathname:

  1. pathname (and thereby also params) is decoded using decodeURI, and browsers do not re-encode percent (%) characters, resulting in invalid URLs: see original post in this issue
  2. pathname (and thereby also params) is not fully decoded (characters such as space are decoded, but characters such as & are not) because decoding uses decodeURI, which does not decode all characters, instead of decodeURIComponent, which does decode all characters. This makes it difficult (impossible?) to access the fully decoded params. See https://github.com/ReactTraining/history/issues/505#issuecomment-359550278

I would like to fix 1 by stopping all decoding of pathname. This brings the behaviour in line with the native push API and other frameworks such as Express, which will always return the pathname provided by the user—it is never decoded.

We can then build on that to fix 2 by decoding params using decodeURIComponent instead of the current decodeURI. This brings the behaviour in line with other frameworks such as Express, which automatically decodes the values in req.params using decodeURIComponent.
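
A minimal sketch of that two-step approach: match against the still-encoded pathname, then decode each param with decodeURIComponent. The matchSingleParam function is a made-up stand-in for a real route matcher and only handles a "/prefix/:param" shape.

```javascript
// Hypothetical matcher for a "/prefix/:param" style route.
// Matching happens on the encoded pathname; decoding happens per param.
function matchSingleParam(prefix, encodedPathname) {
  if (!encodedPathname.startsWith(prefix + "/")) return null;
  const rawValue = encodedPathname.slice(prefix.length + 1);
  if (rawValue === "" || rawValue.includes("/")) return null;
  return { raw: rawValue, value: decodeURIComponent(rawValue) };
}

const search = matchSingleParam("/search", `/search/${encodeURIComponent("foo & bar")}`);
const greeting = matchSingleParam("/greeting", `/greeting/${encodeURIComponent("안녕하세요")}`);

console.log(search, greeting);
```

Because the pathname is never decoded as a whole, an encoded "/" or "%" inside a param cannot change how the path is split.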

I'd like to understand the implications of the suggested fix, to stop all decoding.

Quoting @pshrmn in https://github.com/ReactTraining/history/issues/505#issuecomment-318119540

the reasoning behind decoding in createLocation is that it allows us to have location objects whose pathname is decoded. That means that React Router users can use decoded strings in their route paths.

As an alternative, why can't React Router decode the pathname internally as it matches route paths?

Another effect of this is that params will be automatically decoded.

They are automatically decoded, but only partially so. See bug 2 mentioned at top of this comment. Because of this, I would consider the current behaviour as a bug.

Of course, this would lead to inconsistencies between the locations created from window.location and locations created from values passed by the user.

@pshrmn Could you elaborate on this point to help me understand what the problem is?

pshrmn commented 6 years ago

// 1. start on root
{ pathname: '/' }
// 2. navigate to /역사
{ pathname: '/역사' }
// 3. navigate to /some-other-page
{ pathname: '/some-other-page' }
// 4. click browser's back button
{ pathname: '/%EC%97%AD%EC%82%AC' }

Locations 2 and 4 are the same, but have different pathname values.

history cannot just encode the pathname because it doesn't want to double encode. This means that it would be the onus of the library using history to provide encoded pathnames to create consistent locations.

OliverJAsh commented 6 years ago

Locations 2 and 4 are the same, but have different pathname values.

Forgive me but I still don't fully follow you.

If we take the native API as an example, when navigating to /역사, window.location.pathname matches that of 4 (after navigating to some other page and then back). So I don't see the problem?

// 2. navigate to /역사
window.history.pushState(null, '', '/역사')
// => window.location.pathname === "/%EC%97%AD%EC%82%AC"

pshrmn commented 6 years ago

history only generates the pathname from window.location.pathname for the initial location and when popping.

OliverJAsh commented 6 years ago

I see.

This means that it would be the onus of the library using history to provide encoded pathnames to create consistent locations.

That sounds reasonable to me, but it sounds like you think it's unreasonable?

pshrmn commented 6 years ago

I like "bytes on the outside, unicode on the inside", so having location.pathname be encoded seems wrong to me. As far as what should happen, I don't really know. I don't think there is a "right" answer. However, pushing the encoding responsibility to React Router and other packages that use history just spreads the pain instead of taking care of it here.

I don't actually use history/React Router (I have my own history/router), so I don't really want to advocate strongly one way or the other for something that doesn't actually affect me. My implementation uses location.rawPathname, which I think works well, but I also don't have a million+ weekly downloads, so I can afford to experiment more.

OliverJAsh commented 6 years ago

I think having location.pathname be encoded seems right to me because it mirrors the behaviour of the native window.location.pathname.

However, pushing the encoding responsibility to React Router and other packages that use history just spreads the pain instead of taking care of it here.

IIUC, if location.pathname was encoded, React Router would have to decode it for matching paths of Route components. Can you think of any other cases where it would need to be decoded?

If it's encoded, at least libraries have the choice of using the encoded version or decoding it to use the decoded version. If it's already decoded, they have no way of accessing the encoded version (unless we add another property like rawPathname).

If we were to add a rawPathname to represent the encoded pathname, keeping pathname decoded, which property would then be used to derive params? I'm curious how/if this would help to solve the bug whereby params are not fully decoded (see 2 in https://github.com/ReactTraining/history/issues/505#issuecomment-375646353).

kellyrmilligan commented 6 years ago

just to add into this, here's what i've been using as a workaround if I feel like this may affect a certain route:

export function cleanParams(params) {
  // cleans the params and returns a new params object
  const newParams = {}
  Object.keys(params).forEach(param => {
    let newParamValue = params[param]
    newParamValue = newParamValue.replace(/%26/gi, '&')
    newParamValue = newParamValue.replace(/%3F/gi, '?')
    newParamValue = newParamValue.replace(/%23/gi, '#')
    newParamValue = newParamValue.replace(/%2B/gi, '+')
    newParams[param] = newParamValue
  })
  return newParams
}

and I pass it this.props.match.params

seems to be working for my current use cases.

kellyrmilligan commented 6 years ago

slightly cleaner version with more characters handled:

export function cleanParams (params) {
  // cleans the params and returns a new params object
  return Object.keys(params).reduce((newParams, param) => {
    let newParamValue = params[param]
    newParamValue = newParamValue.replace(/%26/gi, '&')
    newParamValue = newParamValue.replace(/%3F/gi, '?')
    newParamValue = newParamValue.replace(/%23/gi, '#')
    newParamValue = newParamValue.replace(/%2B/gi, '+')
    newParamValue = newParamValue.replace(/%3B/gi, ';')
    newParamValue = newParamValue.replace(/%2C/gi, ',')
    newParamValue = newParamValue.replace(/%2F/gi, '/')
    newParamValue = newParamValue.replace(/%3A/gi, ':')
    newParamValue = newParamValue.replace(/%40/gi, '@')
    newParamValue = newParamValue.replace(/%3D/gi, '=')
    newParamValue = newParamValue.replace(/%24/gi, '$')
    newParams[param] = newParamValue
    return newParams
  }, {})
}
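
The same list of escape sequences can also be handled in a single pass. The following is an alternative sketch, not the version above; the names RESERVED_ESCAPES and cleanParamsCompact are made up for this example.

```javascript
// The reserved characters that decodeURI leaves encoded: & ? # + ; , / : @ = $
const RESERVED_ESCAPES = /%(?:26|3F|23|2B|3B|2C|2F|3A|40|3D|24)/gi;

// Returns a new params object with only those escape sequences decoded.
function cleanParamsCompact(params) {
  return Object.fromEntries(
    Object.entries(params).map(([name, value]) => [
      name,
      value.replace(RESERVED_ESCAPES, sequence => decodeURIComponent(sequence)),
    ])
  );
}

console.log(cleanParamsCompact({ file: 'a%26b%3fc', path: 'x%2Fy', plain: '100%' }));
```

Restricting the decode to this fixed set avoids calling decodeURIComponent on a value that may already contain a bare percent sign.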

mjackson commented 6 years ago

I think you're right, @OliverJAsh. We should have left the pathname encoded and then only decoded it when we needed to make the match in React Router. This is the way previous versions of the router worked.

Now, the question is how to get there from here w/out breaking everyone's apps. This would be a breaking change, so we would need to push a new major version.

wincent commented 6 years ago

Came here to file an issue for this but I see the discussion is already quite ripe here. It seems objectively wrong to decode when creating location objects: it produces behavior which is divergent with native window.history.pushState — and with other libraries, apparently — and it is pretty easy to encounter scenarios where this leads to breakage.

For example, in one app, I have URLs with paths like /search/term. If "term" is something innocuous like "foo", then all is well. But if it is something like "\bfoo\b" (that is, a regex pattern that says to match "foo" with word boundaries on either side), then it must be encoded to "/search/%5Cbfoo%5Cb". Once this is push-ed via the history package, it gets decoded back to "/search/\bfoo\b", and from there it eventually hits the browser's window.history.pushState at which point it gets mangled to "/search//bfoo/b" because the browser treats the backslashes as path separators, producing a completely wrong result. If instead window.history.pushState gets the encoded path, no mangling occurs.
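
The backslash handling can be reproduced with the WHATWG URL class in Node.js or a browser. This sketch assumes the URL parser mirrors what pushState does with the path; the example.com base is a placeholder.

```javascript
const base = "https://example.com";
const searchTerm = "\\bfoo\\b"; // the regex source \bfoo\b

const encodedPath = `/search/${encodeURIComponent(searchTerm)}`;

// The encoded path passes through the parser untouched.
const keptEncoded = new URL(encodedPath, base).pathname;

// After decodeURI the backslashes are literal again, and the parser
// treats them as "/" for http(s) URLs.
const mangled = new URL(decodeURI(encodedPath), base).pathname;

console.log(encodedPath, keptEncoded, mangled);
```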

The workaround? Double encode to compensate for the unwanted decoding by the history package:

history.push('/search/' + encodeURIComponent(encodeURIComponent(searchTerm)), state);

Not very elegant, but it works. Still, I'd love to see the unwanted decoding removed from the history package so I could get rid of hacks like this.

kellyrmilligan commented 6 years ago

Is it only when using history.push that this happens? Or when using the link component as well? I guess I’m wondering if there is something that can be written on what to do now while the major version solution is being worked on.

wincent commented 6 years ago

@kellyrmilligan: I'd say it can definitely affect the link component if the link component ends up calling history.push under the covers (depends on where you are getting the link component from). In the example scenario I gave above, I am using a custom <Link /> component, so I had to hack it to make things work nicely with the double-encoding workaround (I want the double-encoded URL to be passed to history.push, but I want the browser to see and render a single-encoded URL in case the user inspects it); I made it take two URLs, one for display purposes and one for actually pushing to the history. Not pretty.

ghost commented 6 years ago

Why do you think it is a good idea to change the location irreversibly in the first place? Leave the display for the browser please. I would suggest using a pathnameToDecode on it instead of changing the behavior of pathname, which is the default everyone looks for.

Prophet32j commented 6 years ago

We are approaching a year of this issue being open. Are we at a point where a pull request can be started or are we still deciding on a solution?