Path is decoded in createLocation

ArthurKnaus commented 7 years ago

This results in an invalid URL being pushed to the HTML5 history element when there are encoded percent signs in the pathname.

E.g.:

history.push({
  pathname: `randomlocation/${ encodeURIComponent("AB%C") }/furthermore`
});

results in "randomlocation/AB%C/futhermore" being pushed to the HTML5 history.

luismasg commented 6 years ago

Hi!!! glad to see there is a discussion going on about this. I'm just here because I need a way to stop the router from crashing. our QA is making a fuss about it. can't we just do tap into the window object and clean up the route before it reaches the reactrouter? a quick, dirty, unhealthy way of sidestepping the issue? we don't need any decoding. just to ignore it. www.ourhomepage.com/% is breaking the site. It might be an issue once we actually start parsing every single url. but for now, anyone have an idea how we can throw away everything after the '/' ?

luismasg commented 6 years ago

@mjackson I'm in favor of just adding a note in the react router docs. the same way their is a note about scroll restoration (which totally worked for me btw 🙏 )

"handling special characters: if ever you have an issue with special characters just do this dirty hack " - something like that. Just a matter of deciding on the 'correct' hack 🤷‍♂️

This looks promising but I'm not sure how and when to call it

export function cleanParams (params) { // cleans the params and returns a new params object return Object.keys(params).reduce((newParams, param) => { let newParamValue = params[param] newParamValue = newParamValue.replace(/%26/gi, '&') newParamValue = newParamValue.replace(/%3F/gi, '?') newParamValue = newParamValue.replace(/%23/gi, '#') newParamValue = newParamValue.replace(/%2B/gi, '+') newParamValue = newParamValue.replace(/%3B/gi, ';') newParamValue = newParamValue.replace(/%2C/gi, ',') newParamValue = newParamValue.replace(/%2F/gi, '/') newParamValue = newParamValue.replace(/%3A/gi, ':') newParamValue = newParamValue.replace(/%40/gi, '@') newParamValue = newParamValue.replace(/%3D/gi, '=') newParamValue = newParamValue.replace(/%24/gi, '$') newParams[param] = newParamValue return newParams }, {}) }

right now i have a blank page in production and this console.log static-www-ho.27595e071816e19.js:114 Uncaught URIError: Pathname "/%" could not be decoded. This is likely caused by an invalid percent-encoding. at t.createLocation (static-www-home.0.4.6.e67c127595e071816e19.js:114) at b (static-www-ho.27595e071816e19.js:114) at t.default (static-www-ho.27595e071816e19.js:114) at Object.<anonymous> (static-www-ho27595e071816e19.js:114) at n (static-www-ho.27595e071816e19.js:1) at Object.<anonymous> (static-www.ho0.4.6.e67c127595e071816e19.js:114) at n (static-www.0.4.6.e67c127595e071816e19.js:1) at static-www-ho.27595e071816e19.js:1 at static-www-ho.27595e071816e19.js:1

pmwisdom commented 6 years ago

👍 this issue needs to be looked into deeper at some point, I manually have to double encode the url parts that might have percents in them.

The place I run into a problem is when someone clicks on an anchor link it actually decodes the encoded url, which brings me back to the original offending url.

charlesverge commented 6 years ago

Commenting on this issue to keep it alive as it would a nice to have fix.

To summarize the the current solution/hacks is to selectively double encodeURIComponent or manually patch components in core.

The problems I see are

Percent symbols cause javascript errors due to decoding and then reuse as a URI requiring the need of additional try / catch
encoding is handled differently in StaticRouter used for server side rendering and BrowserRouter/Link components
There appears to be no documentation missing for this edge case
The issue is spread across more then one component, adding in some coordination required

StanislavBerezin commented 6 years ago

can somebody pin point to a solution?

luismasg commented 6 years ago

Hi stan. This seems to be an invalid url. I can't solve your question but I can tell you that our devops guys ended up deciding it was a non issue

Get Outlook for Androidhttps://aka.ms/ghei36

From: StanislavBerezin notifications@github.com Sent: Saturday, September 15, 2018 8:19:59 AM To: ReactTraining/history Cc: Luis Manuel Suarez; Comment Subject: Re: [ReactTraining/history] Path is decoded in createLocation (#505)

can somebody pin point to a solution?

— You are receiving this because you commented. Reply to this email directly, view it on GitHubhttps://github.com/ReactTraining/history/issues/505#issuecomment-421569549, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AMWs6UIMldEkmCjsIx12x5Uvr0TElNz4ks5ubP5_gaJpZM4OjeVh.

Prophet32j commented 6 years ago

@luismasg I fail to see how this is a non issue, unless you’re referring to your specific project. This issue is a bit bigger than a single project. I can tell you that my team can’t move forward with upgrading React Router because of this issue. We MUST use special characters in our routes to uniquely identify entities.

As far as this problem, is there a timeline for a major change that this bug fix can be a part of?

cephirothdy2j commented 6 years ago

Here's how we're getting around the issue in our project:

Switch your project to use Yarn if necessary.
In package.json, add "history": "4.5.1" to your project's "dependencies".
Also add "history": "4.5.1" to your "resolutions".
You likely will have to encodeURIComponent and decodeURIComponent as you push entries to history and read params off of the URL.

Per the library's changelog, the encoding / decoding stuff was first added in v4.6.0, and v4.5.1 is the most recent version published without these changes.

By making history a direct dependency of our project and having any libraries that rely on it resolve to our selective dependency, we get around the encoding / decoding mess. Yes, we do miss out on the benefits this functionality provides - as well as any other upgrades and fixes published to history since 4.5.1 - but we were not using any of those other features anyways.

Andrekra commented 5 years ago

For me it seems the decodeURI is the culprit.

I tried to visit an url with reserved characters (? # % /) so I use encodeURIComponent to build the url. /the/path%23%25%3F%2F. But the decodeURI in createLocation, changes it to /the/path/%23%%3F%2F Which cant be decoded with decodeURIComponent or used re-encode with encodeURI in the hope to get the original string back.

// User
let pathname = encodeURIComponent('%?#') // %25%3F%23

// History
pathname = decodeURI(patname) // %%3F%23

// User Trying to get the ```%?#``` back
decodeURIComponent(pathname) // Uncaught URIError: URI malformed
// Another attempt, encode it back to get the original pathname
ecodeURI(pathname) // %25%253F%2523 Huh? Thats not my original string. Double encode the %

From random googling, people said the decodeURI/encodeURI should be used on full url's, while decodeURIComponent/encodeURIComponent on parts (component) of the url. The pathname is a component. More info here for the difference between them https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/encodeURI#Description

I've wrote a few tests. If you change decodeURI to decodeURIComponent these will pass:

  describe('with a path that can be decoded', () => {
    it('handles encoded path', () => {
      expect(
        createLocation({pathname: '/the/path%23%25%3F%2F'})
      ).toMatchObject({
        pathname: '/the/path#%?/',
        search: '',
        hash: ''
      });
    });

    it('handles not encoded path', () => {
      expect(
        createLocation('/the/path?the=query#the-hash')
      ).toMatchObject({
        pathname: '/the/path',
        search: '?the=query',
        hash: '#the-hash'
      });
    });
  });

There are 3 (well 1 for 3 histories) tests failing, but I think that would be a false negative:

        Expected value to match object:
          {"pathname": "/view/%23abc"}
        Received:
          {"hash": "", "key": "rw3pt0", "pathname": "/view/#abc", "search": "", "state": undefined}

A problem would arise, when you decode %23 as an hash, and later on would re-interpertate the hash as a valid fragment (if this would be possible). Safest way would be to store the original path somewhere.

Edit:

I also discovered a second issue, related to this. If you refresh on an URL with a safe character encoded, it's decoded (PUSH). If you use POP, the path stays as is.

const history = createBrowserHistory()
console.log('INIT:', history.location.pathname)
history.listen(location => {
  console.log(history.location.pathname)
})

// Refresh on page /the/path%23%25%3F%2F
=> INIT /the/path/%23%%3F%2F (POP, decoded, notice the double %)
// Go back and forth in in browser history
=>  /the/path/%23%25%3F%2F (PUSH, no decode)

This means I sometime have to decode twice, and sometimes once, if I use the double encode method described.

My current workaround:

const decode = str => {
  try {
    return decodeURIComponent(decodeURIComponent(str))
  } catch (e) {
    try {
      return decodeURIComponent(str)
    } catch {
      return str
    }
  }
}

const encode = str => encodeURIComponent(encodeURIComponent(str))

Dalzhim commented 5 years ago

Invoking decodeURI(location.pathname) on URLs before handling them to the browser's history is a major issue. First of all, it forces the browser to store malformed URLs (i.e. the original url https://mydomain.com/test%25 is modified to be https://mydomain.com/test%). Any browser vendor could disregard such blatant misuse of the API and completely break every user of this library when modifying their API to throw when pushing/replacing invalid URLs. They could also very well silently discard malformed URLs and never allow popping them back.

In this thread, workarounds were suggested as to decode everything except % signs. I would argue that it is a bad idea because you're going to surprise some of your indirect users who are going to have a hard time when their RR matcher doesn't work the same way once the % character is part of the URL (i.e. URLs are partially decoded except %25, which means you need to match against %25 rather than %).

Even though this represents a breaking change for people who came to rely on this implicit decoding, it is also a mandatory bug fix as applications relying directly or indirectly on this library are broken in regards to the % character and there is no reasonable workaround without fixing the root cause.

In conclusion, better figure out a path to make the breaking change at the time of your choosing rather than wait for a browser vendor to force you into it. In the mean time, the main Readme should list this as a known issue.

Prophet32j commented 5 years ago

We are ditching RR for @reach router because of this massive issue.

OliverJAsh commented 5 years ago

Trying to figure out where this issue now stands and how we can move forward.

I think you're right, @OliverJAsh. We should have left the pathname encoded and then only decoded it when we needed to make the match in React Router. This is the way previous versions of the router worked.

Now, the question is how to get there from here w/out breaking everyone's apps. This would be a breaking change, so we would need to push a new major version.

https://github.com/ReactTraining/history/issues/505#issuecomment-383985723

Plan:

Raise PR to remove decoding from history
Release a new version of history with this change
Raise PR to upgrade history in react-router and handle decoding internally
Release a new version of react-router with this change

I've started 1 in https://github.com/ReactTraining/history/pull/656.

kevinkyyro commented 5 years ago

As a matter of API design, if you want to allow users to provide decoded paths, you could accept a list instead of a string:

{
  pathname: '/search/a%2Fb%20testing/books'
}
// should be equivalent to something like
{ 
  pathname: ['search', 'a/b testing', 'books'] 
}

The relationship between each form should be approximately:

listPathname = stringPathname.replace(/^\/|\/$/g, '').split('/').map(decodeURIComponent)
stringPathname = '/' + listPathname.map(encodeURIComponent).join('/')

OliverJAsh commented 5 years ago

Update on https://github.com/ReactTraining/history/issues/505#issuecomment-453175833. We've been waiting for a few weeks now for feedback from a maintainer of this repo. No signs yet. :-(

In the meantime, I thought I'd share how we're working around this at Unsplash. We use patch-package, which lets us modify the contents of node_modules. Using this, we can remove the decoding. This has fixed all of our problems.

patch-package
--- a/node_modules/history/LocationUtils.js
+++ b/node_modules/history/LocationUtils.js
@@ -44,15 +44,9 @@ var createLocation = exports.createLocation = function createLocation(path, stat
     if (state !== undefined && location.state === undefined) location.state = state;
   }

-  try {
-    location.pathname = decodeURI(location.pathname);
-  } catch (e) {
-    if (e instanceof URIError) {
-      throw new URIError('Pathname "' + location.pathname + '" could not be decoded. ' + 'This is likely caused by an invalid percent-encoding.');
-    } else {
-      throw e;
-    }
-  }
+  // Waiting for upstream PR to be merged
+  // https://github.com/ReactTraining/history/issues/505#issuecomment-453175833
+  // https://github.com/ReactTraining/history/pull/656

   if (key) location.key = key;

--- a/node_modules/history/es/LocationUtils.js
+++ b/node_modules/history/es/LocationUtils.js
@@ -31,16 +31,6 @@ export var createLocation = function createLocation(path, state, key, currentLoc
     if (state !== undefined && location.state === undefined) location.state = state;
   }

-  try {
-    location.pathname = decodeURI(location.pathname);
-  } catch (e) {
-    if (e instanceof URIError) {
-      throw new URIError('Pathname "' + location.pathname + '" could not be decoded. ' + 'This is likely caused by an invalid percent-encoding.');
-    } else {
-      throw e;
-    }
-  }
-
   if (key) location.key = key;

   if (currentLocation) {
--- a/node_modules/history/umd/history.js
+++ b/node_modules/history/umd/history.js
@@ -157,16 +157,6 @@ return /******/ (function(modules) { // webpackBootstrap
        if (state !== undefined && location.state === undefined) location.state = state;
      }

-     try {
-       location.pathname = decodeURI(location.pathname);
-     } catch (e) {
-       if (e instanceof URIError) {
-         throw new URIError('Pathname "' + location.pathname + '" could not be decoded. ' + 'This is likely caused by an invalid percent-encoding.');
-       } else {
-         throw e;
-       }
-     }
-
      if (key) location.key = key;

      if (currentLocation) {

OliverJAsh commented 5 years ago

There's some discussion in https://github.com/ReactTraining/history/pull/656 about what the correct behaviour is (as opposed to decoding pathname). Our options are:

Just use whatever pathname is provided (i.e. leave it to the browser/user to encode).
Try to normalize pathnames so that they are encoded. If the provided pathname is fully encoded, use as is, otherwise (if the pathname is not encoded at all, or some parts are not) encode the parts that are not yet encoded.

What do people here think? Can anyone foresee 2 causing issues?

With 2, I'm worried about bugs in the normalization process. This would be off loaded to encodeurl. Is it a completely safe operation? How likely is it to go wrong?

/cc @pshrmn

bripkens commented 5 years ago

Try to normalize pathnames so that they are encoded. If the provided pathname is fully encoded, use as is, otherwise (if the pathname is not encoded at all, or some parts are not) encode the parts that are not yet encoded.

IMHO this is impossible to implement. Let's say I have a path template like /article/:name with the name parameter having the value the best 10%. Since we are all fine developers we encode the value and build the full path. This results in:

$ '/article/' + encodeURIComponent('the best 10%')
'/article/the%20best%2010%25'

A generic piece of code cannot identify that this is already properly encoded. This is the whole reason why this issue here came up. encodeURI is already attempting such an implementation. For completeness:

$ encodeURI('/article/the%20best%2010%25')
/article/the%2520best%252010%2525

Just use whatever pathname is provided (i.e. leave it to the browser/user to encode).

The value is called pathname which translates in my head to path and thus to the way it is defined in the specification. IMHO it is fair to assume / require that developers properly escape when fiddling around with dynamic paths.

P.S. We are making heavy use of matrix parameters at Instana. Due to this bug we are operating a fork of history for the last 1.5 years which just removes the double encoding. It is working perfectly fine for us.

pshrmn commented 5 years ago

@bripkens encodeurl is designed to work with already/partially encoded strings.

With this sandbox you can see that it handles fully encoded, non-encoded, and partially encoded strings.

Dalzhim commented 5 years ago

@bripkens is right when he says it is impossible to implement. In order to correctly encode an URL, you must be able to prove the reverse transformation is always possible without any loss of information. Otherwise you might be causing prejudice to the application that expects to see the same URL that was given to the system originally.

Your sandbox demonstrates this impossibility. Three different source URLs are encoded to the same resulting URL. When you decode the final URL, you cannot possibly retrieve the correct source URL all of the time because you cannot guess the intent of the user. This will inevitably lead to subtle bugs that often go unnoticed.

One final example to demonstrate how encodeurl will screw up some URLs is shown in this sandbox I derived from yours. Assuming I create an article with the following title : %20 is a space %-encoded, the urlencode library sees it as being partially encoded when really, it is not encoded at all. The resulting encoded string is the following : %20%20is%20a%20space%20%25-encoded which then decodes to is a space %-encoded, corrupting the title of my article by decoding the litteral %20 I originally used.

pshrmn commented 5 years ago

@Dalzhim I'll start out with your final example; please let me know if I am misinterpreting your point.

Let's start out by completely ignoring the encodeurl package.

If we use history.pushState to navigate to this /%20 is a space %-encoded, we end up with the following pathname:

history.pushState(null, "", "/%20 is a space %-encoded")
window.location.pathname === "/%20%20is%20a%20space%20%-encoded"

The browser does not encode percentage signs, so the initial %20 octet is used as provided. We also have an invalid URI because %-e is an invalid octet.

decodeURI(window.location.pathname) // this throws an URIError

To make the URI valid, we can pass it to an encodeURI call. This will encode %-en to %25-en as well as convert %20 to %2520%20. This will result in a pathname that can be decoded to the desired result.

history.pushState(null, "", encodeURI("/%20 is a space %-encoded"))
window.location.pathname === "/%2520%20is%20a%20space%20%25-encoded"
decodeURI(window.location.pathname) // "/%20 is a space %-encoded" CORRECT

Now, getting back to encodeurl, what happens if we wrap the pathnames in encodeurl calls?

history.pushState(null, "", encodeurl("/%20 is a space %-encoded"))
window.location.pathname === "/%20%20is%20a%20space%20%25-encoded"
decodeURI(window.location.pathname) // "/ is a space %-encoded" INCORRECT

history.pushState(null, "", encodeurl(encodeURI("/%20 is a space %-encoded")))
window.location.pathname === "/%2520%20is%20a%20space%20%25-encoded"
decodeURI(window.location.pathname) // "/%20 is a space %-encoded" CORRECT

In the first case, we still have the incorrect pathname, but it is no longer an invalid URI because encodeurl correctly encodes %-en to %25-en. The issue is that encodeurl interpets the initial %20 as a valid octet.

In the second case, we get the same (correct) result as the plain encodeURI result; wrapping an encodeURI call in an encodeurl call does not have any issues with double encoding.

encodeurl offers partial protection, but is not capable of determining the users intent. The conclusion that I draw from this is that the user should always encode percentage signs themselves.

Dalzhim commented 5 years ago

@pshrmn : Your previous message is 100% correct. The user is the only one who has enough information to disambiguate any malformed URL. The first case ended up with a valid, but meaningless URL whereas the second example displays proper construction of the URL by the user, which makes it possible for the library to handle in a generic way. In other words, decodeURI(encodeURI(url)) === url must always be true, otherwise the error is the user's responsibility.

OliverJAsh commented 5 years ago

Someone who knows better than me should raise a PR to encodeurl to mention these caveats…

kelly-tock commented 5 years ago

Can I ask before this issue, maybe rr 3 and below, what were the isssues with how it treated path params and what not? I don’t remember tbh having an issue until 4.

remix-run / history

Path is decoded in createLocation #505