thephpleague / uri-manipulations

Functions and Middleware to manipulate URI Objects
https://uri.thephpleague.com/manipulations
MIT License
199 stars 0 forks source link

AppendSegment doesn't handle some ascii strings correctly #4

Closed danielbeardsley closed 6 years ago

danielbeardsley commented 6 years ago

Issue summary

AppendSegment can incorrectly encode / decode if they include an encoded % sign. We should be able to url-encode any valid ascii characters, and include them in a url such that the server (after decoding) gets the same characters out.

System information

Information Description
1.5 league uri manipulations version
7.1 PHP/HHVM version
Fedora OS Platform

Standalone code

$path = "hi%20"; # unencoded ascii chars
$uri = League\Uri\Schemes\Http::createFromString("http://blah.com/");
$result = (new League\Uri\Modifiers\AppendSegment(rawurlencode($path)))->process($uri);
var_dump($result->__toString());

Expected result

http://blah.com/hi%2520

Actual result

http://blah.com/hi%20

Note: if the above $path was hi%11 the Library would throw an Exception.

nyamsprod commented 6 years ago

Question why are you url encoding? The class does it for you. So don't use rawurlencode and you should be fine

danielbeardsley commented 6 years ago

why are you url encoding?

The docs didn't mention that the class would do it for me.

don't use rawurlencode and you should be fine

The library seems to try to guess if the content I'm giving it has already been encoded or not. If the content includes bits that resemble encoded characters, it can guess wrong and produce incorrect results.

When I drop the rawurlencode() from above, I end up with the same incorrect output: http://blah.com/hi%20. It seems to interpret the %20 as an already encoded space and then doesn't encode it further. This means there's no way to generate the valid url: http://blah.com/hi%2520


Actually the problem is somewhat deeper than that; the library actively mutates valid urls and changes their content:

$uri = League\Uri\Schemes\Http::createFromString("http://blah.com/hi%2520");
echo $result;

Produces:

http://blah.com/hi%20

Interestingly, it seems it parses it and runs each segment through the decoder up to two times:

$uri = League\Uri\Schemes\Http::createFromString("http://blah.com/%25252520/%252520/%2520/%20/ /");
echo $result;

Produces:

http://blah.com/%2520/%20/%20/%20/%20/
nyamsprod commented 6 years ago

@danielbeardsley I've found the bug it's in another component and quite frankly I'm in a process of killing this uri-manipulations library as it's just a wrapper around another component where the bug is uri-components. So in the next major release this package will be killed/removed.

The library seems to try to guess if the content I'm giving it has already been encoded or not. If the content includes bits that resemble encoded characters, it can guess wrong and produce incorrect results.

Yes it does because the goal is to avoid double encoding like in PSR-7 UriInterface::withPath method which accepts both encoded and non encoded data and returns the correct path with the correct encoding.

When I get enough time I'll tackle this bug in the uri-component package and this package will have the bug automatically fixed.

nyamsprod commented 6 years ago

a bug fix has landed on the master branch of the URI Component package. the two reported bugs are fixed. the bug will be fixed one this package once I release a bug fix release of that component package (ie: 1.8.3).

nyamsprod commented 6 years ago

bug fix is in prod with league/uri-components 1.8.2 being released