Closed abumostafa closed 6 years ago
Cool, thanks for the report!
@abumostafa What version of PHP are you using?
Thanks for the quick reply. I'm using PHP 7.2.10 (cli) (built: Sep 14 2018 07:07:08) ( NTS )
installed using brew
Interesting, I'm not getting quite the same results. If I supply a string of 'http://de.wikipedia.org/wiki/Nattō', I get the same string back.
Nevertheless, RFC 3986 is quite specific about what characters can be present in a URL. I'll work on a fix to percent-encode URL paths.
Should be all good now in release 2.2
Thanks a lot
I have tested your solution; however, it did not work :( The problem is in parse_url.
When you call parse_url, it returns the wrong value.
Seems to be working as far as I can tell based on the example you initially provided:
$url = new \webignition\Url\Url('http://de.wikipedia.org/wiki/Nattō');
var_dump((string)$url);
// http://de.wikipedia.org/wiki/Natt%C5%8D
That the values returned by parse_url are not ultimately correct is not a concern, as there is code that subsequently remedies the otherwise-incorrect path encoding.
What gives you the impression that there is still an issue? Can you give me another example of what you expect vs what you get?
My tests fail on macOS but pass on Linux. I tested parse_url myself and it is the problem.
Here is an example on macOS
php -r "print_r(parse_url('http://de.wikipedia.org/wiki/Nattō'));"
// macOS
PHP 7.2.10 (cli) (built: Sep 14 2018 07:07:08) ( NTS )
Copyright (c) 1997-2018 The PHP Group
Zend Engine v3.2.0, Copyright (c) 1998-2018 Zend Technologies
with Xdebug v2.6.1, Copyright (c) 2002-2018, by Derick Rethans
with Zend OPcache v7.2.10, Copyright (c) 1999-2018, by Zend Technologies
Array
(
[scheme] => http
[host] => de.wikipedia.org
[path] => /wiki/Natt�_
)
// Linux
PHP 7.2.3 (cli) (built: Mar 22 2018 22:37:54) ( NTS )
Copyright (c) 1997-2018 The PHP Group
Zend Engine v3.2.0, Copyright (c) 1998-2018 Zend Technologies
with Zend OPcache v7.2.3, Copyright (c) 1999-2018, by Zend Technologies
Array
(
[scheme] => http
[host] => de.wikipedia.org
[path] => /wiki/Nattō
)
Maybe your terminal is not able to render the relevant characters? I'm not able to find any substantiated issues with PHP itself to suggest that parse_url behaves incorrectly under macOS.
Have you run the tests that come with this package?
Thanks a lot for your help.
I can confirm it's not the terminal.
Here is what I did:
2.2
I debugged parse_url and checked the output. It differs between macOS and Linux. The results are exactly as shown here: https://github.com/webignition/url/issues/28#issuecomment-428866381
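One way to rule out terminal rendering is to look at the raw bytes of the parsed path instead of the printed characters. The snippet below is only a sketch for that check; the variable names are illustrative and not taken from the package. It relies on the fact that "ō" (U+014D) is encoded in UTF-8 as the two bytes C5 8D, which is also why the expected percent-encoded form is Natt%C5%8D.

```php
<?php
// Illustrative check: dump the parsed path as hex so that the terminal's
// character rendering cannot influence what we see.
$url = 'http://de.wikipedia.org/wiki/Nattō';
$path = parse_url($url, PHP_URL_PATH);

// "ō" is U+014D, which UTF-8 encodes as the two bytes C5 8D.
$expectedTail = 'c58d';
$actualHex = bin2hex($path);
$actualTail = substr($actualHex, -4);

echo $actualHex, PHP_EOL;
echo $actualTail === $expectedTail
    ? 'path bytes intact'
    : 'path bytes altered by parse_url (tail: ' . $actualTail . ')', PHP_EOL;
```

If the last four hex digits differ between the two machines, the difference comes from parse_url itself and not from how the terminal draws the output.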
Looks like there is indeed a possible issue with parse_url() under OS X/macOS. I say 'possible' as whether the matter constitutes a bug is somewhat open to interpretation. I'm not going to weigh in here as to whether the matter is or isn't a bug; that's up to the PHP maintainers to determine. There's not really anything I can do about that.
From looking at the above bug report, it seems that parse_url() was never intended to be multibyte-aware and that characters outside the set permitted by RFC 3986, such as non-ASCII characters, should be percent-encoded. By definition, http://de.wikipedia.org/wiki/Nattō is not valid, as it contains a non-ASCII character that is not percent-encoded.
Implementing a change to handle non-ASCII Unicode characters in a URL string brings along assumptions regarding the state of encoding of the URL being parsed. I'm not happy with such assumptions, and I'm not happy implementing and then maintaining a means for handling technically invalid URL strings.
If you correctly encode the URL to be parsed you'll not have any problems.
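As a rough illustration of encoding the URL before parsing, the sketch below percent-encodes every non-ASCII byte so that parse_url() only ever receives ASCII input. The helper name percentEncodeNonAscii is hypothetical and not part of webignition/url, and the sketch assumes the input string is UTF-8; that is exactly the kind of assumption about encoding state mentioned above, so the caller has to own it.

```php
<?php
/**
 * Hypothetical helper (not part of webignition/url): percent-encode every
 * byte in the range 0x80-0xFF, leaving all ASCII characters, including any
 * existing percent-escapes, untouched.
 */
function percentEncodeNonAscii(string $url): string
{
    return preg_replace_callback(
        '/[\x80-\xFF]/', // no "u" modifier, so the pattern matches single bytes
        static function (array $match): string {
            return rawurlencode($match[0]);
        },
        $url
    );
}

$encoded = percentEncodeNonAscii('http://de.wikipedia.org/wiki/Nattō');
echo $encoded, PHP_EOL;
print_r(parse_url($encoded));
```

Because only bytes above 0x7F are rewritten, the scheme, host delimiters, and already-encoded sequences pass through unchanged.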
I totally agree with you: the problem is that parse_url is not multibyte-aware, and it seems to be platform- and encoding-specific.
When I changed the OS locale to C instead of en_US.UTF-8, everything seemed to work fine, so I won't worry about it anymore.
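For anyone who cannot change the OS locale, a similar effect can probably be had inside the PHP process. The sketch below is an assumption rather than a confirmed fix: it presumes the platform difference comes from locale-dependent character classification, which this thread suggests but does not prove. It pins LC_CTYPE to C only around the parse_url() call.

```php
<?php
// Sketch of an in-process variant of the locale workaround. Passing '0'
// to setlocale() only queries the current setting without changing it.
$previous = setlocale(LC_CTYPE, '0');
setlocale(LC_CTYPE, 'C');

$path = parse_url('http://de.wikipedia.org/wiki/Nattō', PHP_URL_PATH);
echo bin2hex($path), PHP_EOL;

// Restore the previous locale so other locale-sensitive code is unaffected.
if ($previous !== false) {
    setlocale(LC_CTYPE, $previous);
}
```

Printing the path as hex keeps the comparison between machines independent of the terminal; note that setlocale() affects the whole process, so the restore step matters in long-running applications.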
Thanks for taking the time to help and to support.
Feel free to close this issue
Hello,
I have a problem using the library on macOS with utf-8 characters.
Here is an example
The output is:
Actual: http://de.wikipedia.org/wiki/Natt_
Expected: http://de.wikipedia.org/wiki/Natt%C5%8D
Here is an example of UTF-8 safe parsing