m6w6 / ext-http

Extended HTTP Support
BSD 2-Clause "Simplified" License

Parsing url with unicode characters #110

Closed: khavishbhundoo closed this issue 3 years ago

khavishbhundoo commented 3 years ago

I am trying to feed https://ru.wikipedia.org/wiki/Нью-Йорк_(штат) to ext-http as follows:

$url = new \http\Url($request->url);

I am getting the following error:

http\Url::__construct(): Failed to parse path; unexpected byte 0xd0 at pos 6 in '/wiki/Нью-Йорк_(штат)'

Should the library be able to handle this, or should I do some preprocessing to support accented characters in the URL? I noticed that if I urlencode the whole URL, the resulting URL from $url->toString() is malformed.
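For readers who want to try the preprocessing route: a minimal sketch in plain PHP (no ext-http needed) that percent-encodes only the non-ASCII bytes of a URL and leaves the scheme, slashes, and other delimiters alone. Running urlencode() over the whole string also escapes `:` and `/`, which is why that result comes out malformed. The function name percentEncodeNonAscii is made up for this illustration, and the sketch assumes the host part is already ASCII (an internationalized host name would need IDN conversion instead).

```php
<?php
// Hypothetical helper: percent-encode every byte >= 0x80, leave ASCII untouched.
function percentEncodeNonAscii(string $url): string
{
    // No /u modifier on purpose: the pattern matches single bytes,
    // so each byte of a multibyte UTF-8 character is encoded separately.
    return preg_replace_callback(
        '/[\x80-\xFF]/',
        function (array $m): string {
            return rawurlencode($m[0]);
        },
        $url
    );
}

echo percentEncodeNonAscii('https://ru.wikipedia.org/wiki/Нью-Йорк_(штат)'), "\n";
// prints https://ru.wikipedia.org/wiki/%D0%9D%D1%8C%D1%8E-%D0%99%D0%BE%D1%80%D0%BA_(%D1%88%D1%82%D0%B0%D1%82)
```

The encoded string is pure ASCII, so it no longer contains the 0xd0 byte the parser complained about.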

khavishbhundoo commented 3 years ago

I found a solution; it should have been:

$url = new \http\Url($request->url, null, \http\Url::PARSE_MBUTF8);

@m6w6 what other flags would you suggest so that I support the widest range of URLs? I feed those URLs to curl later for download.

m6w6 commented 3 years ago

I think STDFLAGS already includes pretty much everything; maybe add IGNORE_ERRORS if you're brave.
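A minimal sketch of how that suggestion might look in code, assuming ext-http is installed. The wrapper name parseLeniently is hypothetical, and it returns null when the extension is missing or parsing still fails; which URLs IGNORE_ERRORS actually lets through is not covered in this thread, so treat the behavior as something to verify against your own inputs.

```php
<?php
// Hypothetical wrapper around http\Url using the flags mentioned in this thread.
function parseLeniently(string $input): ?string
{
    if (!extension_loaded('http')) {
        return null; // ext-http not available in this environment
    }
    try {
        // STDFLAGS is the maintainer's suggested baseline;
        // IGNORE_ERRORS is the optional "if you're brave" addition.
        $flags = \http\Url::STDFLAGS | \http\Url::IGNORE_ERRORS;
        $url = new \http\Url($input, null, $flags);
        return $url->toString();
    } catch (\Exception $e) {
        // Parsing can still fail; the caller decides what to do with null.
        return null;
    }
}

var_dump(parseLeniently('https://ru.wikipedia.org/wiki/Нью-Йорк_(штат)'));
```

The returned string can then be handed to curl; checking for null first avoids passing a failed parse downstream.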