r-lib / xml2

Bindings to libxml2
https://xml2.r-lib.org/

url_parse doesn't work with URL containing non-ASCII characters #442

Open MarekProkop opened 3 months ago

MarekProkop commented 3 months ago

xml2::url_parse("https://www.spa.cz/spacz/images/procedures/Slatinná%20koupel.jpg") returns port -541335376 and leaves all other URL components blank. The cause is the character á in the path. Without it (or if it is percent-encoded), the result is correct.

This may be intended behavior, but I suspect it is a bug. Such URLs are quite common in some languages, and all major web clients (browsers, search engines, etc.) handle them fine.
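
Since the percent-encoded form parses correctly, one possible workaround is to encode the non-ASCII bytes before calling url_parse. The sketch below is only an illustration: encode_non_ascii is a hypothetical helper, not part of xml2, and it assumes the URL should be encoded as UTF-8. It leaves existing escapes such as %20 untouched, and it targets the path and query only (a non-ASCII host name would need IDNA conversion instead).

```r
# Hypothetical helper (not part of xml2): percent-encode only the non-ASCII
# bytes of a single URL string, leaving ASCII characters and existing
# escapes such as %20 unchanged. Assumes UTF-8 is the desired encoding.
encode_non_ascii <- function(url) {
  stopifnot(is.character(url), length(url) == 1L)
  bytes <- charToRaw(enc2utf8(url))
  is_ascii <- as.integer(bytes) < 128L

  out <- character(length(bytes))
  # ASCII bytes are copied through as single characters.
  out[is_ascii] <- rawToChar(bytes[is_ascii], multiple = TRUE)
  # Every other byte becomes %XX (uppercase hex).
  out[!is_ascii] <- sprintf("%%%02X", as.integer(bytes[!is_ascii]))

  paste(out, collapse = "")
}

url <- "https://www.spa.cz/spacz/images/procedures/Slatinn\u00e1%20koupel.jpg"
encoded <- encode_non_ascii(url)
print(encoded)
# "https://www.spa.cz/spacz/images/procedures/Slatinn%C3%A1%20koupel.jpg"

# Parse the encoded form if xml2 is installed.
if (requireNamespace("xml2", quietly = TRUE)) {
  print(xml2::url_parse(encoded))
}
```

utils::URLencode is not a drop-in replacement here: with its default repeated = FALSE it returns a URL unchanged if it already contains a %XX escape, so the á in this example would not be encoded.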