mathiasbynens / punycode.js

A robust Punycode converter that fully complies with RFC 3492 and RFC 5891.
https://mths.be/punycode
MIT License

Wrong conversion when Unicode is in subdirectory #16

grefel closed this issue 11 years ago

grefel commented 11 years ago

As far as I can see, the URL dasviertel.de/events/viertelvergnügen is converted to http://dasviertel.xn--de/events/viertelvergngen-0wc, but it should be http://dasviertel.de/events/xn--viertelvergngen-0wc.

Regards, Gregor

mathiasbynens commented 11 years ago

punycode.toASCII() only accepts a string representing a domain name as its argument, not a full URL. This is by design: the Punycode algorithm operates on domain labels.

If you want to convert a full URL, parse the hostname out of it first, then run only that part through punycode.toASCII.
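A minimal sketch of that approach in Node.js, under two assumptions: Node's built-in `url.domainToASCII()` stands in for `punycode.toASCII()` (it applies the WHATWG/UTS #46 domain processing, so it also lowercases and validates the name rather than running plain Punycode), and the input is an absolute URL with no userinfo and no IPv6 literal. `hostToASCII` is a hypothetical helper name.

```javascript
const { domainToASCII } = require('node:url');

// Hypothetical helper: convert only the host of an absolute URL,
// leaving path, query and fragment exactly as they were.
function hostToASCII(input) {
  // scheme://host[:port] followed by everything else (path, query, fragment)
  const match = /^([a-z][a-z0-9+.-]*:\/\/)([^\/?#:]*)(:\d+)?(.*)$/i.exec(input);
  if (!match) {
    throw new TypeError('expected an absolute URL with a host: ' + input);
  }
  const [, scheme, host, port = '', rest] = match;

  // Only the hostname is run through the IDNA conversion.
  const asciiHost = domainToASCII(host);
  if (asciiHost === '') {
    throw new TypeError('invalid domain name: ' + host);
  }
  return scheme + asciiHost + port + rest;
}

console.log(hostToASCII('http://mañana.ext/events/viertelvergnügen'));
// http://xn--maana-pta.ext/events/viertelvergnügen
```

A scheme-less string such as `dasviertel.de/events/viertelvergnügen` is rejected rather than guessed at, because without the `scheme://` prefix there is no reliable way to tell where the hostname ends.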


> the URL dasviertel.de/events/viertelvergnügen is converted to http://dasviertel.xn--de/events/viertelvergngen-0wc but should be http://dasviertel.de/events/xn--viertelvergngen-0wc

That’s not quite true; it should be http://dasviertel.de/events/viertelvergnügen or its URL-encoded version http://dasviertel.de/events/viertelvergn%C3%BCgen, but nothing else. Punycode only applies to the hostname part of the URL.
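The same split is visible in the WHATWG `URL` API, which is global in Node.js and browsers (used here as an illustration, not as part of Punycode.js): parsing converts only the hostname with IDNA, while the serializer percent-encodes non-ASCII path characters as UTF-8.

```javascript
// Parse two example URLs with the WHATWG URL API.
const plain = new URL('http://dasviertel.de/events/viertelvergnügen');
const idn = new URL('http://mañana.ext/events/viertelvergnügen');

// The hostname is the only part that gets Punycode (IDNA ToASCII).
console.log(plain.hostname); // dasviertel.de
console.log(idn.hostname);   // xn--maana-pta.ext

// The path is never Punycode-encoded; its non-ASCII characters are
// percent-encoded as UTF-8 bytes instead (ü -> %C3%BC).
console.log(plain.href); // http://dasviertel.de/events/viertelvergn%C3%BCgen
console.log(idn.href);   // http://xn--maana-pta.ext/events/viertelvergn%C3%BCgen

// Decoding the path gives back the readable Unicode form.
console.log(decodeURIComponent(plain.pathname)); // /events/viertelvergnügen
```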

grefel commented 11 years ago

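OK, if anyone is looking for a quick fix (this is not extensively tested!), just edit mapDomain() (line 90) to:

```javascript
// Relies on punycode.js internals (map, regexSeparators) and guesses
// where the hostname ends with a regular expression.
function mapDomain(string, fn) {
    // Split into "host/" (up to the first "/" after a dotted name) and the rest.
    var splitDomain = string.match(/(.*?\.[a-z]+\/)(.*?)$/i);
    if (splitDomain) {
        var hostname = map(splitDomain[1].split(regexSeparators), fn).join('.');
        // Also applies fn to every path segment.
        var subdirectory = map(splitDomain[2].split(/\//), fn).join('/');
        return hostname + subdirectory;
    } else {
        return map(string.split(regexSeparators), fn).join('.');
    }
}
```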
mathiasbynens commented 11 years ago

This should not be “patched” in Punycode.js!

Here’s a solution to your problem using @rodneyrehm’s URI.js library:

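```javascript
var URI = require('urijs');          // URI.js
var punycode = require('punycode/'); // trailing slash loads the npm package, not Node's deprecated core module

var url = 'http://mañana.ext/events/viertelvergnügen';
var hostname = URI(url).hostname();
var punycodeHostname = punycode.toASCII(hostname);
URI(url).hostname(punycodeHostname).toString();
// → 'http://xn--maana-pta.ext/events/viertelvergnügen'
```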
grefel commented 11 years ago

OK, I agree.

rodneyrehm commented 11 years ago

If this is a common task, we can also add a method to URI.js that encapsulates calling punycode. (Nobody has requested this so far, though.)

grefel commented 11 years ago

It's not that common :-) I'm fine with the solution from Mathias.