mathiasbynens / punycode.js

A robust Punycode converter that fully complies with RFC 3492 and RFC 5891.
https://mths.be/punycode
MIT License

punycode.encode('♡.com') should return 'xn--c6h.com' but instead it returns '.com-ku3b' #117

Open jasonkhanlar opened 2 years ago

jasonkhanlar commented 2 years ago

punycode.encode('♡.com') should return 'xn--c6h.com' but instead it returns '.com-ku3b'

jasonkhanlar commented 2 years ago

Correction: the shell was using the POSIX locale:

LANG=
LANGUAGE=
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE=C
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=

After switching to en_US.UTF-8, the output is "c6h", which is close, but why isn't it "xn--c6h"? Also, I found the tr46 module, which seems to be better.
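A likely explanation for the missing prefix, as a minimal sketch. It assumes Node.js, where `require('punycode')` resolves to the bundled (deprecated) copy of this library. As far as I can tell, `encode` converts one string of code points and neither adds `xn--` nor splits on dots, while `toASCII` works label by label and prefixes each label it converts. The variable names are only illustrative.

```javascript
// Assumption: Node.js, where 'punycode' resolves to the bundled (deprecated) module.
const punycode = require('punycode');

// encode() converts a single label and adds no "xn--" prefix.
const bareLabel = punycode.encode('♡');

// Adding the ACE prefix by hand gives the label as it appears in a domain name.
const prefixedLabel = 'xn--' + bareLabel;

// Passing a whole domain to encode() treats ".com" as part of the label:
// the ASCII characters are copied first, followed by "-" and the encoded rest.
const wholeString = punycode.encode('♡.com');

// toASCII() splits on dots and only converts (and prefixes) non-ASCII labels.
const domain = punycode.toASCII('♡.com');

console.log({ bareLabel, prefixedLabel, wholeString, domain });
```

So `encode` is the low-level building block, and `toASCII` is the function meant for complete domain names.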

alveshelio commented 2 years ago

I'm quite new to the world of "punycode", so my comment here might not make much sense; I'm sorry for that.

I see the same behaviour when using it in a TypeScript project in the frontend. I've checked, and my file is indeed in UTF-8. I have a contact form, and when I enter an email address such as 张伟@example.com, it returns @example.com-0u7sn60p

I'm not sure if it should return 0u7sn60p@example.com but I'm pretty sure that @example.com-0u7sn60p isn't the right output.
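For email addresses, here is a hedged sketch of what `toASCII` appears to do, again assuming Node.js with its bundled `punycode` module. It seems to leave everything before the `@` untouched and to convert only the non-ASCII labels of the domain part, so a non-ASCII local part such as 张伟 is not encoded at all. The address `user@♡.com` is a made-up example.

```javascript
// Assumption: Node.js, where 'punycode' resolves to the bundled (deprecated) module.
const punycode = require('punycode');

// Non-ASCII domain: only the domain label after "@" is converted.
const unicodeDomain = punycode.toASCII('user@♡.com');

// Non-ASCII local part with an ASCII domain: nothing needs converting,
// so the address comes back unchanged.
const unicodeLocalPart = punycode.toASCII('张伟@example.com');

console.log(unicodeDomain);    // user@xn--c6h.com
console.log(unicodeLocalPart); // 张伟@example.com
```

Calling `encode` on the whole address instead treats it as a single label, which would explain an output that starts with `@example.com-`.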

@jasonkhanlar were you able to get it to work? Were you using it on the command line or in a Node.js application?

jasonkhanlar commented 2 years ago

@alveshelio I don't remember, but in my project I switched to using xmlbuilder, despite encountering https://github.com/oozcitak/xmlbuilder2/issues/117 and concluding with my own work-around for my project use case scenario https://github.com/oozcitak/xmlbuilder2/pull/131. Also other than that minor hiccup, I found it to be a beautifully wonderful library and I appreciate the devs that made it!

alveshelio commented 2 years ago

Hey Jason,

Thank you for getting back. I guess I'll have to search for something else :) Cheers

AlttiRi commented 2 years ago

new URL("https://♡.com").href

"https://xn--c6h.com/"

milewskibogumil commented 2 years ago

@AlttiRi Great solution! Is there any tricky way to reverse this function? I mean from punycode to Unicode?
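One possible way to go in the reverse direction, sketched for Node.js only. `domainToASCII` and `domainToUnicode` are exported by Node's built-in `url` module, and `toUnicode` is this library's counterpart to `toASCII`. In a browser, where those `url` helpers are not available, the library's `toUnicode` would be the option to try.

```javascript
// Assumption: Node.js. 'punycode' resolves to the bundled (deprecated) module.
const { domainToASCII, domainToUnicode } = require('node:url');
const punycode = require('punycode');

// Unicode -> ASCII, the same direction as new URL(...).href.
const ascii = domainToASCII('♡.com');

// ASCII -> Unicode with Node's WHATWG URL helper.
const viaUrlModule = domainToUnicode(ascii);

// ASCII -> Unicode with this library; only "xn--" labels are decoded.
const viaPunycode = punycode.toUnicode('xn--c6h.com');

console.log(ascii, viaUrlModule, viaPunycode);
```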

silverwind commented 1 year ago

You should use .toASCII to encode domain names:


> (await import("punycode")).toASCII("♡.com")
'xn--c6h.com'