Open HoneyryderChuck opened 1 year ago
I think Ruby-core would rather avoid depending on more system libraries. It would be preferable to have a pure ruby implementation.
@nobu @hsbt any opinion here?
> I think Ruby-core would rather avoid depending on more system libraries. It would be preferable to have a pure ruby implementation.
Agreed, and we don't have ffi gem on ruby/ruby.
> Agreed, and we don't have ffi gem on ruby/ruby.
But there is fiddle, right? Couldn't it be used for the same purpose (minus JRuby support)? Nevertheless, I'd expect it to be a C extension regardless.
> I think Ruby-core would rather avoid depending on more system libraries. It would be preferable to have a pure ruby implementation.
I understand the concern, hence why I'd make this an "optional dependency" a la openssl, i.e. it's either there and we have punycode, or we don't and we don't have it. libidn2 should have the same type of platform availability as openssl. And while there is maintenance effort in carrying this dependency forward and one should avoid it at all costs, other initiatives requiring external package dependency are already being considered, so one should balance the tradeoff between maintenance overhead vs. cost of not having the feature.
That being said, I also agree that having a pure ruby punycode implementation would be the best, but there isn't one yet.
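To make the "optional dependency" idea concrete, here is a minimal sketch of how calling code might feature-detect it. URI::Punycode and its encode method are hypothetical names taken from this proposal; they do not exist in the uri gem today, and ascii_host is just an illustrative helper.

```ruby
require "uri"

# Hypothetical helper: returns an ASCII-only host, using URI::Punycode only
# when the (proposed, currently non-existent) constant has been defined.
def ascii_host(host)
  # Plain ASCII hosts need no translation at all.
  return host if host.ascii_only?

  if defined?(URI::Punycode)
    # Assumed API from the proposal; not a real method today.
    URI::Punycode.encode(host)
  else
    raise NotImplementedError,
          "IDNA support is unavailable (ruby was built without libidn2)"
  end
end

puts ascii_host("example.com")
```

This is the same shape as code that checks for openssl before using it: the feature is either present or it isn't, and callers have to handle both cases.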
> it's either there and we have punycode, or we don't and we don't have it.
I know there is precedent for this, but this is very much a last resort thing. When you build Ruby without openssl or libyaml, it's not really "Ruby" given that the vast majority of code out there won't work. So I think we'd rather avoid creating more of these situations.
> but there isn't one yet.
https://github.com/knu/ruby-domain_name/blob/c64a59027939aa34e1f5f0efc5cb654d73ccb966/lib/domain_name/punycode.rb could be used as a start.
But before anything else, I think we'd need a 👍 on what the new API should look like; after that, work can be done. The spec isn't trivial, but it's not rocket science either, and it's easy to unit test.
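For a sense of scale, here is a minimal sketch of just the Punycode encoding step from RFC 3492, for a single label. The module name PunycodeSketch is made up for illustration. It does no IDNA mapping or validation (neither 2003 nor 2008), adds no "xn--" prefix, and has no decoder, so it is a starting point rather than an implementation of what is being proposed.

```ruby
# Sketch of the RFC 3492 Punycode encoder for one label (no IDNA rules applied).
module PunycodeSketch
  BASE = 36
  TMIN = 1
  TMAX = 26
  SKEW = 38
  DAMP = 700
  INITIAL_BIAS = 72
  INITIAL_N = 128
  DIGITS = "abcdefghijklmnopqrstuvwxyz0123456789"

  module_function

  # Bias adaptation function from RFC 3492, section 6.1.
  def adapt(delta, num_points, first_time)
    delta = first_time ? delta / DAMP : delta / 2
    delta += delta / num_points
    k = 0
    while delta > ((BASE - TMIN) * TMAX) / 2
      delta /= BASE - TMIN
      k += BASE
    end
    k + ((BASE - TMIN + 1) * delta) / (delta + SKEW)
  end

  # Encoding procedure from RFC 3492, section 6.3.
  def encode(label)
    input = label.codepoints
    # Basic (ASCII) code points are copied through first, in order.
    output = input.select { |c| c < INITIAL_N }.pack("U*")
    basic_count = output.length
    handled = basic_count
    output << "-" if basic_count > 0

    n = INITIAL_N
    delta = 0
    bias = INITIAL_BIAS

    while handled < input.length
      # Next smallest code point that has not been encoded yet.
      m = input.select { |c| c >= n }.min
      delta += (m - n) * (handled + 1)
      n = m

      input.each do |c|
        delta += 1 if c < n
        next unless c == n

        # Emit delta as a generalized variable-length integer.
        q = delta
        k = BASE
        loop do
          t = if k <= bias then TMIN
              elsif k >= bias + TMAX then TMAX
              else k - bias
              end
          break if q < t
          output << DIGITS[t + (q - t) % (BASE - t)]
          q = (q - t) / (BASE - t)
          k += BASE
        end
        output << DIGITS[q]
        bias = adapt(delta, handled + 1, handled == basic_count)
        delta = 0
        handled += 1
      end

      delta += 1
      n += 1
    end

    output
  end
end

puts PunycodeSketch.encode("bücher") # => "bcher-kva"
```

The bulk of the remaining work for IDNA 2008 is in the mapping and validation tables around this step, not in the encoder itself.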
The punycode parser you linked is IDNA 2003 compliant, not 2008. I can't evaluate how much effort "upgrading" it would take. But I agree, let's wait for more input.
> parser you linked is IDNA 2003 compliant, not 2008.
I know. Hence why I said it could be used as a start.
Hey, I was trying to find an implementation of IDNA 2008 in pure Ruby for my project, and since I couldn't find anything, I wrote a new gem https://github.com/skryukov/uri-idna 🙃
I was inspired by @HoneyryderChuck's idea to put all IDNA-related functionality inside URI. It would be cool to hear feedback from you :heart:
@skryukov massive effort! thank you for this 💪
I haven't done my due diligence yet, but can you confirm that your library tests against standard conformance testing examples ([like this one](http://www.unicode.org/reports/tr46/#Conformance_Testing))?
Besides that, integration with the URI lib would be just a matter of hooking into URI::IDNA.to_ascii(host, uts46_transitional: true) (I guess we'd want transitional mode enabled by default)?
I'll defer to @hsbt for the details of potential integration in the ruby standard library.
Hey, @HoneyryderChuck
Yup, it conforms to all tests from UTS46 (the spec file is here). I also manually added some tests for IDNA 2008 rules, so if you know of a full IDNA 2008 testing suite, I would love to give it a spin.
I don't mind changing the API and/or defaults to better suit current needs; almost all rules are configurable (here is a list of options).
Also note that the gem might differ a bit from libidn2; for example, libidn2's toUnicode version doesn't validate the result:
idn2 -d xn--fullstop-rm3g.us
full。stop.us
URI::IDNA.to_unicode("xn--fullstop-rm3g.us")
#<URI::IDNA::InvalidCodepointError: Codepoint U+3002 at position 5 of "full。stop" not allowed>
@hsbt is there anything I can do to help get this issue going?
Currently, the biggest "missing feature" in the stdlib ruby URI/DNS resolution supply chain is IDNA support. addressable, the open-source alternative to stdlib uri, has some support for it, which is, I believe, the main reason why it is a transitive dependency of many other gems (its other feature, URI templates, is just not as compelling).
This is a proposal for a way to solve this.
punycode
IDNA domains are translated to their punycode representation in order to be used in DNS queries (which require ASCII domains). ruby core stdlib does not have a punycode converter, so this is where it should start IMO. For that, I propose: either a new punycode stdlib gem (bundled?), or its functionality to be available as a submodule of URI in the uri stdlib:

implementation
addressable, as well as other (mostly abandoned) gems, support the IDNA 2003 standard. You'll find both libidn-based extensions and pure ruby ports. This has since been superseded by the IDNA 2008 standard (which essentially supports all the more recent unicode versions, plus some edge cases). While I think that a pure ruby implementation should be entertained at some point, I think that, at this point, ruby would do best by adopting the most standardized implementation around, and that's libidn2: it's used by most other network libraries, including curl, and distributed as a package for most (all?) OSes supported by ruby.

Integration of libidn2 can be done by either a C extension or FFI (I'm the maintainer of idnx, which already FFI's into libidn2, and into winnls for windows). The advantage of the latter is that it works OOTB for java. The disadvantage may be performance (?), for which a C extension may be a better fit, but then we'd need to know whether the java stdlib contains an equivalent of IDNA conversion supporting IDNA 2008.

This means that libidn2 would become a dependency when building ruby. It could be dealt with, however, as an optional dependency, like openssl is: when available, URI::Punycode is defined, and when it isn't, URI::Punycode is not. Most ruby installers could then opportunistically install the package as well, just like it's already done with openssl.

(addressable is aware of its lack of IDNA 2008 support, and is working on it by FFI'ing into libidn2 as well.)

API
uri could then transparently handle translation internally. I propose that, beyond the proposal made above, nothing else in the public API changes. Instead, URI::Generic would support translation OOTB on building objects:

This could then be used internally in the resolv library, before issuing the DNS query.
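
As a rough illustration of "translation on building objects", here is a sketch under stated assumptions. build_http_uri and its to_ascii: parameter are hypothetical and not part of the uri gem; the converter is passed in as a callable so the sketch stays independent of whichever punycode implementation ends up being chosen. It splits labels on "." only and skips all IDNA mapping and validation.

```ruby
require "uri"

# Hypothetical helper, not part of the uri gem: converts each non-ASCII label
# with the supplied callable, then builds the URI from the ASCII-only host.
def build_http_uri(host, to_ascii:, path: "/")
  ascii_host = host.split(".").map do |label|
    # ASCII labels pass through untouched; others get the ACE prefix.
    label.ascii_only? ? label : "xn--#{to_ascii.call(label)}"
  end.join(".")

  URI::HTTP.build(host: ascii_host, path: path)
end

# Stand-in converter for the example; a real one would implement punycode.
stub = ->(label) { label == "bücher" ? "bcher-kva" : raise(ArgumentError, label) }

uri = build_http_uri("bücher.example", to_ascii: stub)
puts uri.host
```

The resulting uri.host is already ASCII, so it is the value a resolver would be handed before issuing the DNS query.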