python-hyper / hyperlink

🔗 Immutable, Pythonic, correct URLs.
https://hyperlink.readthedocs.io/

URLs don't support fromText -> toURI with URLs containing IPv6 literals #68

Open hawkowl opened 5 years ago

hawkowl commented 5 years ago
>>> URL.fromText(u"http://[3fff::1]/foo").asURI().asText()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/hawkowl/venvs/commands/lib/python2.7/site-packages/hyperlink/_url.py", line 1338, in to_uri
    new_host = self.host if not self.host else idna_encode(self.host, uts46=True).decode("ascii")
  File "/home/hawkowl/venvs/commands/lib/python2.7/site-packages/idna/core.py", line 340, in encode
    s = uts46_remap(s, std3_rules, transitional)
  File "/home/hawkowl/venvs/commands/lib/python2.7/site-packages/idna/core.py", line 332, in uts46_remap
    _unot(code_point), pos + 1, repr(domain)))
idna.core.InvalidCodepoint: Codepoint U+003A not allowed at position 5 in u'3fff::1'

mahmoud commented 5 years ago

Hey Hawkie! This was pretty concerning at first, since I thought we had a bunch of IPv6 coverage. But now I see that parsing is fine; the problem is actually the to_uri() part and the newly integrated idna stuff:

>>> url = URL.from_text(u'https://[2001:0db8:85a3:0000:0000:8a2e:0370:7334]:80/')
>>> url.to_uri()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/mahmoud/virtualenvs/tmp-d364b3d6b21cd4e4/local/lib/python2.7/site-packages/hyperlink/_url.py", line 1338, in to_uri
    new_host = self.host if not self.host else idna_encode(self.host, uts46=True).decode("ascii")
  File "/home/mahmoud/virtualenvs/tmp-d364b3d6b21cd4e4/local/lib/python2.7/site-packages/idna/core.py", line 358, in encode
    s = alabel(label)
  File "/home/mahmoud/virtualenvs/tmp-d364b3d6b21cd4e4/local/lib/python2.7/site-packages/idna/core.py", line 270, in alabel
    ulabel(label)
  File "/home/mahmoud/virtualenvs/tmp-d364b3d6b21cd4e4/local/lib/python2.7/site-packages/idna/core.py", line 304, in ulabel
    check_label(label)
  File "/home/mahmoud/virtualenvs/tmp-d364b3d6b21cd4e4/local/lib/python2.7/site-packages/idna/core.py", line 261, in check_label
    raise InvalidCodepoint('Codepoint {0} at position {1} of {2} not allowed'.format(_unot(cp_value), pos+1, repr(label)))
idna.core.InvalidCodepoint: Codepoint U+003A at position 5 of u'2001:0db8:85a3:0000:0000:8a2e:0370:7334' not allowed

So I'm guessing we just need to skip IDNA-encoding of IP-literal hosts, since they're pretty much guaranteed to be ASCII (some examples: http://www.gestioip.net/docu/ipv6_address_examples.html). How's that sound?

hawkowl commented 5 years ago

That's the approach that Twisted's internals use -- check if it's an IP address, idna encode only if it's not.

On Thu., 6 Dec. 2018, 05:46 Mahmoud Hashemi <notifications@github.com> wrote:

> Hey Hawkie! [...] So I'm guessing we just need to skip idna-encoding of IP-literal stuff, since it's pretty much guaranteed to be ASCII (some examples http://www.gestioip.net/docu/ipv6_address_examples.html). How's that sound?
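
The approach discussed above (check whether the host is an IP address, and IDNA-encode only if it is not) could be sketched roughly as follows. This is a minimal illustration, not hyperlink's or Twisted's actual code: the helper names `is_ip_literal` and `host_to_ascii` are hypothetical, and Python's built-in `idna` codec stands in for the third-party `idna.encode(..., uts46=True)` call seen in the tracebacks. It assumes Python 3 and a host stored without the surrounding brackets, as the traceback's `u'3fff::1'` suggests.

```python
import ipaddress


def is_ip_literal(host):
    """Return True if host parses as an IPv4 or IPv6 address (hypothetical helper)."""
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return False
    return True


def host_to_ascii(host):
    """Return an ASCII form of host, skipping IDNA for empty hosts and IP literals."""
    if not host or is_ip_literal(host):
        # IP literals are already ASCII; IDNA validation would reject the ':'.
        return host
    # Assumption: the stdlib "idna" codec is used here only to keep the sketch
    # self-contained; the library in the traceback uses the idna package.
    return host.encode("idna").decode("ascii")


# Hosts taken from the two reports in this thread, plus one non-ASCII domain.
print(host_to_ascii(u"3fff::1"))
print(host_to_ascii(u"2001:0db8:85a3:0000:0000:8a2e:0370:7334"))
print(host_to_ascii(u"b\u00fccher.example"))
```

With a guard like this, the two IPv6 hosts from the reports pass through unchanged, while ordinary domain names still go through IDNA encoding.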