Closed jeraki closed 1 year ago
A few thoughts on this:
host
strings flying around that need to be stripped in various places. Perhaps inside of the constructor for lib/http/uri.rb
?Apparently the bracketed hosts are defined in RFC3896 § 3.2.2:
A host identified by an Internet Protocol literal address, version 6 [RFC3513] or later, is distinguished by enclosing the IP literal within square brackets ("[" and "]"). This is the only place where square bracket characters are allowed in the URI syntax.
So only IPv6 addresses should be allowed.
Here is a regex for IPv6 addresses:
https://gist.github.com/syzdek/6086792
(
([0-9a-fA-F]{1,4}:){7,7}[0-9a-fA-F]{1,4}| # 1:2:3:4:5:6:7:8
([0-9a-fA-F]{1,4}:){1,7}:| # 1:: 1:2:3:4:5:6:7::
([0-9a-fA-F]{1,4}:){1,6}:[0-9a-fA-F]{1,4}| # 1::8 1:2:3:4:5:6::8 1:2:3:4:5:6::8
([0-9a-fA-F]{1,4}:){1,5}(:[0-9a-fA-F]{1,4}){1,2}| # 1::7:8 1:2:3:4:5::7:8 1:2:3:4:5::8
([0-9a-fA-F]{1,4}:){1,4}(:[0-9a-fA-F]{1,4}){1,3}| # 1::6:7:8 1:2:3:4::6:7:8 1:2:3:4::8
([0-9a-fA-F]{1,4}:){1,3}(:[0-9a-fA-F]{1,4}){1,4}| # 1::5:6:7:8 1:2:3::5:6:7:8 1:2:3::8
([0-9a-fA-F]{1,4}:){1,2}(:[0-9a-fA-F]{1,4}){1,5}| # 1::4:5:6:7:8 1:2::4:5:6:7:8 1:2::8
[0-9a-fA-F]{1,4}:((:[0-9a-fA-F]{1,4}){1,6})| # 1::3:4:5:6:7:8 1::3:4:5:6:7:8 1::8
:((:[0-9a-fA-F]{1,4}){1,7}|:)| # ::2:3:4:5:6:7:8 ::2:3:4:5:6:7:8 ::8 ::
fe80:(:[0-9a-fA-F]{0,4}){0,4}%[0-9a-zA-Z]{1,}| # fe80::7:8%eth0 fe80::7:8%1 (link-local IPv6 addresses with zone index)
::(ffff(:0{1,4}){0,1}:){0,1}
((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}
(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])| # ::255.255.255.255 ::ffff:255.255.255.255 ::ffff:0:255.255.255.255 (IPv4-mapped IPv6 addresses and IPv4-translated addresses)
([0-9a-fA-F]{1,4}:){1,4}:
((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}
(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9]) # 2001:db8:3:4::192.0.2.33 64:ff9b::192.0.2.33 (IPv4-Embedded IPv6 Address)
)
Alternatively you could speculatively attempt to parse the captured inner address as an IPAddr
and verify it's an IPv6 address.
Thank you for the feedback and the useful references. I like the idea of moving this into HTTP::URI
rather than needing to remember to strip the brackets at the point of use.
An open question for me is whether the bracket-stripped form should be part of HTTP::URI
itself or whether it should be part of the underlying Addressable::URI
. Consider that Addressable::URI
expects the brackets for an IPv6 address:
bracketed = Addressable::URI.parse("http://[2606:2800:220:1:248:1893:25c8:1946]/foo?bar=baz")
bracketed.host # => "[2606:2800:220:1:248:1893:25c8:1946]"
bracketed.port # => nil
unbracketed = Addressable::URI.parse("http://2606:2800:220:1:248:1893:25c8:1946/foo?bar=baz")
unbracketed.host # => "2606:2800:220:1:248:1893:25c8"
unbracketed.port # => 1946
Should we maintain the bracketed host in the underlying Addressable::URI
and have the host
method on HTTP::URI
strip the brackets, or should we mutate the Addressable::URI
after construction to remove the brackets? I'm not sure if mutating the Addressable::URI
to something it didn't parse as would violate some intended invariant.
I notice that currently host-related methods are just delegated to the Addressable::URI
with def_delegators :@uri, :host, :normalized_host, :host=
, so presumably we'd define real methods for the two getters that do the stripping rather than delegating directly. Unclear if the setter should be changed in any way.
@jeraki I would suggest changing HTTP::URI
to eagerly strip the brackets from IPv6 addresses and caching it as a @host
variable which the accessor methods can be changed to use
Using IPAddr
looks cleaner than using our own regex. A nice property of it is that if you call to_s
on an IPv6 IPAddr
it strips the brackets for you:
ip_bracketed = IPAddr.new("[2606:2800:220:1:248:1893:25c8:1946]")
ip_bracketed.to_s # => "2606:2800:220:1:248:1893:25c8:1946"
ip_unbracketed = IPAddr.new("2606:2800:220:1:248:1893:25c8:1946")
ip_unbracketed.to_s # => "2606:2800:220:1:248:1893:25c8:1946"
So the logic for host
could be:
def host
begin
ip = IPAddr.new(uri.host)
return ip.to_s if ip.ipv6?
rescue IPAddr::Error
end
uri.host
end
Also,Addressable::URI#to_s
is used in HTTP::URI
's to_s
, to_str
, and inspect
methods, which don't strip brackets. I'm not sure what these are used for in practice, but given they're public methods it'd be a breaking change to do bracket-stripping in these methods. Maybe they don't need to be modified to match the bracket-stripping behavior of a new host
method because there isn't a use case for them to change (and it wouldn't be necessary to fix #730), but it's unsatisfying to have inconsistency in how the host is represented between these methods.
Come to think of it, host
is also already a public method via delegation, so changing host
to strip brackets is also a breaking change. Should the bracket-stripping version be exposed as a new method with a different name?
I think it's fine to change the behavior. The current behavior is broken per #730, so changing it is a bugfix.
As to how to approach the setter and whether it should mutate the underlying Addressable::URI
is a tricky question. I guess the setter should add brackets to the host being set on the underlying Addressable::URI
if host=
receives a valid IPv6 address? That is to say, it needs to update both @host
and the inner URI, and the inner URI needs to be serialized with brackets.
Okay, I pushed up a new commit that does the stripping via HTTP::URI
as we discussed. Let me know what you think.
RuboCop was reporting a violation because the top-level describe
block in the spec has too many lines, but that seems like a useless metric in this case, so I added it to the exclusions along with all the other specs. It's probably worth just turning that rule off for the whole spec directory.
The only thing I'm still unsure about is whether to add logic to HTTP::URI#host=
to make sure IPv6 addresses have brackets in the underlying Addressable::URI
. In either case, the brackets will be stripped when accessing calling HTTP::URI#host
or HTTP::URI#normalized_host
, I'm just a little wary about unexpected effects from changing the internal representation. Addressable::URI.parse
will only recognize an IPv6 address correctly if it's bracketed, and calling Addressable::URI#host
on a value constructed this way will show the brackets in the string representation. But if you mutate it after the fact with Addressable::URI#host=
, it's perfectly happy to accept an IPv6 without brackets, and subsequent calls to Addressable::URI#host
will show it without the brackets, but that form couldn't have been instantiated correctly by the parse
constructor.
@jeraki I mentioned that particular issue earlier.
I suggested the setters need to use the bracketed form of IPv6 addresses with Addressable::URI
so the URLs serialize correctly, while internally caching the host with the brackets stripped.
That's another thing I think would be easier with eagerly updating instance variables, rather than trying to do it lazily.
I think if you have one private helper method, called from both the constructor and the setter, which extracts the instance variables out of Addressable::URI
, you can have fairly clean logic for this, and also DRY out the logic for checking if an address is an IPv6 address.
Sorry for the delay. I pushed a commit to process host
and normalized_host
eagerly and to ensure the underlying Addressable::URI
has brackets when host=
is used. Let me know if you are okay with my implementation or if there's something you'd like changed. Thank you!
Looks good now, thanks!
Thanks for your guidance! When you get a chance, could you publish a new gem that includes this change?
Sure
Conceptually, it's weird having brackets in host
. They are not part of an IPv6 address by itself; the brackets are merely part of the URL syntax for IPv6 (needed to distinguish the colons inside the address from :port number).
However, the Addressable gem makes the distinction that #hostname
is the address by itself vs. #host
is the bracketed-if-IPv6 part of the #authority
portion of the URL.
With that context, this is consistent :+1:.
(But if you can switch to dealing with hostname
(?) it might simplify things)
@cben that sounds like a great point. Could one of you open a PR to switch to revert this and switch to #hostname
?
Can't believe there are 3 different methods to get the hostname
Fixes #730.
I took my best guess at how you'd want this change implemented and tested, but I'm happy to revise it based on feedback. Thanks!