Closed sqwishy closed 1 year ago
Hey, thanks for reporting. That the difference depends on the whitespace is a bug for sure.
Apart from that, yeah maybe we should require 4 numbers for IP address like hostnames. Or maybe including IP addresses was a mistake for scheme-less URLs. We could exclude them and make it an option that can be enabled if it's really wanted.
What do you think?
Not the issue creator, but I think that a separate feature for IP addresses would make the most sense.
like you can ping 1.1 and it will try pinging 1.0.0.1 because of a quirk in networking stacks ... but it's like an accident or something.
For a fun journey down the rabbit hole, read this article.
The gist is that if an octet gets omitted, it is set to 0
. Sometimes.
As per the article, these are all things that will ping 127.0.0.1:
So, yeah, we should stay on the more conservative end of the spectrum I guess.
Demanding 4 octets is what Python requires.
>>> import ipaddress
>>> ipaddress.IPv4Address('127.1')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.7/ipaddress.py", line 1301, in __init__
self._ip = self._ip_int_from_string(addr_str)
File "/usr/lib/python3.7/ipaddress.py", line 1135, in _ip_int_from_string
raise AddressValueError("Expected 4 octets in %r" % ip_str)
ipaddress.AddressValueError: Expected 4 octets in '127.1'
(By now I quoted almost the entire article, but I hope it helped.)
In summary, I'd go with exactly 4 octets and a separate feature for IP addresses.
We could exclude them and make it an option that can be enabled if it's really wanted.
IPv4 addresses shorter than four octets are surprising/unexpected for most people so they shouldn't be parsed by default.
But the original point was that 1.0 was not matched on its own but it did if it had a space before it. I don't have strong feelings about how addresses are parsed. I don't use that feature of linkify.
Changed in 0.10.0, IP addresses now require exactly 4 parts: https://github.com/robinst/linkify/blob/main/CHANGELOG.md#0100---2023-06-24
Hi I noticed a number with a decimal point is picked up as a link as long as it's preceded by whitespace.
not a link
is a link
This seems like a false positive. Particularly because of the whitespace thing. Also I don't know why "1.0" would be a link but it also matches "1.2.3" regardless of whitespace and I'm not so sure about that one either.
edit okay so my understanding is something like
1.1
isn't really a valid address. it works sometimes ... like you canping 1.1
and it will try pinging1.0.0.1
because of a quirk in networking stacks ... but it's like an accident or something.ipcalc 1.1
will tell you "bad IPv4 address"; cool. But thenurl::Url::parse("http://1.1")
parses that as a URL with 1.0.0.1 as the host. So now nothing makes sense.