Closed vanderlee closed 1 month ago
The file I have on my fork has these lines:
<!-- IPv4 -->
<xs:pattern value=
"((((0{1,4}:){5})|(::(0{1,4}:){0,4}))([Ff]{4}:))?(([01]?\d?\d|2[0-4]\d|25[0-5])\.){3}([01]?\d?\d|2[0-4]\d|25[0-5])"/>
It allows IPv4 mapped IPv6 addresses, as well as having the extra parenthesis.
I'm fine either way.
Your pattern does not parse all valid dual IPv6/IPv4 formats though (which, admittedly, mine doesn't even attempt).
For instance the 64:ff9b::192.0.2.128
example listed on Wikipedia or the examples listed on https://www.ibm.com/docs/en/ts4500-tape-library?topic=functionality-ipv4-ipv6-address-formats.
You're right, albeit I'm not clear whether such addresses can be used globally.
An alternative stance would be to remove all alternatives and replace them with the single, simple pattern:
<xs:pattern value="[0-9a-fA-F.:]{2,45}"/>
What level of strictness is required here? I'm noticing the normal IPv6 patterns are commented as "lax" but obviously more strict than [0-9a-fA-F.:]{2,45}
.
I think the lax pattern is fine. The exact RE for IPv6 addresses is way more complex than it's worth here.
Something like (::)?(([A-Fa-f\d]{1,4}::?){1,6})?(([01]?\d?\d|2[0-4]\d|25[0-5])\.){3}([01]?\d?\d|2[0-4]\d|25[0-5])
(based largely on @alevesely's version) is quite lax but covers any dual IPv6 and IPv4 I've thrown at it. Can probably be a bit simplified. It will accept some silliness like ::1::1::1.1.1.1
as well, though.
If we wanted to be strict, we should also avoid addresses like 10.0.0.0/8
, 2001:db8::/32
and a bunch of other reserved and special purpose addresses that would make no sense in an aggregate report. Conveying this by means of regular expression, albeit being an interesting exercise, adds very little value to either possible check tools based on the formal syntax or the readability of the syntax itself.
If we don't want to be strict but just have some minimal sanity check, then @alevesely's single RE should do fine.
Hey folks, a bunch of back and forth. Is the PR going to merge the regex everyone wants? If so, I'll merge it, and get a new rev published.
Should we ask on list whether to turn to the much simpler one-liner shown above?
I avoided thinking about the IPv6 regex in my batch of changes, as the amount of syntax was just overwhelming. The IPv4 mapped IPv6 syntax should go in the IPv6 part of the grammar, if the lax pattern is not adopted to replace it all.
Okay, merged
IPv4 regular expression does not match IPv4 addresses due to the dot (.) being matched as part of the '25[0-5]' branch instead of the 0-255 number pattern.