Closed d-javu closed 1 month ago
These regular expressions seem overly strict and don't allow v4 in v6. They won't allow either of these: 2001:0002:1234:: / / leading zeros ::ffff:12.34.56.78 // v4 in v6. not rare in servers that handle both v4 and v6
I'd prefer a pattern more like [:.a-f0-9]+ and let the scripts that process the addresses complain if the addresses are garbled.
I'd say you can take it to the list if you feel strongly, but I'm inclined to agree with John.
My concern is that someone verbatim uses this regex, or even uses it outside the context of DMARC. If it's overly restrictive, it could cause problems.
I wrote a (too) long mail to the list, explaining a little. I accept the reluctance to make it stricter than necessary, and changes to allow leading zeroes actually make the regex shorter, and easier to read.
However, I think experience from RFC 7489, where the regex does not allow zero-bit compressed values, shows that we should not worry too much about being too restrictive. Granted, I do not have visibility into all DMARC aggregate reports ever sent, but all reports I've received over the years have the IPv6 address in the canonical format described by RFC 5952.
This change would thus cement the canonical textual representation format specified in RFC 5952's standing as the accepted format for IPv6 addresses in DMARC reports.
Can regex impose that compressed zeroes must be the longest or first sequence?
I'll be honest that I dislike that the DMARC drafts are somehow the document of record on this, and I'd be much happier if there were format/regex IETF documents for IPv4/IPv6 addresses. It feels like this should be a relaxed definition, and perhaps work to better define the format/regex should happen elsewhere (and perhaps be added as a reference as an erratum at a later point)
... the DMARC drafts are somehow the document of record on this
Strongly agree with this. DMARC is not the owner of IPv6 specification and should not take on any part of that responsibility.
Me too. Mine was a rhetorical question, since I don't think there is any way a regular expression could match the rules detailed in RFC 5952. So we might as well use the short, one-liner expression.
RFC 4291 defines the text format for an IPv6 address. RFC 5952 suggests a subset of that format. In practice, anything that can parse the 5952 subset can also parse any 4291 address, so there is no benefit at all to trying to restrict reports to the subset. As far as I can tell there is no formal definition of IPv4 addresses, but fortunately they're easy to describe in prose, four decimal integers separated by dots, and we can leave the semantics of exactly which integers are valid to the parser.
So can we agree to use my simple regex and close this?
I wanted to provide a fix to the clearly faulty IPv6 regex we currently have. In the process throwing out ipv4-mapped ipv6 addresses and other globally unroutable IPv6 addresses. Judging from the fact that the RFC 7489 IPv6 regex only matches a tiny fraction of the addresses in the reports I receive, I think no one really uses it currently.
But look, this is getting out of hand. please discuss on the mailing list where I believe the current status is that we should provide a more generic regex and add text to encourage RFC 5952 formatting, just like everyone is doing anyway.
PS: I'd answer @alevesely in the positive. I'm sure it can be done with lookahead, but that is not for the faint at heart and left as an exercise for the reader.
@d-javu Thanks. John sort of pointed at a place where some of this is noted in a proper fashion, but might be worth following up and seeing if folks would be interested in a more concrete definition of the format and matching regular expressions. I think it could really benefit the larger IETF.
PS: I'd answer @alevesely in the positive. I'm sure it can be done with lookahead, but that is not for the faint at heart and left as an exercise for the reader.
Hm... matching shorter zero sequences, which can be anywhere after the longest/first one, would explode the number of cases. A really tough exercise. However, XSD Regular Expressions don't seem to allow lookahead.
@abrotman I'll try to prepare a different pull-request based on this, and the on-list discussion.
Hm... matching shorter zero sequences, which can be anywhere after the longest/first one, would explode the number of cases. A really tough exercise.
Indeed, don't tempt me to try, to find out on way or another :)
However, XSD Regular Expressions don't seem to allow lookahead.
You're right, that is a different kind of animal. A lot of time passes between each time I do anything with XSD regexes, so I tend to forget how neutered they are.
Update the IPv6 regular expressions to match the canonical textual representation format specified in RFC 5952. Adjust the comments to clearly spell out what type of addresses we expect to see.