Ipv6 regex update - Githubissues

d-javu commented 1 month ago

Update the IPv6 regular expressions to match the canonical textual representation format specified in RFC 5952. Adjust the comments to clearly spell out what type of addresses we expect to see.

jrlevine commented 1 month ago

These regular expressions seem overly strict and don't allow v4 in v6. They won't allow either of these: 2001:0002:1234:: / / leading zeros ::ffff:12.34.56.78 // v4 in v6. not rare in servers that handle both v4 and v6

I'd prefer a pattern more like [:.a-f0-9]+ and let the scripts that process the addresses complain if the addresses are garbled.

abrotman commented 1 month ago

I'd say you can take it to the list if you feel strongly, but I'm inclined to agree with John.

My concern is that someone verbatim uses this regex, or even uses it outside the context of DMARC. If it's overly restrictive, it could cause problems.

d-javu commented 1 month ago

I wrote a (too) long mail to the list, explaining a little. I accept the reluctance to make it stricter than necessary, and changes to allow leading zeroes actually make the regex shorter, and easier to read.

However, I think experience from RFC 7489, where the regex does not allow zero-bit compressed values, shows that we should not worry too much about being too restrictive. Granted, I do not have visibility into all DMARC aggregate reports ever sent, but all reports I've received over the years have the IPv6 address in the canonical format described by RFC 5952.

There are no instances of leading zeroes in the IPv6 addresses
There are no instances of anyone supplying 4in6 or other unroutable IPv6 addresses
There are no instances of using upper case letters in the IPv6 addresses

This change would thus cement the canonical textual representation format specified in RFC 5952's standing as the accepted format for IPv6 addresses in DMARC reports.

alevesely commented 1 month ago

Can regex impose that compressed zeroes must be the longest or first sequence?

abrotman commented 1 month ago

I'll be honest that I dislike that the DMARC drafts are somehow the document of record on this, and I'd be much happier if there were format/regex IETF documents for IPv4/IPv6 addresses. It feels like this should be a relaxed definition, and perhaps work to better define the format/regex should happen elsewhere (and perhaps be added as a reference as an erratum at a later point)

vanderlee commented 1 month ago

... the DMARC drafts are somehow the document of record on this

Strongly agree with this. DMARC is not the owner of IPv6 specification and should not take on any part of that responsibility.

alevesely commented 1 month ago

Me too. Mine was a rhetorical question, since I don't think there is any way a regular expression could match the rules detailed in RFC 5952. So we might as well use the short, one-liner expression.

jrlevine commented 1 month ago

RFC 4291 defines the text format for an IPv6 address. RFC 5952 suggests a subset of that format. In practice, anything that can parse the 5952 subset can also parse any 4291 address, so there is no benefit at all to trying to restrict reports to the subset. As far as I can tell there is no formal definition of IPv4 addresses, but fortunately they're easy to describe in prose, four decimal integers separated by dots, and we can leave the semantics of exactly which integers are valid to the parser.

So can we agree to use my simple regex and close this?

d-javu commented 1 month ago

I wanted to provide a fix to the clearly faulty IPv6 regex we currently have. In the process throwing out ipv4-mapped ipv6 addresses and other globally unroutable IPv6 addresses. Judging from the fact that the RFC 7489 IPv6 regex only matches a tiny fraction of the addresses in the reports I receive, I think no one really uses it currently.

But look, this is getting out of hand. please discuss on the mailing list where I believe the current status is that we should provide a more generic regex and add text to encourage RFC 5952 formatting, just like everyone is doing anyway.

PS: I'd answer @alevesely in the positive. I'm sure it can be done with lookahead, but that is not for the faint at heart and left as an exercise for the reader.

abrotman commented 1 month ago

@d-javu Thanks. John sort of pointed at a place where some of this is noted in a proper fashion, but might be worth following up and seeing if folks would be interested in a more concrete definition of the format and matching regular expressions. I think it could really benefit the larger IETF.

alevesely commented 1 month ago

PS: I'd answer @alevesely in the positive. I'm sure it can be done with lookahead, but that is not for the faint at heart and left as an exercise for the reader.

Hm... matching shorter zero sequences, which can be anywhere after the longest/first one, would explode the number of cases. A really tough exercise. However, XSD Regular Expressions don't seem to allow lookahead.

d-javu commented 1 month ago

@abrotman I'll try to prepare a different pull-request based on this, and the on-list discussion.

Hm... matching shorter zero sequences, which can be anywhere after the longest/first one, would explode the number of cases. A really tough exercise.

Indeed, don't tempt me to try, to find out on way or another :)

However, XSD Regular Expressions don't seem to allow lookahead.

You're right, that is a different kind of animal. A lot of time passes between each time I do anything with XSD regexes, so I tend to forget how neutered they are.

ietf-wg-dmarc / draft-ietf-dmarc-aggregate-reporting

Ipv6 regex update #20