ietf-wg-dmarc / draft-ietf-dmarc-aggregate-reporting


Report-ID syntax in subject line #13

Closed jrlevine closed 1 year ago

jrlevine commented 1 year ago

Section 7.2.1.1 says that the Report-ID is a msg-id. Almost nobody does that. Google uses a long decimal number, Outlook uses a hex number, Fastmail sends a datestamp with dots between the numbers, Yahoo sends two dot-separated numbers in angle brackets, Comcast sends a dash-separated string including the domain in angle brackets, and AWS sends dash-separated hex numbers in {} braces.

Since the only interesting thing about the Report-ID is that it is distinct from other Report-IDs, I suggest loosening the syntax to agree with reality, except perhaps for the AWS braces.

smjones commented 1 year ago

Seems sensible in general. Is the intent to lobby AWS to stop using {}, to omit it as beneath official notice, or to technically "invalidate" their current practice?

jrlevine commented 1 year ago

My inclination is to allow all but the AWS stuff since AWS is screwed up for a lot of other reasons too:

Subject: Dmarc Aggregate Report Domain: {iecc.com} Submitter: {Amazon SES} Date: {2022-09-28} Report-ID: {ab9e4750-9fea-44a2-bc10-0417b9b01808}

Keep in mind that 99% of Report-IDs are invalid now, so the question is how much to retroactively approve. I'd say something like '<' ridtxt '>' / ridtxt where ridtxt is a string of letters, digits, dot, hyphen, and at-sign.
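For anyone who wants to try the proposed syntax against their own mailbox, here is a minimal sketch of a checker. It assumes "letters" means ASCII letters and that ridtxt must be non-empty; neither is stated above. The names `RIDTXT`, `REPORT_ID` and `is_valid_report_id` are made up for illustration, and the sample values other than the AWS one are invented to match the shapes described earlier in the thread.

```python
import re

# ridtxt: letters, digits, dot, hyphen, at-sign.
# Assumption: ASCII letters only, at least one character.
RIDTXT = r"[A-Za-z0-9.@-]+"

# Proposed rule: '<' ridtxt '>' / ridtxt
REPORT_ID = re.compile(rf"(?:<{RIDTXT}>|{RIDTXT})")


def is_valid_report_id(value: str) -> bool:
    """Return True if the whole string matches the proposed Report-ID rule."""
    return REPORT_ID.fullmatch(value) is not None


if __name__ == "__main__":
    samples = [
        "12345678901234567890",                    # invented: long decimal number
        "<1664323200.123456>",                     # invented: dotted numbers in angle brackets
        "<example.com-1664323200-abc@example.net>",  # invented: dashes plus a domain
        "{ab9e4750-9fea-44a2-bc10-0417b9b01808}",  # AWS form quoted above: braces are rejected
    ]
    for sample in samples:
        print(sample, is_valid_report_id(sample))
```

Under this rule the AWS ID would pass once the braces are dropped, which is the only part of it the proposal excludes.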

alevesely commented 1 year ago

The spec could say that the unique-id MAY be used as msg-id, for the few who do it.

Could we please also remove the id from the Subject:? I don't think anyone goes there when looking for it. It only clutters the message pane.

Best Ale

jrlevine commented 1 year ago

Could you explain what problem this incompatible change would solve? Should we then update our scripts to reject messages with Report-ID in the subject?

alevesely commented 1 year ago

See attached screenshot. The top messages (illegally) lack the Report-ID field, while the bottom ones comply. Readability?

I know readability is a minor issue for messages destined to automated processing. Yet, a cluttered Subject: field is not a reasonable choice for automated processes to look at. It is a displayed field.

alevesely commented 1 year ago

screenshot

jrlevine commented 1 year ago

The point of the Report-Id is to recognize duplicate reports. I am unaware of any processing scripts that understand English or Italian, so I don't see why readability matters. Once again, why do you think it is bad to be able to detect duplicate reports?
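As a rough illustration of the duplicate check being described, the sketch below pulls the Report-ID out of the Subject: header and remembers which ones it has seen. It is not anyone's production script. The tolerant pattern (bare token, or one wrapped in <> or {}), the choice to pair the ID with the From: header, and the names `SUBJECT_ID`, `dedupe_key` and `find_duplicates` are all assumptions made for the example.

```python
import re
from email import message_from_string
from typing import Iterable, List, Optional, Tuple

# Assumption: accept a bare token or one wrapped in <> or {}, as seen in the wild.
SUBJECT_ID = re.compile(r"Report-ID:\s*[<{]?([^\s<>{}]+)[>}]?", re.IGNORECASE)


def dedupe_key(raw_message: str) -> Optional[Tuple[str, str]]:
    """Return (sender, report_id) for a raw message, or None if no ID is found."""
    msg = message_from_string(raw_message)
    match = SUBJECT_ID.search(str(msg.get("Subject", "")))
    if match is None:
        return None
    # Assumption: IDs are only unique per reporter, so key on the sender too.
    return (str(msg.get("From", "")), match.group(1))


def find_duplicates(raw_messages: Iterable[str]) -> List[Tuple[str, str]]:
    """Return the keys of messages whose (sender, report_id) was already seen."""
    seen = set()
    duplicates = []
    for raw in raw_messages:
        key = dedupe_key(raw)
        if key is None:
            continue  # no Report-ID in the subject, nothing to compare
        if key in seen:
            duplicates.append(key)
        else:
            seen.add(key)
    return duplicates
```

Keying on the sender as well as the ID is a design choice for the sketch: nothing above says Report-IDs are unique across different reporters.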

alevesely commented 1 year ago

My processing script doesn't feed a database, but limits itself to adding an HTML entity (depicted here) to each message. It understands XML only. Actually, as it treats each message as a self-standing element, it doesn't recognize duplicate elements.

Email clients provide a convenient interface to access messages. I smell occasional duplicated elements by the occurrence of the same domain in quick succession, confirmed by the id, begin, end triple prominently displayed in the message content.

Scripts that aggregate reports from different submitters can skip prescreening and check for duplicates by just comparing the keys. Removing Report-Id:.* from the Subject: only affects nitpicking modules in pedantic mode.
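Comparing the keys taken from the report body could look roughly like the sketch below. It assumes the usual aggregate report layout (`feedback/report_metadata` with `org_name`, `report_id` and `date_range/begin`, `date_range/end`) and no XML namespace on the elements. Adding `org_name` to the id, begin, end triple is my own assumption, and `report_key` and `unique_reports` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, List, Optional, Tuple

ReportKey = Tuple[Optional[str], Optional[str], Optional[str], Optional[str]]


def report_key(xml_text: str) -> ReportKey:
    """Return (org_name, report_id, begin, end) from an aggregate report."""
    root = ET.fromstring(xml_text)
    meta = root.find("report_metadata")  # assumption: elements carry no namespace
    if meta is None:
        raise ValueError("no report_metadata element found")
    return (
        meta.findtext("org_name"),
        meta.findtext("report_id"),
        meta.findtext("date_range/begin"),
        meta.findtext("date_range/end"),
    )


def unique_reports(xml_texts: Iterable[str]) -> List[str]:
    """Keep the first report for each key and drop later duplicates."""
    seen = set()
    kept = []
    for text in xml_texts:
        key = report_key(text)
        if key not in seen:
            seen.add(key)
            kept.append(text)
    return kept
```

Reports that do declare a namespace would need the namespace passed to `find` and `findtext`, which this sketch leaves out.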

jrlevine commented 1 year ago

I understand you to be saying that you personally don't find Report-ID to be useful, but other people do and if we removed it, they would have to make changes to adapt. So of course we won't do that, we'll just adjust the ABNF to match the Report-IDs that reports actually have.

alevesely commented 1 year ago

I don't think anybody relies on a machine parsable Subject:, but knowing anyone who does would help understand the topic.

Changes are going to be needed to adapt to the new standard anyway. (For example, we removed pct=.) Getting the key from the right place would be a minor hassle. We have the chance to amend an inelegant solution. It contradicts RFC 5322, where Subject: is defined as an unstructured field. Most other standards treat that field accordingly. There is a Keywords: field which is undoubtedly better suited for that usage. If we standardize that bad practice now, we'll have to keep it forever.

abrotman commented 1 year ago

Please move this conversation to the list so that others can participate. Thanks.

(In reply to Alessandro Vesely's comment of Tuesday, October 4, 2022 at 05:54:56 AM EDT, which appears in full above.)

smjones commented 1 year ago

On Oct 1, 2022, at 10:43 AM, John L

Should we then update our scripts to reject messages with Report-ID in the subject?

Why would the change to “MAY include a Report-ID” mean you have to reject reports with one in the subject? I would think “MAY” would mean you /could not/ reject messages that included a Report-ID in the subject (leaving aside the composition of the Report-ID). Perhaps I trimmed too much context and lost the reason?

More to the point, the reason for the request sounds like an MUA presentation concern about messages intended primarily for automated processing. If having the ID in the subject offers any utility at all for automated processing, then any MUA concerns should be overlooked.

—S.


abrotman commented 1 year ago

Changed to John's suggestion of "ridtxt / < ridtxt > "