mikedilger / email-format

Email data structure and builder for streaming emails
Apache License 2.0

Use semantic chunks in API, not ABNF chunks #15

Open mikedilger opened 7 years ago

mikedilger commented 7 years ago

The ABNF chunks defined in RFC 5322 do not correspond exactly to the semantic chunks. For instance, message-id is defined with angle brackets in the ABNF, but the text of the standard indicates that the angle brackets are not part of the semantic meaning of the message-id. Therefore, set_message_id() should not require angle brackets, but parsing and streaming should handle them appropriately.
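A minimal sketch of what this could look like for message-id. The `MessageId` type and its `new`, `as_str` and `to_wire` methods are hypothetical names made up for illustration, not the crate's current API, and the validation is deliberately loose (it does not check the full msg-id grammar). The idea is to store only the semantic part and add the brackets back when streaming:

```rust
/// Hypothetical wrapper (not part of the current crate API): stores the
/// semantic message-id, i.e. the text without the surrounding angle brackets.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct MessageId(String);

impl MessageId {
    /// Accepts either "id@host" or "<id@host>" and keeps only the semantic part.
    /// Assumption: only a loose sanity check is done here, not full ABNF validation.
    pub fn new(input: &str) -> Result<MessageId, String> {
        let trimmed = input.trim();
        let inner = match trimmed.strip_prefix('<') {
            // An opening bracket must be matched by a closing one.
            Some(rest) => rest
                .strip_suffix('>')
                .ok_or_else(|| format!("unbalanced angle bracket in {:?}", input))?,
            None => trimmed,
        };
        // Reject empty ids, stray brackets and embedded whitespace.
        if inner.is_empty()
            || inner.contains(|c: char| c == '<' || c == '>' || c.is_whitespace())
        {
            return Err(format!("invalid message-id {:?}", input));
        }
        Ok(MessageId(inner.to_string()))
    }

    /// The semantic value, without angle brackets.
    pub fn as_str(&self) -> &str {
        &self.0
    }

    /// The wire form, with the angle brackets that the msg-id ABNF requires.
    pub fn to_wire(&self) -> String {
        format!("<{}>", self.0)
    }
}
```

With something like this, a setter could accept both `"abc@example.com"` and `"<abc@example.com>"`, and streaming would always emit the bracketed form.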

It gets weirder with whitespace. You might think that whitespace following the colon after a header name somehow belongs to the label, but it does not: in the ABNF it belongs to the next token. It is allowed there via several different ABNF constructions, and the net result is that it just happens to be allowed after every header name and colon. Semantically, however, the whitespace is not part of most of the tokens that allow it. How do we parse out and put in whitespace consistently, without requiring callers to add whitespace to the token when they want a space after the colon (e.g. with set_sender(" a@b"))?
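One possible direction, sketched under assumptions: keep the whitespace that followed the colon in its own slot, separate from the semantic value, so that the builder can default it to a single space and the parser can preserve whatever was on the wire. The `Field` struct, its fields and the `is_wsp` helper are hypothetical and only handle a single, already unfolded header line:

```rust
/// Hypothetical field representation (not the crate's current types): the
/// semantic value is stored apart from the whitespace that followed the colon.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Field {
    pub name: String,
    /// Whitespace between the colon and the value, kept for round-tripping.
    pub lead_ws: String,
    /// Semantic value, with no surrounding whitespace.
    pub value: String,
}

/// WSP as in RFC 5234: space or horizontal tab.
fn is_wsp(c: char) -> bool {
    c == ' ' || c == '\t'
}

impl Field {
    /// Builder side: callers pass only the semantic value; a single space
    /// after the colon is supplied as the default.
    pub fn new(name: &str, value: &str) -> Field {
        Field {
            name: name.to_string(),
            lead_ws: " ".to_string(),
            value: value.trim_matches(is_wsp).to_string(),
        }
    }

    /// Parser side: splits one unfolded header line (no CRLF) into its parts.
    /// Assumption: trailing whitespace is dropped rather than preserved.
    pub fn parse(line: &str) -> Option<Field> {
        let (name, rest) = line.split_once(':')?;
        if name.is_empty() {
            return None;
        }
        // Byte length of the leading whitespace run after the colon.
        let ws_len = rest.len() - rest.trim_start_matches(is_wsp).len();
        let (lead_ws, tail) = rest.split_at(ws_len);
        Some(Field {
            name: name.to_string(),
            lead_ws: lead_ws.to_string(),
            value: tail.trim_end_matches(is_wsp).to_string(),
        })
    }

    /// Streaming side: re-emits the stored whitespace between colon and value.
    pub fn to_wire(&self) -> String {
        format!("{}:{}{}", self.name, self.lead_ws, self.value)
    }
}
```

Here `Field::new("Sender", "a@b")` and `Field::new("Sender", " a@b")` yield the same semantic value, so callers never need to smuggle a space into the token, while a parsed field keeps its original spacing when streamed back out.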

The changes implied here will likely break existing code and may be very extensive.