jackbearheart / email-addresses

An RFC 5322 email address parser
MIT License
249 stars 36 forks source link

Should 'foo@bar.com' (apostrophes included!) be a valid email address? #53

Open xemlock opened 4 years ago

xemlock commented 4 years ago

The following script instead of returning null, returns a parsed address, with apostrophes present in local and domain parts:

const addr = require('email-addresses');
console.log(addr.parseOneAddress("'foo@bar.com'"));

Result:

{
  parts: {
    name: null,
    address: {
      name: 'addr-spec',
      tokens: "'foo@bar.com'",
      semantic: "'foo@bar.com'",
      children: [Array]
    },
    local: {
      name: 'local-part',
      tokens: "'foo",
      semantic: "'foo",
      children: [Array]
    },
    domain: {
      name: 'domain',
      tokens: "bar.com'",
      semantic: "bar.com'",
      children: [Array]
    },
    comments: []
  },
  type: 'mailbox',
  name: null,
  address: "'foo@bar.com'",
  local: "'foo",
  domain: "bar.com'",
  comments: '',
  groupName: null
}

Domain name bar.com' is obviously invalid.

Library version: 3.1.0

jackbearheart commented 3 years ago

Thanks for the issue.

In terms of the spec that is implemented here, single quotes / apostrophes ("'") are, surprisingly, explicitly allowed in the local-part. See the definition of "atext" in the code/rfc.

As for the domain, it's loosely defined in the email rfcs, and the spec also uses the same production which explicitly allows single quotes... See the definition of "domain" and "dot-atom" again leading to "atext". (Of course from our practical knowledge this is bogus as a real TLD cannot have a single quote in it.)

I think the nicest fix would be to validate domains as well, as an additional step, using specs beyond what this library currently implements. For anyone running into this issue, yes, this library doesn't do much domain validation.

gene-hightower commented 3 years ago

Domains are much more loosely defined in RFC-5322 than RFC-5321. However a note in RFC-5322 says: “Note: A liberal syntax for the domain portion of addr-spec is given here. However, the domain portion contains addressing information specified by and used in other protocols (e.g., [RFC1034], [RFC1035], [RFC1123], [RFC5321]). It is therefore incumbent upon implementations to conform to the syntax of addresses for the context in which they are used.” RFC-5321 doesn't define a domain as a "dot-atom," rather it says: Domain = sub-domain ("." sub-domain) sub-domain = Let-dig [Ldh-str] Let-dig = ALPHA / DIGIT Ldh-str = ( ALPHA / DIGIT / "-" ) Let-dig It seems perfectly reasonable to me for a practical RFC-5322 parser to use the more restrictive syntax of RFC-5321 for the domain. Further validation based on DNS and other factors could be performed as an additional step, sure.