haraka / node-address-rfc2821

RFC2821 Email Address parser (from Haraka)
https://www.npmjs.com/package/address-rfc2821

Address parser not RFC 5321 compliant #27

Closed gene-hightower closed 3 years ago

gene-hightower commented 4 years ago

system info

Linux (Fedora 31)

Haraka from git repo, head.

Expected behaviour

Tried a few addresses from the Wikipedia page https://en.wikipedia.org/wiki/Email_address#Examples

Observed behaviour

Some addresses from the "valid" list failed; quoted strings in the local part in particular seem to be a problem area.

$ telnet localhost 2225
Trying ::1...
Connected to localhost.
Escape character is '^]'.
220 digilicious.com ESMTP Haraka/2.8.25 ready
EHLO digilicious.com
250-digilicious.com Hello localhost.localdomain [::1]Haraka is at your service.
250-PIPELINING
250-8BITMIME
250-SMTPUTF8
250 SIZE 0
MAIL FROM:<" "@DiGiLiCious.com>
501-Command parsing failed
501 Error: Invalid local part in address: " "@DiGiLiCious.com

Also, some of the invalid addresses seem to be accepted just fine:

MAIL FROM:just"not"right@DiGiLiCious.com
250 sender "just\"not\"right"@DiGiLiCious.com OK

Should have failed, I think.

Also:

MAIL FROM:<this is"not\allowed@DiGiLiCious.com>
250 sender <"this\ is\"not\allowed"@DiGiLiCious.com> OK

...and others; I think you get the idea.
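For reference, here is a minimal sketch of how the three local parts above fare against the RFC 5321 `Local-part` grammar (`Dot-string / Quoted-string`). The helper name `isValidLocalPart` is hypothetical and is not part of Haraka or address-rfc2821; the check covers ASCII only and ignores the SMTPUTF8 extensions and length limits.

```javascript
// Sketch only: ASCII-only check of the RFC 5321 Local-part production.
// isValidLocalPart is a hypothetical name, not an API of address-rfc2821.

// atext (RFC 5322 section 3.2.3): letters, digits and !#$%&'*+-/=?^_`{|}~
const ATEXT = "[A-Za-z0-9!#$%&'*+\\-/=?^_`{|}~]";

// Dot-string = Atom *("." Atom), Atom = 1*atext
const DOT_STRING = new RegExp(`^${ATEXT}+(?:\\.${ATEXT}+)*$`);

// Quoted-string = DQUOTE *QcontentSMTP DQUOTE
// qtextSMTP       = %d32-33 / %d35-91 / %d93-126
// quoted-pairSMTP = "\" followed by %d32-126
const QUOTED_STRING = /^"(?:[\x20-\x21\x23-\x5b\x5d-\x7e]|\\[\x20-\x7e])*"$/;

function isValidLocalPart(localPart) {
  return DOT_STRING.test(localPart) || QUOTED_STRING.test(localPart);
}

console.log(isValidLocalPart('" "'));                    // true
console.log(isValidLocalPart('just"not"right'));         // false
console.log(isValidLocalPart('this is"not\\allowed'));   // false
```

Under this grammar the quoted space is accepted, while a double quote or a space inside an unquoted local part is rejected, which matches the expectations stated above.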

Another area that seemed to be a problem was UTF-8 in the domain part of an address. With SMTPUTF8 support, that should work.
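One possible way to handle a UTF-8 domain, sketched here as an assumption rather than a description of what Haraka does, is to convert it to its ASCII (Punycode) form before applying the ASCII domain rules. Node.js provides `url.domainToASCII` for this; the wrapper name `toAsciiDomain` is hypothetical.

```javascript
// Sketch only: normalise an internationalised domain to its ASCII form.
// url.domainToASCII is a Node.js built-in; toAsciiDomain is a hypothetical wrapper.
const { domainToASCII } = require('node:url');

function toAsciiDomain(domain) {
  // domainToASCII returns an empty string when the domain is invalid.
  const ascii = domainToASCII(domain);
  return ascii === '' ? null : ascii;
}

console.log(toAsciiDomain('español.com')); // xn--espaol-zwa.com
```

The ASCII form could then be checked against the ordinary RFC 5321 `Domain` syntax, while the original UTF-8 form is kept for display.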

Steps to reproduce

Just telnet to Haraka and speak SMTP.

I guess my question is, are you guys open to using a parser that follows the RFC-5321 mailbox syntax?

msimerson commented 4 years ago

> I guess my question is, are you guys open to using a parser that follows the RFC-5321 mailbox syntax?

Yes! We are. We have already extracted the parser out of Haraka into address-rfc2821, and we've long intended to update it for full RFC 5321 compliance.

gene-hightower commented 4 years ago

I will take a shot at creating a parser for the RFC-5321 syntax. I plan to try https://pegjs.org/ to generate it from a PEG grammar.

baudehlo commented 4 years ago

I highly recommend taking a look at https://github.com/jackbearheart/email-addresses/blob/master/lib/email-addresses.js

It uses a hand-made parser, but it follows the grammar rules in structure.
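As a rough illustration of what "following the grammar rules in structure" can look like, here is a small hand-written sketch in which each function mirrors one RFC 5321 production. It is not code from email-addresses or address-rfc2821; all function names are hypothetical, only ASCII is handled, and the domain is returned without being validated.

```javascript
// Sketch only: hand-written parser, one function per RFC 5321 production.
// Each parse* function returns the index just past what it matched, or -1.

const isAtext = (c) => /^[A-Za-z0-9!#$%&'*+\-\/=?^_`{|}~]$/.test(c);

// Atom = 1*atext
function parseAtom(s, i) {
  let j = i;
  while (j < s.length && isAtext(s[j])) j++;
  return j > i ? j : -1;
}

// Dot-string = Atom *("." Atom)
function parseDotString(s, i) {
  let j = parseAtom(s, i);
  if (j < 0) return -1;
  while (s[j] === '.') {
    const k = parseAtom(s, j + 1);
    if (k < 0) return -1; // trailing or doubled dot
    j = k;
  }
  return j;
}

// Quoted-string = DQUOTE *QcontentSMTP DQUOTE
function parseQuotedString(s, i) {
  if (s[i] !== '"') return -1;
  let j = i + 1;
  while (j < s.length) {
    const code = s.charCodeAt(j);
    if (s[j] === '"') return j + 1;            // closing DQUOTE
    if (s[j] === '\\') {                        // quoted-pairSMTP
      const next = s.charCodeAt(j + 1);         // NaN when out of range
      if (!(next >= 32 && next <= 126)) return -1;
      j += 2;
    } else if (code >= 32 && code <= 126) {     // qtextSMTP
      j++;
    } else {
      return -1;
    }
  }
  return -1; // unterminated quoted string
}

// Local-part = Dot-string / Quoted-string
function parseLocalPart(s, i) {
  return s[i] === '"' ? parseQuotedString(s, i) : parseDotString(s, i);
}

// Mailbox = Local-part "@" ( Domain / address-literal )
// The domain is returned as-is; validating it is left out of this sketch.
function parseMailbox(s) {
  const end = parseLocalPart(s, 0);
  if (end < 0 || s[end] !== '@' || end + 1 === s.length) return null;
  return { localPart: s.slice(0, end), domain: s.slice(end + 1) };
}

console.log(parseMailbox('"a@b"@example.com'));
console.log(parseMailbox('just"not"right@DiGiLiCious.com')); // null
```

Because the local part is consumed by the grammar functions before the `@` is looked for, an `@` inside a quoted string does not confuse the split.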

gene-hightower commented 3 years ago

I have a parser built using https://nearley.js.org/, which can be found at https://github.com/gene-hightower/address-parse-21. I have snarfed the tests from https://github.com/haraka/node-address-rfc2821 and made a few changes that I believe are corrections. Specifically, embedded spaces in the local-part aren't valid. @msimerson does this seem like a step in the right direction?

msimerson commented 3 years ago

@baudehlo is the author of this module and also has recent experience with writing nearley parsers, so I'm going to defer to him.

msimerson commented 3 years ago

PS: Yes, it does very much look like a big step forward. Because there are a bunch of packages that depend on this package by name, the path forward would be opening a PR against this repo that incorporates your changes.

gene-hightower commented 3 years ago

> I highly recommend taking a look at https://github.com/jackbearheart/email-addresses/blob/master/lib/email-addresses.js - it uses a hand made parser, but it follows the grammar rules in structure.

That parser is based on the grammar from RFC 5322 (the message format), not RFC 5321 (the SMTP protocol standard).