stalwartlabs / mail-parser

Fast and robust e-mail parsing library for Rust
https://docs.rs/mail-parser/
Apache License 2.0
298 stars 40 forks

Parsing stand-alone email addresses (RFC 5322 addr-spec) #40

Closed dmontagu closed 1 year ago

dmontagu commented 1 year ago

Hello!

We are potentially interested in using this crate to perform email address validation in the next version of Pydantic.

(For context, Pydantic is a data validation library for Python, and for performance reasons we are rebuilding the core validation logic for the next version in Rust and using PyO3 to expose Python bindings.)

@alonme did some initial investigation into using the mail-parser crate for validating email addresses in https://github.com/pydantic/pydantic-core/pull/397 and noted that although MessageStream::new(input).parse_address() works to parse a "name-addr" (terminology from RFC 5322), it doesn't work to parse an "addr-spec" directly:

use mail_parser::parsers::MessageStream;
fn main() {
    let input = br#"Art Vandelay <art@vandelay.com>"#; // name-addr
    let input2 = br#"art@vandelay.com"#;                // addr-spec

    let addr = MessageStream::new(input).parse_address();
    let addr2 = MessageStream::new(input2).parse_address();

    println!("{:?}", addr); // Address(Addr { name: Some("Art Vandelay"), address: Some("art@vandelay.com") })
    println!("{:?}", addr2); // Empty
}

I did confirm that just wrapping an input string in angle brackets gets the address out:

    let input3 = br#"<art@vandelay.com>"#;
    let addr3 = MessageStream::new(input3).parse_address();
    println!("{:?}", addr3); // Address(Addr { name: None, address: Some("art@vandelay.com") })

but I was wondering if you had any suggestions of a better way to specifically attempt to parse a string as an RFC-5322 addr-spec using this crate.

(@alonme also noticed that it seems to allow non-"addr-spec" items; Art Vandelay <123artvandelaycom> produces address: Some("123artvandelaycom") — so maybe this crate won't work for this purpose anyway.)

Any insight very much appreciated!

(@samuelcolvin for context)

alonme commented 1 year ago

Hey @dmontagu, I think there might be an issue in your second example: you are referring to addr3, which isn't defined 😄

dmontagu commented 1 year ago

@alonme thanks for pointing that out, I believe I have fixed the contents above

mdecimus commented 1 year ago

Hi @dmontagu

Just to clarify, when you say that you want to validate an addr-spec, do you need the library to check that it is a valid e-mail address or just to parse it? Currently mail-parser does not perform any kind of validation on addr-spec as there are many cases of MUAs that generate invalid recipient lists (according to RFC5322) and the goal of the library is to retrieve as much information from the message as possible. There is a sibling crate smtp-proto that validates addresses according to RFC5321 and could be used along with mail-parser to validate the parsed contents.
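Since mail-parser does not validate the addresses it extracts, any check has to run separately on the parsed string. As a minimal sketch, assuming only the unquoted dot-atom form `local@domain` needs to be accepted, a structural check can be written with the standard library alone. The names `is_plausible_addr_spec`, `is_dot_atom`, and `is_atext` are hypothetical; this is not the smtp-proto validator, and it deliberately rejects quoted local parts, domain literals, and comments that RFC 5322 otherwise allows.

```rust
/// Hypothetical helper, not part of mail-parser or smtp-proto.
/// Returns true if `c` is an RFC 5322 `atext` character.
fn is_atext(c: char) -> bool {
    c.is_ascii_alphanumeric() || "!#$%&'*+-/=?^_`{|}~".contains(c)
}

/// Returns true if `s` is one or more non-empty atoms separated by single dots.
fn is_dot_atom(s: &str) -> bool {
    !s.is_empty()
        && s.split('.')
            .all(|atom| !atom.is_empty() && atom.chars().all(is_atext))
}

/// Structural check for the dot-atom subset of addr-spec (assumption:
/// quoted strings, domain literals, and comments are out of scope).
fn is_plausible_addr_spec(s: &str) -> bool {
    // '@' is not an atext character, so a second '@' on either side
    // makes the dot-atom check fail.
    match s.rsplit_once('@') {
        Some((local, domain)) => is_dot_atom(local) && is_dot_atom(domain),
        None => false,
    }
}
```

Under these assumptions, `is_plausible_addr_spec("art@vandelay.com")` returns `true`, while `is_plausible_addr_spec("123artvandelaycom")` returns `false` because the string has no `@`.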

However, if you just need to parse an address list then you don't need any additional crates, just make sure that you add a \n or \r\n to the end of the address list you want to parse. This is because the header parsing functions were designed to be used from inside the library directly on the message header lines.
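The newline requirement can be handled before the bytes reach the parser. The sketch below uses a hypothetical helper, `with_line_ending`, that appends `\n` only when the input does not already end with one. It uses the standard library only and does not call mail-parser itself.

```rust
use std::borrow::Cow;

/// Hypothetical helper: returns the input terminated by a line break,
/// borrowing when one is already present and copying otherwise.
fn with_line_ending(input: &[u8]) -> Cow<'_, [u8]> {
    if input.ends_with(b"\n") {
        // Covers both "\n" and "\r\n" endings; no allocation needed.
        Cow::Borrowed(input)
    } else {
        let mut owned = Vec::with_capacity(input.len() + 1);
        owned.extend_from_slice(input);
        owned.push(b'\n');
        Cow::Owned(owned)
    }
}
```

The result dereferences to `&[u8]`, so it should be usable in the same call shape as the original post, for example `MessageStream::new(&with_line_ending(b"art@vandelay.com")).parse_address()`. Whether a bare addr-spec then parses as expected rests on the explanation above and has not been verified here.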

Hope it helps!

PS: Sorry for the lack of links/formatting, I'm replying from mobile.

samuelcolvin commented 1 year ago

Thanks @mdecimus, sounds like we need another crate. Do you happen to have any recommendations for email address parsing/validation?

mdecimus commented 1 year ago

> Thanks @mdecimus, sounds like we need another crate. Do you happen to have any recommendations for email address parsing/validation?

Not sure if there is a crate that only validates e-mail addresses but you may use this function from the smtp-proto crate:

https://github.com/stalwartlabs/smtp-proto/blob/main/src/request/parser.rs#L401