staktrace / mailparse

Rust library to parse mail files
https://docs.rs/mailparse/
BSD Zero Clause License

Parsed email header with forbidden characters returned without quotes #117

Closed edevil closed 1 year ago

edevil commented 1 year ago

If you call parse_header() on b"From: =?UTF-8?B?THVpcyBHb256w6FsZXogW1ZhdXhvb10=?= <notifications@github.com>" and then read the value of that header, you get:

Luis González [Vauxoo] <notifications@github.com>

However, I think that since the name contains square brackets, it should be enclosed in quotes. mailparse can parse the header back, but some libraries, such as https://www.npmjs.com/package/email-addresses, complain about this, and it seems they are correct.

staktrace commented 1 year ago

parse_header is a generic header parsing function and is not specific to address headers. So it's not going to modify the content beyond decoding the (in this case) base64 content. If there are quotes missing here, it's because the sender didn't put them in.

staktrace commented 1 year ago

Closing for now but feel free to respond and try to change my mind :)

edevil commented 1 year ago

I'm also unsure whether this is the responsibility of this crate.

On one hand, =?UTF-8?B?THVpcyBHb256w6FsZXogW1ZhdXhvb10=?= <notifications@github.com> is valid as sent. No quotes are needed in the encoded form; you only need them once you decode the string.

On the other hand, parse_header() does not guarantee that what I get in return is a valid RFC 5322 header. It was me who was assuming that.

So maybe the fix should be elsewhere. :)
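One place the fix could live is in the caller, by re-quoting the decoded display name before handing it to a stricter parser. The following is only a minimal sketch under some assumptions: `quote_display_name` and `format_mailbox` are hypothetical helpers (not part of mailparse), the set of characters that force quoting is the RFC 5322 "specials" list plus ASCII control characters, and non-ASCII characters are passed through unquoted on the assumption that the consumer accepts UTF-8 names.

```rust
/// Hypothetical caller-side helper, not part of mailparse.
/// Wraps a decoded display name in double quotes when it contains
/// characters that cannot appear in an unquoted RFC 5322 atom.
fn quote_display_name(name: &str) -> String {
    // RFC 5322 "specials": ( ) < > [ ] : ; @ \ , . "
    const SPECIALS: &str = "()<>[]:;@\\,.\"";

    let needs_quotes = name
        .chars()
        .any(|c| SPECIALS.contains(c) || c.is_ascii_control());
    if !needs_quotes {
        return name.to_string();
    }

    let mut out = String::with_capacity(name.len() + 2);
    out.push('"');
    for c in name.chars() {
        // Inside a quoted-string, backslash and double quote must be escaped.
        if c == '"' || c == '\\' {
            out.push('\\');
        }
        out.push(c);
    }
    out.push('"');
    out
}

/// Hypothetical helper: rebuilds a `name <addr>` mailbox string
/// from an already decoded display name and an address.
fn format_mailbox(name: &str, addr: &str) -> String {
    format!("{} <{}>", quote_display_name(name), addr)
}
```

With this, `format_mailbox("Luis González [Vauxoo]", "notifications@github.com")` yields `"Luis González [Vauxoo]" <notifications@github.com>`, while a name without specials is left unquoted.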

Thanks.

wathiede commented 10 months ago

Did you come up with a solution? I just stumbled across this too while trying to use addrparse. I think it's not actually parse_header that does the decoding; it is get_value.

So you can potentially use get_value_raw, call String::from_utf8 on the result, pass that into addrparse, and then use another library to decode the RFC 2047 encoded-words in the structured output.

That solution isn't great though, as get_value also does whitespace normalization and handles multiline headers. It seems like addrparse (or maybe addrparse_header, given it avoids the explicit get_value call) would be more robust if it were possible to perform the tokenization on the encoded form of the header, and then decode the name portion after tokenization.
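To make the "tokenize first, decode second" idea concrete, here is a rough sketch. It is not how mailparse works internally. `split_then_decode` is a hypothetical function, it handles only a single `name <addr>` mailbox (no groups, comments, or address lists), and the `decode` closure stands in for whatever RFC 2047 decoder you would actually plug in.

```rust
/// Hypothetical sketch, not mailparse API: split a single raw
/// (still encoded) mailbox into name and address first, and only
/// then run the caller-supplied decoder over the name part.
fn split_then_decode<F>(raw: &str, decode: F) -> Option<(Option<String>, String)>
where
    F: Fn(&str) -> String,
{
    let raw = raw.trim();
    match raw.rfind('<') {
        Some(lt) => {
            // Find the closing '>' that follows the last '<'.
            let gt = raw[lt..].find('>')? + lt;
            let addr = raw[lt + 1..gt].trim().to_string();
            let name = raw[..lt].trim();
            // Decoding happens after tokenization, so brackets or commas
            // that only appear in the decoded name cannot confuse the split.
            let name = if name.is_empty() {
                None
            } else {
                Some(decode(name))
            };
            Some((name, addr))
        }
        // No angle brackets: treat the whole input as a bare address.
        None => Some((None, raw.to_string())),
    }
}
```

Because the split runs on the encoded bytes, the square brackets in the decoded name never reach the tokenizer, so no quoting is needed at that stage.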