stalwartlabs / mail-parser

Fast and robust e-mail parsing library for Rust
https://docs.rs/mail-parser/
Apache License 2.0

Parsing multiline headers #58

Closed alexanderbluhm closed 1 year ago

alexanderbluhm commented 1 year ago

Hi, first of all thanks for this nice library. I tested some others, but this one provides a lot of things I need. However, there might be an issue with multiline headers. In my mail, I have an attachment that looks like this (sent from iOS Mail App):

--Apple-Mail-97D39683
Content-Type: audio/x-m4a;
name="My Recording.m4a";
x-apple-part-url=B074DA16-AEC2-4B5E-BF19-20B188C3FFF1
Content-Disposition: attachment;
filename="My Recording.m4a"
Content-Transfer-Encoding: base64

However, when I print the parsed message (println!("{:?}", message);), I cannot see the filename anywhere. I also tried looping over the attachments and calling attachment.attachment_name(). I am migrating from a Python project, where parsing this message correctly was no problem, so it might be an issue with mail-parser itself.

I used this code to test, but I am not able to get the file name from the message object.

Test Code

```rs
let input = br#"From: Art Vandelay (Vandelay Industries)
To: "Colleagues": "James Smythe" ; Friends: jane@example.com, =?UTF-8?Q?John_Sm=C3=AEth?= ;
Date: Sat, 20 Nov 2021 14:22:01 -0800
Subject: Why not both importing AND exporting? =?utf-8?b?4pi6?=
Content-Type: multipart/mixed; boundary="festivus";

--festivus
Content-Type: text/html; charset="us-ascii"
Content-Transfer-Encoding: base64

PGh0bWw+PHA+SSB3YXMgdGhpbmtpbmcgYWJvdXQgcXVpdHRpbmcgdGhlICZsZHF1bztle
HBvcnRpbmcmcmRxdW87IHRvIGZvY3VzIGp1c3Qgb24gdGhlICZsZHF1bztpbXBvcnRpbm
cmcmRxdW87LDwvcD48cD5idXQgdGhlbiBJIHRob3VnaHQsIHdoeSBub3QgZG8gYm90aD8
gJiN4MjYzQTs8L3A+PC9odG1sPg==
--festivus
Content-Type: message/rfc822; name="Exporting my book about coffee tables.eml"
Content-Disposition: inline; filename="Exporting my book about coffee tables.eml"
Content-Transfer-Encoding: 7bit

From: "Cosmo Kramer"
Subject: Exporting my book about coffee tables
Content-Type: multipart/mixed; boundary="giddyup";

--giddyup
Content-Type: audio/x-m4a;
name="My Recording.m4a";
x-apple-part-url=B074DA16-AEC2-4B5E-BF19-20B188C3FFF1
Content-Disposition: attachment;
filename="My Recording.m4a"
Content-Transfer-Encoding: base64

=FF=FE=0C!5=D8"=DD5=D8)=DD5=D8-=DD =005=D8*=DD5=D8"=DD =005=D8"=
=DD5=D85=DD5=D8-=DD5=D8,=DD5=D8/=DD5=D81=DD =005=D8*=DD5=D86=DD =
=005=D8=1F=DD5=D8,=DD5=D8,=DD5=D8(=DD =005=D8-=DD5=D8)=DD5=D8"=
=DD5=D8=1E=DD5=D80=DD5=D8"=DD!=00
--giddyup
Content-Type: image/gif; name*1="about "; name*0="Book "; name*2*=utf-8''%e2%98%95 tables.gif
Content-Transfer-Encoding: Base64
Content-Disposition: attachment

R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7
--giddyup--
--festivus--
"#;
let message = Message::parse(input);
println!("{:?}", message);
```

Would be great if this could be resolved :)

mdecimus commented 1 year ago

> In my mail, I have an attachment that looks like this (sent from iOS Mail App):

Your sample message is missing the space or tab character that must start each wrapped (continuation) line. It should look like this:

--Apple-Mail-97D39683
Content-Type: audio/x-m4a;
   name="My Recording.m4a";
   x-apple-part-url=B074DA16-AEC2-4B5E-BF19-20B188C3FFF1
Content-Disposition: attachment;
   filename="My Recording.m4a"
Content-Transfer-Encoding: base64

The attachment_name() method should be called on the MimePart that contains the attachment. It won't work when called on Message (unless the message consists of just one attachment and no body).
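To illustrate the line-wrap rule described above, here is a minimal sketch of header unfolding that uses only the standard library. It is not mail-parser's implementation; the function name `unfold_headers` and the simplified handling (no comments, no encoded words, no parameter parsing) are assumptions made for the example. It only shows why a wrapped line without leading whitespace stops being part of the previous header.

```rust
/// Splits a header block into (name, value) pairs, joining folded lines.
/// A line that starts with a space or tab continues the previous header;
/// any other line is treated as the start of a new header.
/// Hypothetical helper for illustration only.
fn unfold_headers(block: &str) -> Vec<(String, String)> {
    let mut headers: Vec<(String, String)> = Vec::new();
    for line in block.lines() {
        if line.is_empty() {
            break; // a blank line ends the header block
        }
        if line.starts_with(' ') || line.starts_with('\t') {
            // Continuation: append to the value of the previous header.
            if let Some((_, value)) = headers.last_mut() {
                value.push(' ');
                value.push_str(line.trim_start());
            }
        } else if let Some((name, value)) = line.split_once(':') {
            headers.push((name.trim().to_string(), value.trim().to_string()));
        } else {
            // No leading whitespace and no colon: kept as a nameless stray line.
            headers.push((line.trim().to_string(), String::new()));
        }
    }
    headers
}

fn main() {
    let folded = "Content-Disposition: attachment;\n  filename=\"My Recording.m4a\"\n";
    let broken = "Content-Disposition: attachment;\nfilename=\"My Recording.m4a\"\n";
    println!("{:?}", unfold_headers(folded));
    println!("{:?}", unfold_headers(broken));
}
```

With the indented input, the filename parameter ends up inside the single Content-Disposition value. With the unindented input, the sketch yields two entries, and the second one is no longer attached to Content-Disposition.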

alexanderbluhm commented 1 year ago

@mdecimus Thanks for the quick answer! I am doing this, similar to your example:

for attachment in message.attachments() {
    if !attachment.is_message() {
        let filename = attachment.attachment_name().unwrap_or("Untitled");
        let contents = attachment.contents();
        // ...
    }
}

But the filename comes back as "Untitled", i.e. the fallback value.

mdecimus commented 1 year ago

Try iterating &message.parts.

alexanderbluhm commented 1 year ago

Hi @mdecimus, I tried this:

for part in &message.parts {
    match part.attachment_name() {
        Some(name) => {
            println!("Disposition: {:?}", name);
        }
        None => {
            println!("No name");
        }
    }
}

but all I get is "No name" four times. I also tried content_disposition(), but I can't get the filename from there either. Any idea? 🤔

mdecimus commented 1 year ago

Can you include the original message? You can remove any sensitive information; I'd just like to see the MIME headers.

alexanderbluhm commented 1 year ago

@mdecimus Thanks for your help. I exported the mail body using:

match connection.retrieve(info.message_id, &mut buffer) {
    Ok(_) => {
        let text = std::str::from_utf8(&buffer).unwrap();
        fs::write("mail.txt", text).expect("Unable to write file");

Here is the mail.txt export of the mail.

mdecimus commented 1 year ago

The problem is that the leading spaces are missing on the continuation lines of the multiline headers:

--Apple-Mail-97D39683-EA28-4F27-BCF8-B7C99849500A
Content-Type: audio/x-m4a;
name="Neue Aufnahme 7.m4a";
x-apple-part-url=B074DA16-AEC2-4B5E-BF19-20B188C3FFF1
Content-Disposition: attachment;
filename="Neue Aufnahme 7.m4a"
Content-Transfer-Encoding: base64

Add the missing spaces and it should work:

--Apple-Mail-97D39683-EA28-4F27-BCF8-B7C99849500A
Content-Type: audio/x-m4a;
  name="Neue Aufnahme 7.m4a";
  x-apple-part-url=B074DA16-AEC2-4B5E-BF19-20B188C3FFF1
Content-Disposition: attachment;
  filename="Neue Aufnahme 7.m4a"
Content-Transfer-Encoding: base64

alexanderbluhm commented 1 year ago

@mdecimus Thanks. The data is coming from rust_pop3_client. Could it be that the lib messes up the spaces? Anyway, I will check whether the error also occurs with another lib.

mdecimus commented 1 year ago

It could be; something is breaking that message. Continuation lines of a multi-line header must start with a space or tab, otherwise they are parsed as a new header.
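A quick way to check whether a retrieved message has lost its folding whitespace is to scan a header block for lines that are neither continuations nor shaped like `Field-Name: value`. The sketch below is a heuristic under stated assumptions: `find_suspect_header_lines` is a hypothetical helper, not part of mail-parser or rust_pop3_client. It inspects only the first header block of the text it is given, and it can miss a broken line whose parameter value happens to contain a colon.

```rust
/// Returns the 1-based numbers of header lines that neither start with
/// whitespace (a folded continuation) nor look like `Field-Name: value`.
/// Hypothetical helper for illustration only.
fn find_suspect_header_lines(header_block: &str) -> Vec<usize> {
    let mut suspects = Vec::new();
    for (index, line) in header_block.lines().enumerate() {
        if line.is_empty() {
            break; // only the first header block is inspected
        }
        let is_continuation = line.starts_with(' ') || line.starts_with('\t');
        // A field name is non-empty printable ASCII without spaces, before ':'.
        let has_field_name = match line.split_once(':') {
            Some((name, _)) => {
                !name.is_empty() && name.chars().all(|c| c.is_ascii_graphic())
            }
            None => false,
        };
        if !is_continuation && !has_field_name {
            suspects.push(index + 1);
        }
    }
    suspects
}

fn main() {
    // Header block of the attachment part as it appeared in the exported mail.
    let part_headers = [
        "Content-Type: audio/x-m4a;",
        "name=\"Neue Aufnahme 7.m4a\";",
        "x-apple-part-url=B074DA16-AEC2-4B5E-BF19-20B188C3FFF1",
        "Content-Disposition: attachment;",
        "filename=\"Neue Aufnahme 7.m4a\"",
        "Content-Transfer-Encoding: base64",
    ]
    .join("\n");
    println!("{:?}", find_suspect_header_lines(&part_headers));
}
```

For the broken part headers above, the sketch flags lines 2, 3 and 5, which are exactly the lines that need a leading space or tab. Running the same check on the raw bytes before and after they pass through the retrieval library would show where the whitespace is lost.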

alexanderbluhm commented 1 year ago

Used another lib and it's working now 🎉