Issues encountered while testing out MIME message creation

shelakel commented 10 years ago

Hi jstedfast,

I'm looking at using MimeKit to create raw messages for AWS SES SendRaw and similar API calls that allow sending 'raw e-mail' through multiple providers (SES, Mandrill, Sendgrid, Mailgun etc.).

Having read through most of the code, I must say the code is beautiful.

Here are some of the issues I've encountered while constructing raw messages:

1. HeaderList - WriteTo

(https://github.com/jstedfast/MimeKit/blob/master/MimeKit/HeaderList.cs)

According to the RFC, a header value should end with \r\n. Currently the HeaderList appends all headers on one line.

2. InternetAddressList tests

The two following tests fail due to the escaping of the expected output, which can be fixed by folding with \r\n:

TestEncodingMailboxWithArabicName

=?utf-8?b?2YfZhCDYqtiq2YPZhNmFINin2YTZhNi62Kk=?=\r\n =?utf-8?b?INin2YTYpdmG2KzZhNmK2LLZitip?=\n =?utf-8?b?IC/Yp9mE2LnYsdio2YrYqdif?= do.you.speak@arabic.com

TestEncodingMailboxWithJapaneseName

=?utf-8?b?54uC44Gj44Gf44GT44Gu5LiW44Gn54uC44GG44Gq44KJ5rCX?=\r\n =?utf-8?b?44Gv56K644GL44Gg44CC?= famous@quotes.ja

3. Base64Encoder fails because the allocated buffer is too small

At Multipart - GenerateBoundary (at Base64Encoder - Flush -> ValidateArguments); 32 bytes are allocated to the output buffer but the estimated buffer length required is 73 bytes. I take it the 73 bytes is case specific, but it still requires handling.

4. ContentObject - ContentEncoding

It doesn't make sense to specify the Content Transfer Encoding on the MimePart (as it applies to the content/body) as well as on the ContentObject (as ContentEncoding). It'd make more sense if the ContentObject contained the content and the ContentType, so that it could supply parameters such as charset for text/*. Perhaps I'm wrong, but I thought it worth mentioning.

5. Enable NuGet package restore

It would be nice if you could enable NuGet package restore so the necessary dependencies (e.g. NUnit for the tests project) can be downloaded when building for the first time.

Overall it's looking great. I'll keep you updated with any issues I find.

Kind regards, Shelakel

jstedfast commented 10 years ago

Hi,

Thanks for the compliment on my code ;-)

This isn't the first MIME library I've written (it's my 4th), so I've got a bit of experience with how to design & implement one, so that helps...

My first MIME library was pretty horrendous and was quickly killed off.

HeaderList.WriteTo()

The RFCs are only referring to what the end-of-line sequences should be when transmitting the document over the wire, not when storing locally. When storing locally, ideally what you want to do is use the native end-of-line sequences unless you parsed them from a file that uses some other sequence. This gets rather complicated, though, especially if the end-of-line sequences are mixed throughout the file you parsed.

In any event, yea, this is a known issue - the HeaderList code isn't yet finished. I started to make various bits of code use Environment.NewLine but haven't gotten to all of the places yet.

Now that Parser.cs is parsing simple MIME message documents, I'll probably start trying to bang the other pieces into better shape.

So far, most of the code is just a "rough sketch" of how I want the APIs to work and, with any luck, not too far from a working implementation - but as you've discovered, it's got some rough/incomplete edges.

I thought I ran the unit tests on Windows after fixing the end-of-line sequences to use Environment.NewLine and it worked, but apparently not. Should be a fairly trivial fix.

Yea, the code to generate random boundaries isn't tested at all yet. The 32 byte buffer was meant to be filled with random bytes and then base64 encoded (into ~73 bytes) but I probably used the wrong buffer. This is just one of those mind-dumps I mentioned earlier that I haven't yet had time to iron out.

Well, the idea of having the ContentEncoding on the MimePart and the ContentObject is that the value on the ContentObject is what the content is currently encoded in, while the value on the MimePart is what you want it to be when you write it out.

When you construct a new message, the ContentObject will likely be set to ContentEncoding.Default (i.e. "None" - perhaps I should even change the enum to None). Meanwhile, the MimePart's encoding would be set to whatever you want it to be when you write it to disk or wherever, so probably ContentEncoding.Base64 or ContentEncoding.QuotedPrintable.

However, when parsing the message from a stream, the parser doesn't actually decode the content in memory, it leaves it in its encoded state, so it has to be able to set that the ContentObject's encoding is Base64.

The reason for this is that I'm trying to make it such that re-saving the message to disk will (hopefully) save out a byte-for-byte exact duplicate of the text that was parsed from the stream.

If I don't do it this way, then I would have to hope and pray that the original message used the same base64 line length that MimeKit uses (some clients fold base64 lines at 60, some fold at 72 and others at 76).
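
To illustrate the line-length problem, here is a small sketch in Python (not MimeKit code; the helper name `encode_base64_wrapped` is made up for this example). It wraps the same payload at 60 and at 76 columns: both decode to identical content, yet the encoded bytes differ, so decoding on parse and re-encoding on save would only round-trip byte-for-byte if the wrap width happened to match.

```python
import base64


def encode_base64_wrapped(data: bytes, line_length: int) -> bytes:
    """Base64-encode data and wrap the output at line_length columns using CRLF."""
    encoded = base64.b64encode(data)
    lines = [encoded[i:i + line_length] for i in range(0, len(encoded), line_length)]
    return b"\r\n".join(lines) + b"\r\n"


payload = bytes(range(256))
wrapped_60 = encode_base64_wrapped(payload, 60)
wrapped_76 = encode_base64_wrapped(payload, 76)

# Same decoded content (b64decode skips the CR/LF bytes by default)...
same_content = base64.b64decode(wrapped_60) == base64.b64decode(wrapped_76) == payload
# ...but a different byte sequence on disk.
same_bytes = wrapped_60 == wrapped_76
```

Keeping the content in its encoded form, and remembering which encoding that is, avoids having to guess the original wrap width.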

I'm also planning on changing the type of Content from byte[] to Stream. This would allow the parser to keep the content on disk without loading it into memory, thus reducing memory consumption for large emails.

You'll notice that in the Streams folder, I have a BoundedStream class - this is what that stream class was designed for.

I just haven't gotten around to having the parser use that yet. For my first pass at writing the parser, I wanted to keep it simple and use byte[]'s to make debugging easier.
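
As a rough illustration of the bounded-stream idea (a Python sketch, not the actual BoundedStream class; `BoundedReader` and its constructor arguments are hypothetical), a parser could hand each part a read-only view onto a byte range of the underlying stream instead of copying the content into a byte[]:

```python
import io


class BoundedReader:
    """Read-only view of the bytes in [start, end) of a seekable binary stream."""

    def __init__(self, source, start: int, end: int):
        if start < 0 or end < start:
            raise ValueError("invalid bounds")
        self._source = source
        self._end = end
        self._position = start

    def read(self, size: int = -1) -> bytes:
        remaining = self._end - self._position
        if size < 0 or size > remaining:
            size = remaining
        if size == 0:
            return b""
        # Seek on every read so several views can share one underlying stream.
        self._source.seek(self._position)
        chunk = self._source.read(size)
        self._position += len(chunk)
        return chunk


message = io.BytesIO(b"Header: value\r\n\r\nSGVsbG8=\r\n")
body_start = message.getvalue().index(b"\r\n\r\n") + 4
body = BoundedReader(message, body_start, body_start + 8)
```

Here `body` only ever yields the 8 bytes of encoded content; the headers and the trailing line ending stay out of reach.
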

Yea, I'll probably do this. Have any docs on where to get started? I've never written a NuGet package before.

Thanks,

Jeff

jstedfast commented 10 years ago

I've just implemented header folding (which also ensures that each header value ends with Environment.NewLine).

This means that HeaderList.WriteTo() should theoretically work now, although I haven't tested it yet.
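
For readers unfamiliar with folding, a minimal Python sketch of the idea follows. It is not the MimeKit implementation; the function name `fold_header`, the 78-column limit, and the choice to fold only at spaces are assumptions made for the example (it also collapses runs of whitespace, which a real implementation may not want to do).

```python
def fold_header(name: str, value: str, max_line: int = 78, newline: str = "\r\n") -> str:
    """Fold 'name: value' at spaces so lines stay within max_line where possible."""
    start = f"{name}:"
    lines = []
    current = start
    for word in value.split():
        if len(current) + 1 + len(word) > max_line and current != start:
            lines.append(current)
            # Continuation lines begin with whitespace, which marks them as folded.
            current = " " + word
        else:
            current += " " + word
    lines.append(current)
    # Every line, including the last one, ends with the newline sequence.
    return newline.join(lines) + newline


folded = fold_header("Subject", " ".join(["word"] * 40))
```

Unfolding is the reverse: remove each newline that is immediately followed by whitespace.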

The Multipart.GenerateBoundary() code should also be fixed now, but again, not tested... I was too focused on getting the Header folding code implemented.
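
The boundary idea described earlier (fill a buffer with random bytes, then base64-encode it) can be sketched in Python as below. This is an illustration rather than the MimeKit code; `generate_boundary` and the "=-" prefix are assumptions. The point is that the output is sized from the encoded length, not from the 32-byte input.

```python
import base64
import os


def generate_boundary(random_bytes: int = 32) -> str:
    """Return a multipart boundary built from base64-encoded random bytes."""
    raw = os.urandom(random_bytes)
    # 32 input bytes encode to 44 base64 characters: 4 * ceil(32 / 3).
    token = base64.b64encode(raw).decode("ascii")
    boundary = "=-" + token
    # RFC 2046 limits a boundary to 70 characters.
    if len(boundary) > 70:
        raise ValueError("boundary too long")
    return boundary


boundary = generate_boundary()
```

Because base64 output can contain characters such as "/" and "=", the boundary has to be quoted when it is written as a Content-Type parameter.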

I'll try to get to the rest of the stuff over the course of the week.

shelakel commented 10 years ago

Good stuff. I'll take a look at the changes.

Have you considered a façade-intermediate format approach; where:

Façade: High level object to construct MIME messages (similar to .NET MailMessage)
Intermediate Format: Low level object constructed by either the Façade or via parsing

The idea being that the Façade supports a set of features (similar to MailMessage), such as an address in 'To' gets written to the Intermediate format 'To' and 'X-Recipient' headers. Another example being the MailMessage Priority which gets written as X-Priority, Priority header fields.

The intermediate format builder would require a façade and an options object to construct the intermediate format. The options determine the best way of transfer encoding the data, e.g. 8BIT or Binary where allowed, with the defaults being 7BIT, Quoted-Printable and Base64 (all of which are 7-bit safe).

I think it's generally a bad idea to use Environment.NewLine for EOL on WriteTo as it differs per platform and thus using WriteTo would generate invalid 'raw' MIME messages if used for sending. Seeing as the entire MIME message would get read into memory anyway; wouldn't it be easier to just create a 'Dump' function to write the raw message to a file, instead of parsing the message and re-encoding it?
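
One way to keep the two concerns apart, sketched here in Python under the assumption that the message is text-like (headers plus 7-bit or encoded bodies), is to normalize line endings to CRLF only at the point of sending. The function name `to_wire_format` is made up for this example.

```python
import re

# CRLF is listed first so an existing CRLF is matched as one unit, not as CR then LF.
_ANY_NEWLINE = re.compile(rb"\r\n|\r|\n")


def to_wire_format(data: bytes) -> bytes:
    """Rewrite every line ending (CRLF, lone CR or lone LF) as CRLF."""
    return _ANY_NEWLINE.sub(b"\r\n", data)


wire = to_wire_format(b"Subject: hi\n\nbody line 1\r\nbody line 2\n")
```

This would not be appropriate for content sent with a binary transfer encoding, where the bytes must pass through untouched.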

Ideally the parser would unfold and decode the raw data to an intermediate format for processing.

I also agree with your approach on switching the Content byte[] to a Stream - the MailMessage object also uses this approach when attaching files. I believe Outlook uses a .msg format for storing messages on disk/archive that is memory mapped to allow the reading of the message without touching the attachments. .NET 4.5 has a MemoryMappedFile class: http://msdn.microsoft.com/en-us/library/system.io.memorymappedfiles.memorymappedfile.aspx - but the MIME message format isn't exactly memory mapped.

If you want to create a NuGet package, you should read this: http://docs.nuget.org/docs/creating-packages/creating-and-publishing-a-package

I was actually referring to the 'Enable NuGet Package Restore' option available with NuGet 2, you can read more here: http://docs.nuget.org/docs/workflows/using-nuget-without-committing-packages

Thanks, I'll keep an eye on this repository.

Kind regards, Shelakel

jstedfast commented 10 years ago

I'm not sure that it is worth implementing a facade approach since developers that want that will probably just use System.Net.Mail, right?

As far as serializing messages (and entities) to a stream goes, I was considering having WriteTo() take a "constraints" argument that would help the mime part figure out how it should encode itself.

shelakel commented 10 years ago

Hi Jeff, I understand your reasoning behind the assumption that most developers will still use System.Net.Mail. Unfortunately, System.Net.Mail suffers from issues concerning proper encoding (Q, B, Encoded Word, Quoted-Printable, Base64).

Here are some references:

< .NET 4.5 Issue: Attachments with long non-ascii names get encoded improperly and causes the attachment to become 'corrupt' http://social.msdn.microsoft.com/Forums/en-US/67414337-19a6-4128-b1a0-212404cc2cb1/bug-in-systemnetmail-net-4

Body encoding issues http://stackoverflow.com/questions/16255487/encoding-to-utf-8-in-email

Mail header encoding issues http://stackoverflow.com/questions/3396289/mail-header-from-sometimes-not-encoded-with-system-net-mail-and-net-framework

Folding issues http://stackoverflow.com/questions/9921182/cyrillic-subject-encoding-in-c-sharp-mails

I first figured that something was not 'right' when I was generating the MIME message for the e-mail to string and saw that UTF-8 characters were allowed in the subject line. I believe it might be something to do with the parameters being passed when creating the MailWriter to generate the message.
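
For comparison, this is roughly what a non-ASCII subject should look like once encoded. The Python sketch below is an illustration only (the function `encode_header` is hypothetical and handles just UTF-8 with the 'B' encoding): it splits the text on character boundaries into RFC 2047 encoded-words and folds them with CRLF plus a space.

```python
import base64


def encode_header(name: str, text: str, newline: str = "\r\n") -> str:
    """Encode text as UTF-8 'B' encoded-words and fold them under the header name."""
    prefix, suffix = "=?utf-8?b?", "?="
    # RFC 2047: an encoded-word is at most 75 characters and a line containing
    # one is at most 76. For simplicity every word is sized to fit on the
    # first line, next to "Name: ".
    max_word = min(75, 76 - len(name) - 2)
    max_bytes = (max_word - len(prefix) - len(suffix)) // 4 * 3
    if max_bytes < 4:
        raise ValueError("header name too long for this sketch")
    words, chunk = [], b""
    for char in text:
        encoded = char.encode("utf-8")
        # Never split a multi-byte character across two encoded-words.
        if chunk and len(chunk) + len(encoded) > max_bytes:
            words.append(prefix + base64.b64encode(chunk).decode("ascii") + suffix)
            chunk = b""
        chunk += encoded
    if chunk:
        words.append(prefix + base64.b64encode(chunk).decode("ascii") + suffix)
    return f"{name}: " + (newline + " ").join(words) + newline


subject = "Привет, мир! " * 5
header = encode_header("Subject", subject)
```

The resulting header is pure ASCII, which is what a raw UTF-8 subject line fails to be.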

The method I used is the second answer in this post: http://stackoverflow.com/questions/7515457/convert-mailmessage-to-raw-text

The source for the MailWriter is here: http://www.dotnetframework.org/Search.aspx?search=MailWriter

It doesn't look like the source for the .NET 4.5 version is available. The biggest issue using the MailWriter is that it's not public and has to be instantiated via reflection - it is an internal implementation that differs per .NET version (hence API is unreliable).

Given this information; I believe there's a demand for generating e-mail MIME messages not dependent on what is supplied in the .NET Framework. Sending messages via SMTP must then also be done via an 'external' client that allows sending raw messages (or via an api that allows sending raw messages; such as Amazon SES/Mailgun/Mandrill).

I'd hold off for now on taking a strategy approach to WriteTo. Ideally you want to give the option to encode body content as 'Auto', '7BIT', 'Quoted-Printable' or 'Base64' and, given some capabilities (from EHLO), to alternatively encode to UTF-8 or Binary. The 'Auto' option would then need to be evaluated at the time of sending, depending on the capabilities of the SMTP server, but it should be safe to assume Base64 as a fallback.
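
A rough Python sketch of what an 'Auto' choice might look like is given below. Everything in it is an assumption for illustration: the function name, the `allow_8bit` flag standing in for a server capability learned from EHLO, and the one-in-six threshold for preferring Quoted-Printable over Base64. It ignores the Binary option entirely.

```python
def choose_transfer_encoding(data: bytes, allow_8bit: bool = False) -> str:
    """Pick a Content-Transfer-Encoding for data based on what the bytes contain."""
    high_bytes = sum(1 for byte in data if byte >= 0x80)
    has_nul = b"\x00" in data
    longest_line = max(len(line) for line in data.split(b"\r\n"))
    # 7bit and 8bit both require no NUL bytes and lines of at most 998 octets.
    line_safe = not has_nul and longest_line <= 998

    if line_safe and high_bytes == 0:
        return "7bit"
    if line_safe and allow_8bit:
        return "8bit"
    # Quoted-printable keeps mostly-ASCII text readable; base64 is more
    # compact once a large share of the bytes needs escaping.
    if not has_nul and high_bytes * 6 <= len(data):
        return "quoted-printable"
    return "base64"


text_choice = choose_transfer_encoding("Best regards from Zürich\r\n".encode("utf-8"))
```

With `allow_8bit=True` the same text would be sent as 8bit; content containing NUL bytes falls through to base64 either way.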

Once MimeKit becomes stable; I'd love to port it to Go as there's a serious deficiency of 'full' MIME support (the Go smtp send function requires that you pass the already encoded message as a stream).

Thanks for the great work.

Kind regards, Shelakel

jstedfast commented 10 years ago

I didn't realize that System.Net.Mail was so utterly broken.

I was asked a few years ago to provide advice/suggestions on how to implement Mono's System.Net.Mail so I just took a look at that code and it appears those guys didn't take any of my advice :-(

I might need to rewrite System.Net.Mail using my MimeKit code and contribute that to Mono.

Anyway, the constraint method I was talking about is something I've used before in past mail libraries I've written:

https://git.gnome.org/browse/gmime/tree/gmime/gmime-encodings.h#n54
https://git.gnome.org/browse/gmime/tree/gmime/gmime-filter-best.c#n282
https://github.com/jstedfast/spruce/blob/master/spruce/providers/smtp/spruce-smtp-transport.c#L1234

spruce is my C library for managing mail (mbox, maildir, smtp, pop3, imap4, sendmail). I more-or-less gave up on it because it reached 80,000 lines of code (gmime is another 60,000) and C is way too verbose. That's actually why I moved it to github a year or so ago instead of continuing to develop it on a hidden svn server, but I never bothered to try and drum up excitement about it, so probably no one even knows it exists. It's also why I started MimeKit: C# is generally a lot less verbose, and not having to worry about ref counting and having access to async/await looked very promising.

I also wrote a mail client (actually, a full Outlook clone) a decade ago, so I have a lot of experience with this sort of stuff ;-)

shelakel commented 10 years ago

> I might need to rewrite System.Net.Mail using my MimeKit code and contribute that to Mono.

That would be time well spent. I would also like to see MimeKit as a stand-alone for .NET runtimes :)

> I also wrote a mail client (actually, a full Outlook clone) a decade ago, so I have a lot of experience with this sort of stuff ;-)

Sounds like you like e-mail!

I'm looking at porting premailer (ruby) to .NET for HTML message preparation. It does CSS inlining (evaluating CSS rulesets, convert rules to attributes where appropriate for HTML e-mails and converts the rest to inline styles), converts relative to absolute paths and can also generate plain text versions of the HTML message. It will form part of another bigger project I'm planning - which would also require a separate MIME message library - which is why I am looking at MimeKit :)

jstedfast commented 10 years ago

Just curious, but how did you find MimeKit? A random search for "mime parser" on github?

jstedfast commented 10 years ago

I wouldn't say that I like e-mail, it's more that broken MIME parsers are a pet peeve of mine. It drives me bat-shit insane to see all these C# MIME parsers out there that take System.Strings as input instead of byte[] and that tend to use string.Split() to tokenize email addresses, parameters, etc. It makes me want to cry.
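
The tokenizing half of that complaint is easy to demonstrate. The Python sketch below (example addresses are made up) splits an address list on commas and compares the result with the standard library's `email.utils.getaddresses`, which understands quoted display names:

```python
from email.utils import getaddresses

header_value = '"Doe, John" <john@example.com>, jane@example.com'

# Naive approach: the comma inside the quoted display name is treated as a separator.
naive = [part.strip() for part in header_value.split(",")]

# Grammar-aware parsing keeps the quoted display name intact.
parsed = getaddresses([header_value])
```

The naive split yields three fragments, none of which is the first mailbox, while the parser returns the two (name, address) pairs that are actually present.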

shelakel commented 10 years ago

I believe I came across your blog post "Why decoding RFC2047 encoded headers is hard", which in turn led me to MimeKit.

I was reading up on the different SMTP/MIME RFCs in order to implement MIME message generation for Go, as the standard library is severely lacking at the moment. MimeKit in its current state (due to its limited reliance on the framework/external libraries) is a near perfect prospect for porting to other languages.

I'm still a bit on the fence on whether to implement the bigger project in Go, C# (mono compatible) or JavaScript as portability is a concern. If you're interested in hearing about the bigger project; I can write a 'pitch' for you via e-mail - at the moment I'm not planning to make it FOSS and am considering 'per server' licensing. Writing something like this on a per project basis (where for almost all websites this is useful/necessary) is easily $2000+ cost to client to get a semi-decent prototype out.

jstedfast commented 10 years ago

Ah, yea, that blog post was a bit of a rant. Started off as a comment on github trying to help out the author of a lua-based email client as he was running into issues using libmimetic not decoding rfc2047 encoded-word tokens correctly (or at all). That kinda re-started my drive to hack on MimeKit.

shelakel commented 10 years ago

It also grinds my gears (http://i.qkme.me/3vreuv.jpg) when specs (such as RFCs) are available, yet the implementations are broken or missing altogether. I have a similar obsession with electronic communication implementations (e-mail/sms) where I just want to get to a point where I can say to myself "It's done, now leave it alone and go do something else".

jstedfast / MimeKit

Issues encountered while testing out MIME message creation #2