purebred-mua / purebred-email

A fast email parsing library implemented in Haskell
https://hackage.haskell.org/package/purebred-email
GNU Affero General Public License v3.0
23 stars 4 forks source link

email construction, manipulation and printing #2

Closed frasertweedale closed 5 years ago

frasertweedale commented 6 years ago

We have a use case for building emails and serialising them for transmission. Right now purebred is using a different library for that.

Things we need to include or handle:

romanofski commented 6 years ago

Assigned this to myself. As a noob myself on this, I guess I'll use this to document some of my findings and how I'm about to proceed. I guess a) to document for myself the way I approach it and b) a log for myself to learn from possible mistakes which will most likely happen.

romanofski commented 6 years ago

I've got a property test to check that the round trip between parsing/serialisation works. This obviously fails, but I'm now a bit stuck in which way I should go. Most of the types in Data.MIME are folds and getters. Rendering a MIMEMessage would be way too low level, so we need something higher. The tricky bit I'm currently unsure of is: do we change the data type:

data MIME
  = Part B.ByteString
  | Multipart [MIMEMessage]
  deriving (Show, Eq)

or do we change the Getters to... ??. My current thought would be to add a few more fields to the Part constructor (encoding, content type, headers, filename?), yet the parser would need adjustment. That seems to be counter-intuitive as well, since speed would most likely take a hit, for not much gain (we're not constructing new mails, when parsing).

Maybe I need to sleep over it and think ...

frasertweedale commented 6 years ago

@romanofski the MIME data type is not intended to be used on its own, rather as the payload type of Message, i.e. Message MIME. The Message carries all the headers for the MIME payload which may be a single part or multipart.

So yeah, we need to render the MIMEMessage. Why do you suggest that is too low-level?

romanofski commented 6 years ago

AAahh I think I understand. My mistake was to expect the part itself will have to carry headers, which is not the case. For example, if I have a multipart mail, with a Content-Type: multipart/alternative, I would effectively have (hope I get it right, just for illustration):

MIME Headers (Multipart [(MIME Headers (Part <>) -- text/plain,
                          MIME Headers (Part <>) -- text/html])
frasertweedale commented 6 years ago

@romanofski that's pretty much the shape of it.

romanofski commented 6 years ago

I've been thinking and reading up on this a bit. My initial idea was to render whatever the MIMEMessage is composed of. The main motivation is to render the mail similar to the way we parse it.

Then use helper functions for composition: adding parts, attachments, etc. One huge drawback of this however is, that I need to basically pre-render parts of the message during composition. That is slow and cumbersome.

One other idea I had was to overwrite the constructors with newtype in order to add additional information (e.g. attachment or inline, description, filename, yadda yadda), but dismissed the idea.

frasertweedale commented 6 years ago

@romanofski what do you mean by "pre-render parts of the message"? Can you give an example of when/why this would be necessary?

romanofski commented 6 years ago

Lets assume you add an attachment (bear with me, the interface might not be how I would end up implementing it):

addAttachment :: Text -- ^ content type
                         -> FilePath
                         -> MIMEMessage
                         -> IO Mail -- ^ IO for using the random number generator in order to compute boundary

By "pre-rendering" the mail I'm referring to the fact that I will have to construct a Part with a bytestring, which probably comes from a "readFile" invokation. If I attach a bigger file, wouldn't that readFile take some time in order to construct the Part with it's bytestring? Maybe I'm mistaken and the file is only read upon rendering the mail.

frasertweedale commented 6 years ago

@romanofski no, you are correct. But, is this really too high a cost, to read a file of the filesystem? I think probably not. Keep it simple.

But if it turns out to be too expensive, there are a couple of ways we could address it. IMO let's cross that bridge if/when it matters, and just do the simple thing first.

romanofski commented 6 years ago

Alright good to know that I gotten a bit too worried. The benefit of just rendering a MIMEMessage is, that in it's current form it's really simple.

frasertweedale commented 6 years ago

I'd like to talk about how to avoid the IO for MIME boundary randomness. IMO we should seed a RNG at program start time and put it in the AppState. Essentially it can be an infinite list of random, valid characters. When we create the MIME boundary we can just pull n chars of the front of that, set the rest of of the list back into the AppState, and there you go.

But just leave as IO for now I guess, we can refactor later.

romanofski commented 6 years ago

@frasertweedale Yeah I'd like the idea of keeping the IO out of the serialisation.

One other aspect I thought of leaving out for now is handling multipart/alternative. I think for now focusing on multipart/mixed should be sufficient to get us going. Unless I'm missing a very valid use case for alternative representations of the same content, I would leave it for later.

romanofski commented 6 years ago

Hm on a closer look this might be a no-brainer. I'm still figuring out which way to go:

  1. generate the boundary when Multipart is constructed, which is added by a Content-type header. This however has the downside, that I'll somewhat need to "extract" the boundary out of the header value. Not hard, but well... stringly typed
  2. generate the boundary when rendering the mail. That has the benefit of having access to the boundary when the mail is serialised. What about nested MIME messages tho which already have a boundary. They don't need to be re-computed.

Item 2) seems to indicate that I should go with 1). I currently have a form of 2) so lets give 1) a go. Should probably also take into account that the rendering doesn't generate the boundary, but it's passed into the function so we can get rid of the IO interaction.

romanofski commented 6 years ago

I just found the ContentType constructor. Maybe idea 1) isn't as dire in terms of typing than I thought ...

romanofski commented 6 years ago

Alright I have a renderer, which if existing mails are parsed, spits them out identically. I'll see to add the "high level" API and driven by the demand we need for purebred.

frasertweedale commented 6 years ago

On Wed, Aug 15, 2018 at 10:09:16PM -0700, Roman Joost wrote:

Alright I have a renderer, which if existing mails are parsed, spits them out identically. I'll see to add the "high level" API and driven by the demand we need for purebred.

Nice @romanofski. Happy to do a review of what ya got so far, if you want.

romanofski commented 6 years ago

Cheers, but I just realised that I've forgot to properly fold field bodies. I also wanted to add a property test, which at this point is still failing :)

romanofski commented 6 years ago

Bit stuck, shouldn't:

view (headers . contentDisposition . filename) $ Message [("Content-Disposition", "attachment; filename=foo.pdf")] (Part "foo")

return "foo.pdf"? Unless I've stuffed something up, it returns me an empty string or a Nothing if I use preview. I can see that it tries to give me a T.Text which it does through charset decoding? Does that mean for Attachments I always need to specify the encoding?

frasertweedale commented 6 years ago

@romanofski looking into it.

romanofski commented 6 years ago

I figured it out (or rather I think I do). So it tries to charset decode the file name, but for an attachment with a Content-Type: application/whatever there is no charset specified. Which then ends up being a Nothing.

Maybe better to just return a Bytestring ?

frasertweedale commented 6 years ago

No, we should return Text in case it is encoded in some alternative charset. I think it's a regression from when I implemented parameter continuations. I'll have a fix soonish, hopefully.

romanofski commented 6 years ago

No worries. I've written myself a little workaround. Will use that in the mean time and then use the proper fix. Cheers!

frasertweedale commented 6 years ago

@romanofski how does https://github.com/purebred-mua/purebred-email/commit/d12a48ccf8cb4823783f5d0a8651acf45e0c5b03 go for you? If you like it you can push to master.

romanofski commented 6 years ago

Nice. Works like a charm :) Push, push!

frasertweedale commented 6 years ago

@romanofski pushed. I should really write some tests for this basic stuff.

frasertweedale commented 6 years ago

@romanofski FYI, from our discussions yesterday: I'm about 1/2 of the way to having "writeable" optics all the way from message down to decoded (i.e. non-ASCII) filenames. The optics also support deleting. See https://github.com/purebred-mua/purebred-email/blob/c8380f43ed3abf911291837b62cbc118b97afdf9/tests/Headers.hs#L145-L164 for an example of what's possible now.

The stuff that's missing is:

Once that's done, we should be able to read and write deep down into the MIME data via the same optics. For example, drill down into the Content-Disposition filename parameter and drop leading path components ;)

romanofski commented 6 years ago

I'm a bit stuck putting this together. My understanding that we should have everything in order to compose this correctly. In order to "loop" a function over every attachment, I was presuming I could use a traverse:

-- would unset all filenames in the contentdisposition header of attachments
setNewFilename mail = traverseOf (attachments . headers . contentDisposition . filenameParameter) Nothing m

but that gives me:

    No instance for (Contravariant Maybe)
      arising from a use of ‘attachments’

Using a fold:

λ: toListOf (attachments . headers . contentDisposition . filenameParameter) multiPartMail 
[Just (ParameterValue Nothing Nothing "/bar/foo.bin")]

seems to work fine. I'm not sure what's exactly wrong or where my misunderstanding comes from.

The documentation for Traversal mentions that a Fold if f is Applicative and Contravariant. I went back to the attachments function which is a Fold. Is perhaps my example function above a more specific type signature?

Any pointers I could read/check?

frasertweedale commented 6 years ago

@romanofski new commit incoming. Had to change attachments to a Traversal (was a Fold). See tests for example of how to do what you want.

romanofski commented 6 years ago

Hey @frasertweedale cheers for this. The way I read the map on http://hackage.haskell.org/package/lens-4.17 is that a Fold is also a Traversal. Do I take the inheritence here a bit to literally? Does the Fold produce a single value where as I really do need a Traversal here, since I do want to apply a function to all attachments?

frasertweedale commented 6 years ago

@romanofski https://github.com/purebred-mua/purebred-email/commit/482f5f5780880bd708f439859323a3b2fd9f3dc1.

You just use set:

set (attachments . headers . contentDisposition . filenameParameter) Nothing msg

traverseOf is definitely not what you are looking for.

frasertweedale commented 6 years ago

@romanofski a Traversal is a Fold, not the other way (Fold is read-only).

Traversals and folds both work with multiple targets.

set with a traversal sets all targets (per "unset all attachment filenames" example in tests).

You can also apply a function over all targets of a traversal.

In general it is easier to use the view, preview, set, over helper functions than use the optic directly.

romanofski commented 6 years ago

I see. Seems like I did read the map totally wrong. I guess part of the reason I cant use set on a Fold is that its simply not defined. I could use view though which would give me a monadic value. If you use a traversal, you can use set because a traversal is also a Setter. Does that sound right?

romanofski commented 6 years ago

Yep, changing it to a Traversal is all it needs and I get a passing build. I guess one last missing bit is implementing the function which passes the ByteString on to sendmail. I'll add that and see if it makes Travis happy.

romanofski commented 6 years ago

Oh and btw. just in case: I'll wait for your commit, rebase and then push.

frasertweedale commented 6 years ago

@romanofski the commit is https://github.com/purebred-mua/purebred-email/commit/482f5f5780880bd708f439859323a3b2fd9f3dc1

romanofski commented 6 years ago

I've been fiddling with the API a bit more. More specificially I'd like to use the Mailbox data types when constructing a MIMEMessage. What I currently can't decide, but rather try to figure out if having a function which accepts the addresses (e.g. To, From, etc) or use convenient lenses to set the headers. At the moment I tend towards providing lenses to alter the MIMEMessage, which thanks to your At and Ix instances(?) isn't very much.

In case you're opinionated on how the API should look let me know, otherwise I keep playing.

frasertweedale commented 6 years ago

I think we want the optics for sure, and after that you will be able to decide whether a "convenience function" is desirable.

romanofski commented 6 years ago

Hrmpf. I'm implementing the optic for setting the Date field and thought this should work:

headerDate :: Lens' Headers UTCTime

Now I realise that it won't work without involving IO, because the time library uses IO to parse a given string.

I'm now thinking of using:

headerDate :: Lens' Headers ByteString

for now and simply deal with the conversion later. Surely the caller can figure out the current time involving IO by himself?

romanofski commented 6 years ago

Found a way, even though not pretty:

headerDate :: Lens' Headers UTCTime
headerDate = lens getter setter
  where
    getter = parseTimeOrError True defaultTimeLocale dateFormat . C8.unpack . view (header "date")
    setter hdrs x = set (header "date") (C8.pack $ formatTime defaultTimeLocale dateFormat x) hdrs

The parseTimeOrError function throws an exception if it can't parse the date. I'd keep improving this as a backlog item for later.

frasertweedale commented 6 years ago

Roll with it for now, we can refactor later.

romanofski commented 5 years ago

I'll close this because the PR #15 was merged.