knrafto / email-header

A Haskell library for parsing and rendering email and MIME headers

How about an actual example of how to use it? #1

Open dhjdhj opened 9 years ago

dhjdhj commented 9 years ago

I would love to use this instead of breaking up headers with regexes, but I haven't a clue how to use this library. How about adding a couple of examples, either as .hs files or in the README, to show how it is meant to be used?

knrafto commented 9 years ago

This library is a bit old and I haven't worked on it in a while, but I'm willing to put some time in if it's useful to somebody! I'll add some examples to the README.

dhjdhj commented 9 years ago

Deleted - mistaken post

knrafto commented 9 years ago

What do you mean by 'fetch'?

dhjdhj commented 9 years ago

I just realized I confused your library with the HaskellNet library, which I have been experimenting with. I had hoped to use your library to parse the headers, but when I couldn't figure out how to use it I just put together a regex to extract header values.

knrafto commented 9 years ago

Here's a quick example for parsing:

ghci> :set -XOverloadedStrings
ghci> import qualified Network.Email.Header.Read as H
ghci> import Network.Email.Header.Types
ghci> let headers = [("From", "John Doe <john@doe.com>"), ("Subject", "This is a test")] :: Headers
ghci> H.from headers
[Mailbox {displayName = Just "John Doe", mailboxAddress = Address "john@doe.com"}]

The other functions in Network.Email.Header.Read are similar.

dhjdhj commented 9 years ago

Uhmmm, that's not the hard part. You have manually created some tuples with name/value pairs. The thing is, if you use a library like HaskellNet to actually retrieve mail from an IMAP server (say), the fetchHeaders function returns a single long ByteString containing the entire header of the message in one go. That string has to be broken up into all those name/value pairs in the first place. And since extra headers can get added along the way, you can't depend on a predefined set of functions like H.from (for example). So one needs (I suspect) to be able to write code like the following:

 let headers = parseHeader byteStringHeader  -- convert the original string to a dictionary
 let h = headers "From"                      -- get the value of the 'From' header
 print $ fromParts h                         -- get the displayName and mailboxAddress

Of course this would also let you iterate over the dictionary to discover all the names as well.

knrafto commented 9 years ago

The type Headers = [(HeaderName, L.ByteString)] is a bit like an ordered multi-dictionary. The RFC specifies that the order of headers matters and that the same header can appear more than once. Prelude.lookup can get the first value of a header, if it exists. You can iterate over it like any list, or do map fst to get all the header names.
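As a rough illustration of treating the header list as an ordered multi-dictionary, here is a minimal sketch. It defines local stand-ins for HeaderName and Headers so that it compiles without the library installed; the real HeaderName may well be a case-insensitive type, which is an assumption not checked here. The names sample, firstValue, allValues and headerNames are made up for this example.

```haskell
import qualified Data.ByteString.Lazy.Char8 as L

-- Local stand-ins so the sketch compiles on its own.
-- Assumption: the library's HeaderName may be case-insensitive;
-- a plain String is used here, so lookups are case-sensitive.
type HeaderName = String
type Headers    = [(HeaderName, L.ByteString)]

-- Hypothetical sample data with one repeated header.
sample :: Headers
sample =
  [ ("Received", L.pack "from a.example by b.example")
  , ("Received", L.pack "from c.example by a.example")
  , ("From",     L.pack "John Doe <john@doe.com>")
  , ("Subject",  L.pack "This is a test")
  ]

-- First value of a header, if present (exactly what Prelude.lookup does).
firstValue :: HeaderName -> Headers -> Maybe L.ByteString
firstValue = lookup

-- Every value of a possibly repeated header, in the order it appeared.
allValues :: HeaderName -> Headers -> [L.ByteString]
allValues name hs = [v | (n, v) <- hs, n == name]

-- All header names, duplicates included, in original order.
headerNames :: Headers -> [HeaderName]
headerNames = map fst
```

Loaded into ghci, firstValue "Received" sample returns only the first Received value, while allValues "Received" sample returns both.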

H.from looks up the first "From" header and parses it in the way the RFC says it should be structured. Same with its friends. Yeah, the documentation is not exactly clear on that...perhaps you could help?

I agree that something to parse the headers from a single ByteString is needed.

dhjdhj commented 9 years ago

Yeah, when I have dealt with mail headers in other languages in the past, I have typically collected headers like "Received-From:", which can occur multiple times, into their own array with an array length, so one can iterate over all identically named headers.
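The per-name array idea could be sketched in Haskell roughly as follows, using Data.Map from the containers package. This is only an illustration: groupHeaders and countOf are hypothetical names, not functions from this library, and grouping by name discards the relative order of differently named headers.

```haskell
import qualified Data.Map.Strict as Map

-- Group values by header name. Values of the same name keep their
-- original relative order; the order between different names is lost.
groupHeaders :: Ord k => [(k, v)] -> Map.Map k [v]
groupHeaders hs = Map.fromListWith (flip (++)) [(k, [v]) | (k, v) <- hs]

-- Number of times a header name occurred (0 if it is absent).
countOf :: Ord k => k -> Map.Map k [v] -> Int
countOf name grouped = maybe 0 length (Map.lookup name grouped)
```

Map.keys on the result lists each distinct header name once, and Map.lookup returns the list of values for a name.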

I understand what H.from does. But the thing is, without something to parse all the headers, I don't see how the rest of the library is useful --- there has to be a way to build the dictionary.

I just found out about a MIME package, http://hackage.haskell.org/package/mime-0.4.0.1 , that may be helpful.

knrafto commented 9 years ago

This originally was going to be part of a complete email-parsing library, but I don't really have time to work on that or this library any more. It would parse the headers and use this library to figure out how to parse the body in a streaming manner. It shouldn't be too hard to build an attoparsec parser for the headers, though, or look at the code for the many HTTP libraries.
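For illustration, here is a minimal sketch of the missing splitting step. To stay dependency-free it uses plain Data.ByteString.Char8 functions rather than attoparsec, so it is a simpler substitute for the parser suggested above, not that parser. splitHeaders and unfoldLines are hypothetical names, not part of this library. The sketch assumes the header block ends at the first blank line, joins folded continuation lines (those starting with a space or tab) onto the previous line, and silently skips lines that contain no colon.

```haskell
import qualified Data.ByteString.Char8 as B
import Data.Maybe (mapMaybe)

-- Split a raw header block into (name, value) pairs, in original order.
splitHeaders :: B.ByteString -> [(B.ByteString, B.ByteString)]
splitHeaders raw = mapMaybe toPair (unfoldLines headerLines)
  where
    -- Lines up to the first blank line, with any trailing CR removed.
    headerLines = takeWhile (not . B.null) (map stripCR (B.lines raw))

    stripCR l
      | not (B.null l) && B.last l == '\r' = B.init l
      | otherwise                          = l

    -- "Name: value" becomes (Name, value); lines without a colon are dropped.
    toPair l = case B.break (== ':') l of
      (name, rest)
        | not (B.null name) && not (B.null rest) ->
            Just (name, B.dropWhile isBlank (B.drop 1 rest))
      _ -> Nothing

-- Join continuation lines (starting with space or tab) onto the line before.
unfoldLines :: [B.ByteString] -> [B.ByteString]
unfoldLines = foldr step []
  where
    step l (next : rest)
      | not (B.null next) && isBlank (B.head next) = B.append l next : rest
    step l acc = l : acc

-- Space or horizontal tab.
isBlank :: Char -> Bool
isBlank c = c == ' ' || c == '\t'
```

The resulting pairs would still need converting to the library's Headers type before functions such as H.from can be applied; how to do that depends on the exact definition of HeaderName, which is not shown in this thread.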