owainlewis / clojure-mail

A Clojure library for parsing, downloading and reading email from IMAP servers.

More docs updates #10

Closed mathias closed 10 years ago

mathias commented 10 years ago

Just some quick notes that I took while getting up to speed with the new version.

Related: I can call

(org.jsoup.Jsoup/parse "foo" "UTF-8")

just fine in a REPL in the project, but I get

(html->text (:body (first (inbox 1)) ))

IllegalArgumentException No matching method found: parse  clojure.lang.Reflector.invokeMatchingMethod (Reflector.java:80)

when trying to use html->text. Any clues as to how I can get around this / fix this? Thanks in advance!

owainlewis commented 10 years ago

Thanks Mathias. I'll have a look at the HTML parser issue. My first guess is that whatever gets returned isn't a string. The doc updates look good, so I'll merge them now.

owainlewis commented 10 years ago

I think the html->text function is working, but the message body being returned previously was a bit confusing. Now, if the message is multipart (i.e. it has both HTML and text parts), a vector is returned with one map per part, e.g.

[{:content-type "blah" :body "foo"} {:content-type "text/html" :body "bar"}]

So for multipart messages the body is a sequence of parts rather than the text you might expect. html->text only works on an HTML string.

Here is a working example using one of the test fixtures (test/clojure_mail/fixtures/25):

(def message (read-mail-from-file "test/clojure_mail/fixtures/25"))
(def html-part (second (message-body message)))
(def result (html->text (:body html-part)))

;; => 

"Request to share You are the owner of ContractsBuilder. 
niuserre@gmail.com has asked that you share this item with: 
niuserre@gmail.com You can add these people in 
Sharing settings. Google Docs makes it easy to create, 
store and share online documents, spreadsheets 
and presentations."

owainlewis commented 10 years ago

To solve the problem in your example, you would need to dig deeper into the body to get the actual HTML, so something like:


;; :body is a vector of parts, e.g.
;; [{:content-type "blah" :body "foo"} {:content-type "text/html" :body "foobar"}]

(def message-body-parts (:body (first (inbox "user@gmail.com" "password" 1))))

(def html-part (second message-body-parts))

(html->text (:body html-part))

Obviously that's a little confusing, so I would be happy to look at some way to improve that. : )
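
One possible way to make this less fiddly is to pick the HTML part by its content type instead of by position. The following is only a rough sketch: `html-body` is a hypothetical helper, not something clojure-mail provides, and it assumes a multipart body has the shape shown earlier (a vector of maps with :content-type and :body keys) while a single-part body is a plain string.

```clojure
(require '[clojure.string :as str])

(defn html-body
  "Hypothetical helper, not part of clojure-mail.
  Assumes body is either a plain string (single-part message) or a
  sequence of maps with :content-type and :body keys (multipart).
  Returns the body of the first text/html part, or nil if there is none."
  [body]
  (if (string? body)
    ;; Single-part message: hand the string back unchanged.
    body
    ;; Multipart: find the first part whose content type starts with text/html.
    (->> body
         (filter #(some-> (:content-type %)
                          str/lower-case
                          (str/starts-with? "text/html")))
         first
         :body)))

(html-body [{:content-type "text/plain" :body "foo"}
            {:content-type "text/html; charset=UTF-8" :body "<p>bar</p>"}])
;; => "<p>bar</p>"
```

The result could then be passed to html->text whenever it is non-nil, which avoids relying on the HTML part always being the second element.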