thoughtbot / griddler

Simplify receiving email in Rails (deprecated)
http://griddler.io/
MIT License
1.38k stars, 199 forks

Allow access to sanitized HTML body when a text body is present #243

Closed (wingrunr21 closed 4 months ago)

wingrunr21 commented 8 years ago

Right now, a call to body returns the email's text part whenever one is present. The only way to get a sanitized copy of the HTML body is to call clean_html(email.raw_html) manually inside the email processor.

Introduce text_body and html_body attributes on Griddler::Email that provide direct access to the sanitized text and HTML bodies. This also brings the sanitized body attributes in line with those exposed under the raw_ naming.
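
To make the proposal concrete, here is a minimal, self-contained sketch of how the two accessors could sit next to body. SketchEmail, clean, and strip_tags are hypothetical stand-ins written for illustration; they are not Griddler's actual classes or helpers, and the regex-based tag stripping is a deliberately naive placeholder for a real sanitizer.

```ruby
# Hypothetical stand-in for Griddler::Email, for illustration only.
class SketchEmail
  attr_reader :raw_text, :raw_html

  def initialize(raw_text: nil, raw_html: nil)
    @raw_text = raw_text
    @raw_html = raw_html
  end

  # Mirrors the behaviour described above: the text part wins when present,
  # otherwise the HTML part is used with its tags stripped.
  def body
    text_body || strip_tags(html_body)
  end

  # Proposed accessor: cleaned text part, or nil when there is none.
  def text_body
    clean(raw_text)
  end

  # Proposed accessor: cleaned HTML part with tags kept, or nil when there is none.
  def html_body
    clean(raw_html)
  end

  private

  # Assumption: "cleaning" here only drops invalid bytes and surrounding whitespace.
  def clean(value)
    return nil if value.nil?

    cleaned = value.scrub("").strip
    cleaned.empty? ? nil : cleaned
  end

  # Naive tag stripping; a real implementation would use a proper HTML sanitizer.
  def strip_tags(html)
    html && html.gsub(/<[^>]*>/, "")
  end
end
```

With this shape, an email processor could read email.html_body directly instead of cleaning raw_html by hand, while body keeps its current text-first behaviour.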

sfcgeorge commented 8 years ago

This would be really useful. My use case is receiving email in the richest format possible, then parsing it into Markdown to save in the DB, so that the HTML and TXT views are easy to render. Pseudocode of what I'm doing:

@body = MarkdownSwizzler.new(email.html_body || email.text_body)

# mailer_view.html.erb
<%= @body.to_html %>

# mailer_view.text.erb
<%= @body.to_markdown %>

wingrunr21 commented 8 years ago

I'm planning on doing it this weekend. Got sidetracked last week and didn't get to cut a release. This, #222, and #223 are on my radar.

sfcgeorge commented 8 years ago

Nice. For my case I actually need the HTML left intact, since I use it for formatting (and sanitize it to strip unsafe tags), but maybe that goes against the spirit of this project. Here's the patch I wrote anyway, for reference:

module Griddler
  class Email
    # Reply portion of the HTML body, with the HTML tags left in place.
    def html_body
      EmailParser.extract_reply_body(html_or_sanitized_text)
    end

    # Drops invalid UTF-8 bytes and decodes HTML entities; tags are not stripped.
    def clean_raw_html(html)
      HTMLEntities.new.decode(clean_invalid_utf8_bytes(html))
    end

    # Prefers the cleaned HTML and falls back to text_or_sanitized_html when it is blank.
    def html_or_sanitized_text
      clean_raw_html(raw_html).presence || text_or_sanitized_html
    end
  end
end
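
The HTML-first fallback in the patch can also be sketched without Rails or Griddler. The html_or_text helper below is hypothetical, and it swaps in the standard library's CGI.unescapeHTML for the HTMLEntities gem, so it decodes only a small set of entities; treat it as an illustration of the fallback order rather than a replacement for the patch.

```ruby
require "cgi"

# Hypothetical helper, not part of Griddler: prefer the decoded HTML part
# and fall back to the plain text part when the HTML is missing or blank.
def html_or_text(raw_html, raw_text)
  # to_s turns a missing part into "", scrub drops invalid bytes.
  html = CGI.unescapeHTML(raw_html.to_s.scrub(""))
  html.strip.empty? ? raw_text.to_s : html
end
```

Calling html_or_text(email.raw_html, email.raw_text) would then return the HTML with its tags intact whenever an HTML part exists.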

wingrunr21 commented 8 years ago

I was planning on leaving the HTML tags in the html_body attribute. Otherwise there isn't much functional difference between that and the plain text.

wingrunr21 commented 8 years ago

This got moved to v1.4.0 as I feel it is enough of an API change to warrant that version.