Closed: wingrunr21 closed this 4 months ago.
This would be really useful. My use case is receiving email in the richest format possible, parsing it into Markdown to save in the DB, and then easily rendering the HTML and text views from it. Pseudocode of what I'm doing:
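```ruby
# In the mailer (MarkdownSwizzler is a placeholder, not a real class)
@body = MarkdownSwizzler.new(email.html_body || email.text_body)
```
```erb
<%# mailer_view.html.erb: raw, because to_html returns markup that ERB would otherwise escape %>
<%= raw @body.to_html %>
<%# mailer_view.text.erb %>
<%= @body.to_markdown %>
```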
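One detail in the pseudocode: `email.html_body || email.text_body` only falls back when `html_body` is `nil`, so an empty or whitespace-only HTML part would still win. A minimal plain-Ruby sketch of a blank-aware fallback follows; the helper name `richest_body` is hypothetical, and in Rails `email.html_body.presence || email.text_body` has the same effect.

```ruby
# Hypothetical helper: prefer the HTML part, and fall back to plain text
# when the HTML part is nil, empty, or whitespace-only.
# Returns nil when neither part has any content.
def richest_body(html_body, text_body)
  [html_body, text_body].find { |part| part && !part.strip.empty? }
end

puts richest_body("<p>Hi</p>", "Hi") # HTML wins when it has content
puts richest_body("  \n", "Hi")      # blank HTML falls back to text
```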
I'm planning to do it this weekend; I got sidetracked last week and didn't get to cut a release. This, #222, and #223 are on my radar.
Nice. In my case I actually need the HTML left intact, since I use it for formatting (and run `sanitize` to strip unsafe tags), but maybe that goes against the spirit of this project. Here's the patch I wrote, for reference:
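```ruby
require 'htmlentities'

module Griddler
  class Email
    # HTML body with entities decoded and quoted replies stripped;
    # falls back to text_or_sanitized_html when the HTML part is blank.
    def html_body
      EmailParser.extract_reply_body(html_or_sanitized_text)
    end

    # Repair invalid UTF-8 bytes, then decode HTML entities.
    # Tags are NOT stripped here, so sanitize the result before rendering it.
    def clean_raw_html(html)
      cleaned_html = clean_invalid_utf8_bytes(html)
      HTMLEntities.new.decode(cleaned_html)
    end

    def html_or_sanitized_text
      html = clean_raw_html(raw_html)
      html.presence || text_or_sanitized_html
    end
  end
end
```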
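For anyone without the `htmlentities` gem at hand, here is a rough standard-library approximation of the two steps in `clean_raw_html`. It is a sketch, not Griddler's code: `fix_invalid_utf8` is a hypothetical stand-in for `clean_invalid_utf8_bytes` that assumes invalid bytes are Latin-1, and `CGI.unescapeHTML` decodes numeric entities plus only a handful of named ones (such as `&amp;`, `&lt;`, `&gt;`, `&quot;`), unlike `HTMLEntities#decode`. It also shows why sanitizing has to come after this step: decoding turns escaped text like `&lt;script&gt;` into a live tag.

```ruby
require 'cgi'

# Hypothetical stand-in for Griddler's clean_invalid_utf8_bytes.
# Assumption: bytes that are not valid UTF-8 are ISO-8859-1 (Latin-1).
def fix_invalid_utf8(text)
  return nil if text.nil?
  utf8 = text.dup.force_encoding(Encoding::UTF_8)
  return utf8 if utf8.valid_encoding?
  text.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

# Same two steps as clean_raw_html in the patch: repair the encoding,
# then decode entities. Tags are left in place, so sanitize afterwards.
def clean_raw_html(html)
  repaired = fix_invalid_utf8(html)
  repaired && CGI.unescapeHTML(repaired)
end

puts clean_raw_html("<p>Tom &amp; Jerry</p>")
puts clean_raw_html("&lt;script&gt;") # decoded into a real tag
```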
I was planning on leaving the HTML tags in the `html_body` attribute; otherwise there isn't much functional difference between it and the plain text.
This got moved to v1.4.0 as I feel it is enough of an API change to warrant that version.
Right now, calling `body` returns the email text if it is present. The only way to get a sanitized copy of the HTML body is to call `clean_html(email.raw_html)` manually inside the email processor.

Introduce `text_body` and `html_body` attributes on `Griddler::Email` that provide direct access to the sanitized text and HTML bodies. This also brings the sanitized body attributes in line with those exposed under the `raw_` naming (`raw_text`, `raw_html`).
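A simplified, hypothetical model of the naming proposed above: each `raw_*` attribute gets a cleaned `*_body` counterpart, and `body` keeps the current text-first default. `EmailSketch` and its injected `cleaner` lambda are illustrations, not Griddler's API; the cleaning step is a parameter because the thread leaves open how much sanitizing `html_body` should do.

```ruby
# Hypothetical model of the proposed attributes; not Griddler::Email itself.
class EmailSketch
  attr_reader :raw_text, :raw_html

  # cleaner: any callable that turns a raw body into a cleaned one.
  # The default (strip surrounding whitespace) is only a placeholder.
  def initialize(raw_text:, raw_html:, cleaner: ->(s) { s.strip })
    @raw_text = raw_text
    @raw_html = raw_html
    @cleaner = cleaner
  end

  # Cleaned counterpart of raw_text (nil when there is no text part).
  def text_body
    @raw_text && @cleaner.call(@raw_text)
  end

  # Cleaned counterpart of raw_html (nil when there is no HTML part).
  def html_body
    @raw_html && @cleaner.call(@raw_html)
  end

  # Current default: the text if present, otherwise the cleaned HTML.
  def body
    text = text_body
    (text.nil? || text.empty?) ? html_body : text
  end
end

email = EmailSketch.new(raw_text: nil, raw_html: " <p>Hi</p> ")
puts email.body
```

With this shape, an email processor could read `email.html_body` directly instead of calling `clean_html(email.raw_html)` itself.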