rennat / pynliner

Python CSS-to-inline-styles conversion tool for HTML using BeautifulSoup and cssutils
http://pythonhosted.org/pynliner/
180 stars 93 forks source link

aggressively converts html entities, not really usable for email templates #22

Open jvanasco opened 11 years ago

jvanasco commented 11 years ago

First off, thanks. This is a great package + well designed & documented.

I ran into some issues while trying to get this to work with mako templates (which i use to generate email).

mako templates don't work because the BeautifulSoup parser and formatter will kill things like:

<% value = 1 + 1 %>

and

% if loop.index < 6: 
% endif

becomes

% if loop.index &lt; 6: 
% endif

I tried forking and running on BS4, which allows for a 'formatter' option to be used on printing ( overriding your _get_output function ). I couldn't find a formatter that creates reliable or desirable results.

i think the design of this package will make it hard to work on templates for email , so i wanted to suggest a warning in the documents. If I have time, I might try and get this to work later in the week -- but I'm sadly not hopeful.

Anyways, great package.

rennat commented 11 years ago

Thanks for the input! I haven't tried to use this on any templating languages directly. I'm not opposed to supporting template languages especially if it just means passing the correct beautiful soup arguments.

Pynliner was first used on a django project where we used templates to render the emails to HTML then ran pynliner on them before sending the emails out. I think this approach eliminates the need for running pynliner on templates directly but I'm sure there are valid scenarios for it that I haven't thought of. On Apr 29, 2013 12:05 PM, "jvanasco" notifications@github.com wrote:

First off, thanks. This is a great package + well designed & documented.

I ran into some issues while trying to get this to work with mako templates (which i use to generate email).

mako templates don't work because the BeautifulSoup parser and formatter will kill things like:

<% value = 1 + 1 %>

and

% if loop.index < 6: % endif

becomes

% if loop.index < 6: % endif

I tried forking and running on BS4, which allows for a 'formatter' option to be used on printing ( overriding your _get_output function ). I couldn't find a formatter that creates reliable or desirable results.

i think the design of this package will make it hard to work on templates for email , so i wanted to suggest a warning in the documents. If I have time, I might try and get this to work later in the week -- but I'm sadly not hopeful.

Anyways, great package.

— Reply to this email directly or view it on GitHubhttps://github.com/rennat/pynliner/issues/22 .

jvanasco commented 11 years ago

If you're sending out a non-personalized marketing email, it's no big deal. You only need to do this once.

If you're sending out 1000+ of customized transactional emails, this can take quite a bit of CPU and time. on one of my boxes, it average 1.5s to parse and inline an email that is 'nonstandard' and .5s to parse a 'standard' one (via timeit, and only timing css_inlined = pynliner.fromString(html) ).

non-standard = hand drawn html standard = non-standard , pumped into BS, then printed out ( this way I ensure its exactly what BS likes )

having a .5 - 1.5s hit on every transactional or customized email is way too much of an overhead for our volume. Our current list size would take around an hour of processing time.

rennat commented 11 years ago

You make a good case! Supporting template languages would be a great new feature as well as speeding up the whole process. On Apr 29, 2013 2:40 PM, "jvanasco" notifications@github.com wrote:

If you're sending out a non-personalized marketing email, it's no big deal. You only need to do this once.

If you're sending out 1000+ of customized transactional emails, this can take quite a bit of CPU and time. on one of my boxes, it average 1.5s to parse and inline an email that is 'nonstandard' and .5s to parse a 'standard' one (via timeit, and only timing css_inlined = pynliner.fromString(html) ).

non-standard = hand drawn html standard = non-standard , pumped into BS, then printed out ( this way I ensure its exactly what BS likes )

having a .5 - 1.5s hit on every transactional or customized email is way too much of an overhead for our volume. Our current list size would take around an hour of processing time.

— Reply to this email directly or view it on GitHubhttps://github.com/rennat/pynliner/issues/22#issuecomment-17189198 .

tclancy commented 11 years ago

I think this is also causing a problem I'm seeing where it is converting some of the <> signs inside a conditional comment, which breaks the rendering of buttons for Outlook if you use the HTML provided by http://emailbtn.net/ turning them into things like this:

<!--[if mso]&gt;
  &lt;v:roundrect xmlns:v="urn:schemas-microsoft-com:vml" xmlns:w="urn:schemas-microsoft-com:office:word" href="http://localhost:8000/account/confirm_email/f10c6a42a92d1f16bb8d79d635f58fec56c4b7fb/" style="height:40px;v-text-anchor:middle;width:200px;" arcsize="8%" stroke="f" fillcolor="#da2929"&gt;
    &lt;w:anchorlock /&gt;
    &lt;center&gt;
  &lt;![endif]-->

Is this something BeautifulSoup is doing or is it the package? It looks like BS had problems with conditional comments in the past but there appear to be some workarounds. If you let me know where to look, I can take a crack at this as the package has proved very helpful.

rennat commented 11 years ago

I'll need to do some investigating on where this occurs in the process. It would be helpful if you commit some test cases that fail for this reason.

On Tue, Jun 18, 2013 at 10:49 AM, Tom Clancy notifications@github.comwrote:

I think this is also causing a problem I'm seeing where it is converting some of the <> signs inside a conditional comment, which breaks the rendering of buttons for Outlook if you use the HTML provided by http://emailbtn.net/ turning them into things like this:

Is this something BeautifulSoup is doing or is it the package? It looks like BS had problems with conditional comments in the pasthttps://groups.google.com/forum/?fromgroups#!topic/beautifulsoup/eMKhD3T2z20but there appear to be some workaroundshttp://stackoverflow.com/questions/132488/regex-to-remove-conditional-comments. If you let me know where to look, I can take a crack at this as the package has proved very helpful.

— Reply to this email directly or view it on GitHubhttps://github.com/rennat/pynliner/issues/22#issuecomment-19620504 .

tclancy commented 11 years ago

Does this work/ do you need additional tests? Not sure if it's too simplistic or overly complicated.

rennat commented 11 years ago

Thanks I'll take a look this evening.

On Tue, Jun 18, 2013 at 2:27 PM, Tom Clancy notifications@github.comwrote:

Does this work/ do you need additional tests? Not sure if it's too simplistic or overly complicated.

— Reply to this email directly or view it on GitHubhttps://github.com/rennat/pynliner/issues/22#issuecomment-19634816 .

rennat commented 11 years ago

@jvanasco I'm still planning a complete rewrite with backwards compatibility on the same interface to be faster and more flexible than relying on beautiful soup but until the rewrite it has to go through beautiful soup which aggressively converts to HTML.

rennat commented 9 years ago

TL;DR Work on the rewrite is continuing as I have time. My primary focus is making the conversion process more efficient. My secondary focus is supporting templating languages in an extensible way.


I've recently been working on the rewrite and while trying for a flexible conversion process I've realized that while pynliner 1.0 can (and will) be built to support templating languages and made extensible to allow other templating languages/uses, complete CSS support cannot happen before a template is rendered.

A simple example showing this if some_x = [0,1,2]:

Template

<div>
{{ for x in some_x }}
    <span class="an-x">{{x}}</span>
{{ endfor }}
</div>

Stylesheet

.an-x { color: red; }
.an-x:last-child { color: blue; }

Output if template rendered first (Correct)

<div>
    <span class="an-x" style="color:red">0</span>
    <span class="an-x" style="color:red">1</span>
    <span class="an-x" style="color:blue">2</span>
</div>

Output if applied to template first (Incorrect)

<div>
    <span class="an-x" style="color:blue">0</span>
    <span class="an-x" style="color:blue">1</span>
    <span class="an-x" style="color:blue">2</span>
</div>

The problem here is the :last-child selector which, among other selectors, cannot be tested for correctly until the template is rendered. This does not mean we cannot support processing templates. It does reduce functionality to a subset of CSS.

None of this prevents us from enabling conversion on templates but I will be focusing my effort on making the conversion process more efficient first.