Moggie's HTML-to-text conversion in moggie.security.html.HTMLToTextCleaner is used for generating most plain-text e-mail views, and is preferred by default over the text parts included in the e-mails themselves, simply because so many systems generate broken or incomplete text parts these days.
The class does a decent job generating readable text from HTML input, including links, images and tags such as pre, blockquote and ul - but the structure implied by a table is currently ignored.
This means we are losing some important information from e-mails such as messages from financial institutions, travel itineraries, and probably some others. (Tables aren't just used for layout in marketing messages!)
So we should support tables!
Currently the code inherits from HTMLCleaner a depth-first algorithm which converts each tag into text in the rerender_tag method. This needs to change - the depth-first code will need to buffer the table contents (including tables within tables) and postpone processing them until the size/structure of the table is known, allowing us to allocate a width to each table column, and then render cells side-by-side in the plain text.
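To make the width-allocation and side-by-side rendering step concrete, here is a minimal sketch in Python. It is not Moggie code: the names allocate_widths, wrap_cell and render_table, the " | " separator and the 72-column default are all assumptions made for illustration. It assumes the depth-first pass has already buffered the table as a list of rows, each a list of cell strings, and that a nested table has already been rendered to text and is passed in as a multi-line cell.

```python
import textwrap
from itertools import zip_longest


def allocate_widths(rows, max_width=72, sep=" | ", min_col=4):
    """Pick a width per column so a full row fits in max_width if possible."""
    ncols = max(len(row) for row in rows)
    # Natural width of a column: the longest line found in any of its cells.
    widths = [1] * ncols
    for row in rows:
        for i, cell in enumerate(row):
            longest = max((len(line) for line in cell.splitlines()), default=0)
            widths[i] = max(widths[i], longest)
    budget = max_width - len(sep) * (ncols - 1)
    # Shrink the widest column one character at a time until the row fits,
    # but never below min_col (very wide tables may still overflow).
    while sum(widths) > budget and max(widths) > min_col:
        widths[widths.index(max(widths))] -= 1
    return widths


def wrap_cell(cell, width):
    """Wrap each line of a cell separately, so pre-rendered line breaks
    (e.g. from a nested table) are preserved."""
    lines = []
    for line in cell.splitlines():
        lines.extend(textwrap.wrap(line, width=width) or [""])
    return lines or [""]


def render_table(rows, max_width=72, sep=" | "):
    """Render buffered rows (lists of cell strings) as side-by-side text."""
    if not rows:
        return ""
    widths = allocate_widths(rows, max_width, sep)
    out = []
    for row in rows:
        # Pad short rows so every row has one entry per column.
        cells = list(row) + [""] * (len(widths) - len(row))
        wrapped = [wrap_cell(c, w) for c, w in zip(cells, widths)]
        # One output line per wrapped line of the tallest cell in the row.
        for parts in zip_longest(*wrapped, fillvalue=""):
            line = sep.join(p.ljust(w) for p, w in zip(parts, widths))
            out.append(line.rstrip())
    return "\n".join(out)
```

Calling render_table([["Date", "Amount"], ["2024-01-05", "12.50"]]) would put the two columns next to each other, with the first column padded to the width of its longest cell. The sketch ignores colspan, rowspan and header styling, all of which a real implementation would probably need to consider.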