mailgun / talon

Apache License 2.0

html to lined text issue #198

Closed chen-xiao-dong closed 5 years ago

chen-xiao-dong commented 5 years ago

Consider this HTML snippet:

Hi Daniel,

When utils.html_to_text runs on it, the first node is a span tag (which is not in _BLOCKTAGS + _HARDBREAKS), so no line break is added.

We can change the test case as below:

html = """<div>
<!-- COMMENT 1 -->
<span>TEXT 1</span>
<p><span>TEXT 2 </span><!-- COMMENT 2 --></p>
</div>"""

With this change, the test case at utils_test.py +115 will fail.
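
To illustrate the behaviour described above, here is a minimal, self-contained sketch. It is not talon's code: `to_text`, `BLOCK_TAGS`, and `HARD_BREAKS` are hypothetical stand-ins, the tag lists are assumed, and it parses with the standard library's `xml.etree.ElementTree` rather than whatever `utils.html_to_text` uses internally. It only models the rule that a line break is emitted when a text-carrying node is a block or hard-break tag.

```python
import xml.etree.ElementTree as ET

# Assumed tag lists for illustration only; talon's real _BLOCKTAGS and
# _HARDBREAKS may contain different tags.
BLOCK_TAGS = ["div", "p", "ul", "li", "h1", "h2", "h3"]
HARD_BREAKS = ["br", "hr", "tr"]


def to_text(html):
    """Hypothetical simplified converter: break lines only before block tags."""
    root = ET.fromstring(html)  # the default parser drops comments
    out = ""
    for el in root.iter():
        # Text owned by this node: its leading text plus its tail.
        el_text = ((el.text or "") + (el.tail or "")).strip()
        if not el_text:
            continue  # nodes without text of their own add nothing
        if el.tag in BLOCK_TAGS + HARD_BREAKS:
            out += "\n"
        out += el_text + " "
    return out.strip()


issue_html = """<div>
<!-- COMMENT 1 -->
<span>TEXT 1</span>
<p><span>TEXT 2 </span><!-- COMMENT 2 --></p>
</div>"""

# Variant where the second text sits directly inside <p>, for comparison.
direct_html = issue_html.replace("<span>TEXT 2 </span>", "TEXT 2 ")

print(repr(to_text(issue_html)))
print(repr(to_text(direct_html)))
```

In this sketch, wrapping the second text in a span puts both texts on one line: the enclosing `<p>` carries no text of its own, so the line-break branch is never reached for it. With the text directly inside `<p>`, the break appears.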