mwilliamson / python-mammoth

Convert Word documents (.docx files) to HTML
BSD 2-Clause "Simplified" License

html.strip_empty breaks table rowspan #71

Closed. cockcrow closed this issue 5 years ago

cockcrow commented 5 years ago

before

<table><tr><td colspan="8" rowspan="2"><p><strong>中国人民银行天津分行行政处罚信息公示表</strong></p></td></tr><tr></tr><tr><td><p><strong>序号</strong></p></td><td><p><strong>企业名称</strong></p></td><td><p><strong>行政处罚决定书文号</strong></p></td><td><p><strong>违法行为类型</strong></p></td><td><p><strong>行政处罚内容</strong></p></td><td><p><strong>作出行政处罚决定机关名称</strong></p></td><td><p><strong>作出行政处罚决定日期</strong></p></td><td><p><strong>备注</strong></p></td></tr></table>

after

<table><tr><td colspan="8" rowspan="2"><p><strong>中国人民银行天津分行行政处罚信息公示表</strong></p></td></tr><tr><td><p><strong>序号</strong></p></td><td><p><strong>企业名称</strong></p></td><td><p><strong>行政处罚决定书文号</strong></p></td><td><p><strong>违法行为类型</strong></p></td><td><p><strong>行政处罚内容</strong></p></td><td><p><strong>作出行政处罚决定机关名称</strong></p></td><td><p><strong>作出行政处罚决定日期</strong></p></td><td><p><strong>备注</strong></p></td></tr></table>

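I'm not sure why empty nodes are stripped, but it makes the table layout incorrect: the empty <tr></tr> is removed, so the title cell with rowspan="2" now spans into the header row instead of an empty row.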
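
To make the layout change concrete, here is a rough sketch that counts how many columns each row occupies once cells carried down by rowspan are included. It uses only the standard library's html.parser; TableShapeParser and row_widths are names made up for this illustration and are not part of mammoth. It assumes a single, non-nested table.

```python
from html.parser import HTMLParser


class TableShapeParser(HTMLParser):
    """Collects (colspan, rowspan) for every cell, grouped by <tr>.

    Hypothetical helper for illustration; ignores nested tables.
    """

    def __init__(self):
        super().__init__()
        self.rows = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            attributes = dict(attrs)
            colspan = int(attributes.get("colspan") or 1)
            rowspan = int(attributes.get("rowspan") or 1)
            self.rows[-1].append((colspan, rowspan))


def row_widths(html):
    """Return the number of columns occupied in each row, including
    cells that extend down from earlier rows via rowspan."""
    parser = TableShapeParser()
    parser.feed(html)

    widths = []
    carried = []  # (colspan, rows_left) for cells still spanning downwards
    for row in parser.rows:
        width = sum(colspan for colspan, _ in carried)
        width += sum(colspan for colspan, _ in row)
        widths.append(width)
        # Cells from earlier rows lose one row; new cells with rowspan > 1 join.
        carried = [(c, left - 1) for c, left in carried if left > 1]
        carried += [(c, rowspan - 1) for c, rowspan in row if rowspan > 1]
    return widths
```

Applied to a table shaped like the one above, the "before" markup gives the same width for every row, while the "after" markup gives a second row twice as wide as the first, because the header cells now share a row with the spanning title cell.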

cockcrow commented 5 years ago

Would it be okay to simply add the table-related tags tr/th/td to _VOID_TAG_NAMES?

The is_void method also checks the children field.

That solved the problem described above.

cockcrow commented 5 years ago

It would be better to preserve these tags in StripEmpty.
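
Roughly what I have in mind, as a self-contained sketch rather than a patch: the Element and Text classes, the _PRESERVED_WHEN_EMPTY set, and this strip_empty function are all stand-ins written for this comment and do not reflect mammoth's actual internals. The idea is only that empty table rows and cells survive stripping while other empty elements are still dropped.

```python
from dataclasses import dataclass, field


@dataclass
class Text:
    value: str


@dataclass
class Element:
    tag: str
    children: list = field(default_factory=list)


# Hypothetical set: tags kept even when they end up with no children.
# Table rows and cells are included so rowspan/colspan layouts stay intact.
_PRESERVED_WHEN_EMPTY = {"br", "hr", "img", "tr", "td", "th"}


def strip_empty(nodes):
    """Recursively drop empty text nodes and empty elements,
    except for tags listed in _PRESERVED_WHEN_EMPTY."""
    result = []
    for node in nodes:
        if isinstance(node, Text):
            if node.value:
                result.append(node)
        else:
            children = strip_empty(node.children)
            if children or node.tag in _PRESERVED_WHEN_EMPTY:
                result.append(Element(node.tag, children))
    return result
```

With this, a table containing a populated row followed by an empty tr keeps both rows, while an empty p next to it is still removed.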

mwilliamson commented 5 years ago

Thanks for the report, this should be fixed now.