jarontai / html2md

A library for converting HTML to Markdown in Dart. It supports CommonMark, simple tables, and custom conversion rules.
https://pub.dev/packages/html2md
BSD 2-Clause "Simplified" License

Table support for loose html #22

Closed: scribetw closed this issue 2 years ago

scribetw commented 2 years ago

The html was generated by Outlook.

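The generated Markdown table is invalid when the HTML table has no <th> cells, because no header separator row (| --- | --- |) is emitted. Also, <br> inside cells should be ignored; each one currently splits the row across several lines.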

import 'package:html2md/html2md.dart' as html2md;

void main() {
  // Table markup as generated by Outlook: no <th> cells, and a trailing <br> in every cell.
  final html = '''
<table style="border-collapse:collapse; box-sizing:border-box; height:64px" cellspacing="0" cellpadding="1">
  <tbody>
    <tr>
      <td scope="" style="width:123px; border-color:rgb(171,171,171); border-style:solid; border-width:1px; background-color:transparent; box-sizing:border-box; height:22px">Name<br aria-hidden="true"></td>
      <td scope="" style="width:200px; border-color:rgb(171,171,171); border-style:solid; border-width:1px; background-color:transparent; box-sizing:border-box; height:22px; word-break:break-word; white-space:normal">Value<br aria-hidden="true"></td>
    </tr>
    <tr>
      <td scope="" style="width:123px; border-color:rgb(171,171,171); border-style:solid; border-width:1px; background-color:transparent; box-sizing:border-box; height:41px">Hello,<br aria-hidden="true"></td>
      <td style="width:200px; border-color:rgb(171,171,171); border-style:solid; border-width:1px; background-color:transparent; box-sizing:border-box; height:41px; word-break:break-word; white-space:normal">world from MS 365!<br aria-hidden="true"></td>
    </tr>
  </tbody>
</table>
''';
  print(html2md.convert(html));
}

Actual

| Name  
 | Value  
 |
| Hello,  
 | world from MS 365!  
 |

Expected

| Name    | Value    |
| --- | --- |
| Hello,    | world from MS 365!    |

For comparison, see Turndown with the GFM plugin: https://guyplusplus.github.io/turndown-plugin-gfm/
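
The two fixes requested above (promote the first row to a header, and ignore line breaks inside cells) can be sketched in plain Dart, independent of html2md. This is only an illustration of the target output, not how html2md or Turndown implement it. The helper names `cleanCell` and `toGfmTable` are hypothetical, and the sketch assumes every row has the same number of cells.

```dart
/// Collapses line breaks and repeated whitespace inside a cell to one space,
/// then escapes literal pipes so they cannot end the cell early.
String cleanCell(String text) =>
    text.replaceAll(RegExp(r'\s+'), ' ').trim().replaceAll('|', r'\|');

/// Renders [rows] as a GFM table, treating the first row as the header.
/// Assumption: all rows have the same length as the first one.
String toGfmTable(List<List<String>> rows) {
  if (rows.isEmpty) return '';
  final width = rows.first.length;
  String line(List<String> cells) => '| ${cells.map(cleanCell).join(' | ')} |';
  final separator = '| ${List.filled(width, '---').join(' | ')} |';
  return [line(rows.first), separator, ...rows.skip(1).map(line)].join('\n');
}

void main() {
  // Cell text as it appears in the "Actual" output: trailing spaces + newline.
  print(toGfmTable([
    ['Name  \n', 'Value  \n'],
    ['Hello,  \n', 'world from MS 365!  \n'],
  ]));
}
```

With the cell text from this report, the sketch prints a three-line table with a `| --- | --- |` separator after the first row, which is the shape shown under "Expected" (cell padding aside).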

jarontai commented 2 years ago

Hi scribetw, thanks for reporting. Currently, html2md doesn't provide such a configuration option for table conversion. You might consider using a custom rule: https://github.com/jarontai/html2md#custom-rules
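
As a rough sketch of what such a custom rule might look like, the snippet below replaces every <br> with a single space. It assumes the `Rule(name, filters:, replacement:)` constructor and the `rules:` parameter of `convert` described in the html2md README, and that custom rules are consulted before the built-in ones. The rule name `br-as-space` is hypothetical. Note that this is a blunt approach: it affects <br> everywhere, not only inside table cells, and it does not add the missing header separator row.

```dart
import 'package:html2md/html2md.dart' as html2md;

// Hypothetical rule: the name is arbitrary but should be unique.
final brAsSpace = html2md.Rule(
  'br-as-space',
  filters: ['br'],
  // Emit one space instead of a Markdown hard line break.
  replacement: (content, node) => ' ',
);

void main() {
  const html = '<table><tr><td>Name<br></td><td>Value<br></td></tr></table>';
  // Pass the custom rule alongside the built-in ones.
  print(html2md.convert(html, rules: [brAsSpace]));
}
```

Restricting the rule to cells would need a `filterFn` that inspects the parent element; that part is left out here because it depends on html2md's node API.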

scribetw commented 2 years ago

Yes, I'm currently using a set of custom table rules based on the Turndown GFM plugin source code.

I'll do more testing and make a PR in the future.

scribetw commented 2 years ago

PR in #23