pqzx / html2docx

Convert html to docx
MIT License
70 stars 49 forks

failed to convert html table to docx #1

Closed demiwang1214 closed 3 years ago

demiwang1214 commented 3 years ago

I tried to convert an HTML table to docx but got the error below. The HTML table comes from TinyMCE.

html='

rgerhhyww
wy4y45e4y
wy4y45e4y
wy4y45e4y

'
new_parser = HtmlToDocx()
new_parser.add_html_to_document(html, docx)

error:

File "/Users/xxxx/xxxx/venv/lib/python3.6/site-packages/htmldocx/__init__.py", line 215, in handle_table
    rows, cols = self.get_table_dimensions(table_soup)
File "/Users/xxxxx/xxxxx/venv/lib/python3.6/site-packages/htmldocx/__init__.py", line 369, in get_table_dimensions
    cols = rows[0].find_all(['th', 'td'], recursive=False)
IndexError: list index out of range

def get_table_dimensions(self, table_soup):
    rows = table_soup.find_all('tr', recursive=False)  # error: searches for 'tr', but rows comes back as []
    cols = rows[0].find_all(['th', 'td'], recursive=False)
    return len(rows), len(cols)

Can anyone help me? Thanks.
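
The empty list in the traceback is what a direct-child lookup returns when the rows sit one level deeper, for example inside a <tbody> wrapper, which rich-text editors commonly emit. The sketch below illustrates that behaviour with the standard library's ElementTree as a stand-in for BeautifulSoup; the sample markup and the function names are assumptions for illustration, not the HTML from this report or code from htmldocx.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the same table with and without a <tbody> wrapper.
WITH_TBODY = "<table><tbody><tr><td>a</td><td>b</td></tr></tbody></table>"
WITHOUT_TBODY = "<table><tr><td>a</td><td>b</td></tr></table>"


def direct_rows(table_markup: str) -> list:
    """Return only the <tr> elements that are direct children of <table>."""
    table = ET.fromstring(table_markup)
    # findall("tr") looks at direct children only, similar to recursive=False.
    return table.findall("tr")


def all_rows(table_markup: str) -> list:
    """Return every <tr> at any depth below <table>."""
    table = ET.fromstring(table_markup)
    return list(table.iter("tr"))


rows_direct = direct_rows(WITH_TBODY)    # empty: the <tr> sits under <tbody>
rows_nested = all_rows(WITH_TBODY)       # finds the wrapped row
rows_plain = direct_rows(WITHOUT_TBODY)  # finds the unwrapped row
```

Indexing rows_direct[0] here raises the same IndexError as in the traceback, because the list is empty.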

pqzx commented 3 years ago

Currently tables with <tbody> aren't supported. For now I'd suggest using a workaround of removing the tbody tags from the html before feeding it to add_html_to_document(). E.g.

import re
# Drop <tbody> and </tbody> tags before conversion
html = re.sub(r'</?tbody>', '', html)
new_parser.add_html_to_document(html, docx)
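
The substitution above only matches bare tags. If the editor output carries attributes or uppercase tag names, a slightly broader pattern may be needed. A minimal sketch, assuming that case; strip_tbody is a hypothetical helper name, not part of htmldocx:

```python
import re

# Matches <tbody>, </tbody>, and opening tags with attributes, in any letter case.
_TBODY_TAG = re.compile(r"</?tbody\b[^>]*>", re.IGNORECASE)


def strip_tbody(html: str) -> str:
    """Remove tbody tags while leaving the rows they wrap in place."""
    return _TBODY_TAG.sub("", html)


# Hypothetical sample input for illustration.
sample = '<table><TBODY class="x"><tr><td>a</td></tr></TBODY></table>'
cleaned = strip_tbody(sample)
```

The result can then be passed to add_html_to_document() in place of the original string. A regular expression is a blunt tool for HTML, so this is only suitable as a stopgap for simple editor output.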

(The way this whole package is managed is pretty terrible. I'm looking at cleaning that up first before adding a fix).

demiwang1214 commented 3 years ago

@pqzx Thank you for the response. I added your workaround and no exception is raised now, but the table's inline styles are not applied: the table border and cell widths do not show up. Can this be improved? Do you have any suggestions?


pqzx commented 3 years ago

This isn't currently supported. Could be worth adding.

https://python-docx.readthedocs.io/en/latest/api/table.html#docx.table.Table
https://python-docx.readthedocs.io/en/latest/api/table.html#docx.table._Column
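
One possible first step toward carrying inline styles over is to parse the style attribute and turn a pixel width into a physical length, since the width properties linked above take lengths rather than CSS values. A rough sketch under that assumption; the helper names and the sample style string are hypothetical, and none of this is part of htmldocx:

```python
from typing import Optional

CSS_PX_PER_INCH = 96  # CSS defines 1in as 96px


def parse_inline_style(style: str) -> dict:
    """Split 'a: b; c: d' into {'a': 'b', 'c': 'd'} with lower-cased keys."""
    declarations = {}
    for chunk in style.split(";"):
        if ":" not in chunk:
            continue  # skip empty or malformed declarations
        name, value = chunk.split(":", 1)
        declarations[name.strip().lower()] = value.strip()
    return declarations


def width_in_inches(style: str) -> Optional[float]:
    """Return the declared width in inches, or None if it is not given in px."""
    width = parse_inline_style(style).get("width", "")
    if not width.endswith("px"):
        return None  # percentages and other units are out of scope here
    try:
        return float(width[:-2]) / CSS_PX_PER_INCH
    except ValueError:
        return None


# Hypothetical cell style for illustration.
cell_style = "width: 144px; border: 1px solid #000;"
parsed = parse_inline_style(cell_style)
inches = width_in_inches(cell_style)
```

The resulting number could be wrapped in python-docx's Inches and assigned to a column or cell width. That wiring is left out because it depends on how htmldocx builds the table.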

amthorn commented 3 years ago

I got it working for me. Opened a PR here: https://github.com/pqzx/html2docx/pull/7