pqzx / html2docx

Convert html to docx
MIT License

"IndexError: list index out of range" when using colspan attribute #51

Open sanjass opened 1 year ago

sanjass commented 1 year ago

A minimal example to reproduce:

from docx import Document
from htmldocx import HtmlToDocx

parser = HtmlToDocx()
colspan = """
<table>
  <tr>
    <th colspan="2">Monthly Savings</th>
  </tr>
  <tr>
    <td>January</td>
    <td>$100</td>
  </tr>
  <tr>
    <td>February</td>
    <td>$80</td>
  </tr>
</table>
"""

document = Document()
parser.add_html_to_document(colspan, document)
document.save("colspan.docx")
print("Document created")

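Expected: a .docx document containing the table as the HTML describes it, with the "Monthly Savings" header cell spanning both columns above the January and February rows.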

Actual:

  File ".../lib/python3.10/site-packages/docx/table.py", line 89, in cell
    return self._cells[cell_idx]
IndexError: list index out of range
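A minimal sketch of how this error can arise, assuming (this is not confirmed in the issue) that htmldocx sizes the python-docx table from the number of cells in the first `<tr>`, and that `Table.cell()` looks up `self._cells[col_idx + row_idx * column_count]` in a flat, row-major list. `FlatTable` and `first_failure` are hypothetical names; the class only mimics that indexing.

```python
# Hypothetical stand-in for a python-docx table: a flat, row-major list of
# rows * cols cells, indexed the way Table.cell() is assumed to index it.
class FlatTable:
    def __init__(self, rows: int, cols: int):
        self.column_count = cols
        self.cells = [f"cell{i}" for i in range(rows * cols)]

    def cell(self, row_idx: int, col_idx: int) -> str:
        cell_idx = col_idx + row_idx * self.column_count
        return self.cells[cell_idx]


def first_failure(table: FlatTable, requested: list[tuple[int, int]]):
    """Return the first (row, col) lookup that raises IndexError, else None."""
    for r, c in requested:
        try:
            table.cell(r, c)
        except IndexError:
            return (r, c)
    return None


# Assumed sizing: 3 <tr> rows, columns counted from the first row only.
# That row holds a single <th colspan="2">, so the table gets 1 column.
table = FlatTable(rows=3, cols=1)

# Cells the parser would try to fill: 1 cell in row 0, 2 cells in rows 1 and 2
requested = [(0, 0), (1, 0), (1, 1), (2, 0), (2, 1)]

print(first_failure(table, requested))  # (2, 1)
```

Under these assumptions, the lookup for row 1, column 1 resolves to index 2, which is row 2's cell, so content can land in the wrong place before the lookup for (2, 1) finally runs past the end of the list.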

Removing the header row

  <tr>
    <th colspan="2">Monthly Savings</th>
  </tr>

"resolves" the error, which indicates that the colspan attribute is causing this issue.
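If the column count comes from the first row alone, the `colspan="2"` header counts as a single column. A sketch of measuring the grid instead by summing each row's colspan values, using only the standard library's `html.parser`; `TableShape` and `table_shape` are hypothetical helpers that handle a single, non-nested table and ignore rowspan.

```python
from html.parser import HTMLParser


class TableShape(HTMLParser):
    """Hypothetical helper: records the width of each <tr>, counting every
    <td>/<th> as `colspan` columns wide (rowspan is ignored)."""

    def __init__(self):
        super().__init__()
        self.row_widths: list[int] = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.row_widths.append(0)
        elif tag in ("td", "th") and self.row_widths:
            # attrs is a list of (name, value); value may be None
            span = dict(attrs).get("colspan") or "1"
            try:
                width = max(1, int(span))
            except ValueError:
                width = 1  # treat malformed colspan values as 1
            self.row_widths[-1] += width


def table_shape(html: str) -> tuple[int, int]:
    """Return (rows, columns), taking the widest row as the column count."""
    parser = TableShape()
    parser.feed(html)
    parser.close()
    return len(parser.row_widths), max(parser.row_widths, default=0)


EXAMPLE = """
<table>
  <tr><th colspan="2">Monthly Savings</th></tr>
  <tr><td>January</td><td>$100</td></tr>
  <tr><td>February</td><td>$80</td></tr>
</table>
"""

print(table_shape(EXAMPLE))  # (3, 2)
```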

Others seem to be running into similar problems with both rowspan and colspan, and there are some PRs that try to address them. Could the package be updated to resolve this? Thanks in advance.
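One possible approach, sketched here as an assumption rather than what the package or the open PRs actually do, is to place every cell on an occupancy grid (a simplified form of the HTML table layout rules) so that each cell knows the rectangle it covers, for both colspan and rowspan. `Placement`, `place_cells`, and `grid_size` are hypothetical; each HTML cell is described only by its `(colspan, rowspan)` pair.

```python
from typing import NamedTuple


class Placement(NamedTuple):
    row: int       # first grid row covered by the cell
    col: int       # first grid column covered by the cell
    last_row: int  # last grid row covered (inclusive)
    last_col: int  # last grid column covered (inclusive)


def place_cells(rows: list[list[tuple[int, int]]]) -> list[Placement]:
    """Assign each (colspan, rowspan) cell a rectangle in the grid.

    A cell starts at the first column of its row that is not already
    covered by a rowspan from an earlier row.
    """
    occupied: set[tuple[int, int]] = set()
    placements: list[Placement] = []
    for r, cells in enumerate(rows):
        c = 0
        for colspan, rowspan in cells:
            # Skip columns already taken by a cell spanning down from above
            while (r, c) in occupied:
                c += 1
            last_row = r + max(1, rowspan) - 1
            last_col = c + max(1, colspan) - 1
            for rr in range(r, last_row + 1):
                for cc in range(c, last_col + 1):
                    occupied.add((rr, cc))
            placements.append(Placement(r, c, last_row, last_col))
            c = last_col + 1
    return placements


def grid_size(placements: list[Placement]) -> tuple[int, int]:
    """Number of (rows, columns) needed to hold every placement."""
    rows = max((p.last_row for p in placements), default=-1) + 1
    cols = max((p.last_col for p in placements), default=-1) + 1
    return rows, cols


# The issue's table: a header spanning 2 columns, then two rows of 2 cells
example = [[(2, 1)], [(1, 1), (1, 1)], [(1, 1), (1, 1)]]
layout = place_cells(example)
print(grid_size(layout))  # (3, 2)
print(layout[0])          # Placement(row=0, col=0, last_row=0, last_col=1)
```

With python-docx, a writer could then create the table with `Document.add_table(rows, cols)` using `grid_size`, and for every placement covering more than one grid cell call `table.cell(p.row, p.col).merge(table.cell(p.last_row, p.last_col))` before writing the content into the merged cell.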