pqzx / html2docx

Convert html to docx
MIT License

'HtmlToDocx' object has no attribute 'run' #37

Open Amritpal2001 opened 2 years ago

Amritpal2001 commented 2 years ago

Hey, I sometimes get this error when converting from HTML to docx.

Thanks!

mlatysh commented 2 years ago

Same here

sanaullahaq commented 1 year ago

same here, any solution?

sanaullahaq commented 1 year ago

I could be wrong, but from what I have found, the error occurs when I try to convert HTML containing a table with an empty/blank cell to docx.

dashingdove commented 1 year ago

I can confirm this error occurs if you put a <br> tag right at the start of a <td>. If you put anything before the <br> then it seems to work fine. For example:

<td><br>Hello world</td> throws an error
<td>Hello world<br></td> does not

I would guess that the run needs to be initialised somewhere. If some content precedes the <br> then the run has already been created by the time the <br> is parsed, but when the <br> is the first child of the <td> then the run attribute is missing which causes the error.

dashingdove commented 1 year ago

The error also occurs when adding a <br> to the start of a document.

import docx
import htmldocx

document = docx.Document()
html_parser = htmldocx.HtmlToDocx()
html_parser.add_html_to_document('<br>', document)  # raises AttributeError

Basically, if the first thing that the parser sees is a <br> then it throws an error. In the table cell example, a child parser gets created to parse the contents of the cell so it's exactly the same issue.
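To illustrate the kind of failure I'm guessing at, here is a minimal stand-alone sketch using only the standard library's html.parser. It is not htmldocx's actual code: ToyParser, GuardedToyParser and parse are hypothetical names, and the assumption is simply that the run attribute only gets created once some text has been seen.

```python
from html.parser import HTMLParser


class ToyParser(HTMLParser):
    """Hypothetical stand-in for a converter; NOT htmldocx's real code."""

    def __init__(self):
        super().__init__()
        self.output = []  # list of "runs", each a list of text pieces

    def handle_data(self, data):
        # Assumption: the run is only created when text is encountered.
        self.run = [data]
        self.output.append(self.run)

    def handle_starttag(self, tag, attrs):
        if tag == "br":
            # Assumes a run already exists; fails if <br> comes first.
            self.run.append("\n")


class GuardedToyParser(ToyParser):
    """Same toy parser, but it creates the run on demand before using it."""

    def handle_starttag(self, tag, attrs):
        if tag == "br":
            if not hasattr(self, "run"):
                self.run = []
                self.output.append(self.run)
            self.run.append("\n")


def parse(html, parser_cls=ToyParser):
    """Feed html through the given toy parser and return its runs."""
    parser = parser_cls()
    parser.feed(html)
    parser.close()
    return parser.output
```

With this toy, parse('Hello<br>') works, while parse('<br>Hello') raises an AttributeError about the missing run attribute. Passing GuardedToyParser as the second argument avoids the error because the run is created before it is used. Whether the real fix in htmldocx looks like this is only a guess on my part.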

ptkinvent commented 6 months ago

+1 on this issue

Steps to replicate:

from docx import Document
from htmldocx import HtmlToDocx

document = Document()
new_parser = HtmlToDocx()

html = '<table><tr><td><br>testing</td></tr></table>'
new_parser.add_html_to_document(html, document)

pierreavn commented 3 months ago

+1