Looks like add_styles_to_paragraph isn't handling styles expressed as percentages: the regex only strips letter units such as px, so a value like 4% keeps its % sign and the float conversion fails.
Sample HTML:
<P STYLE="margin-top:6px;margin-bottom:0px; margin-left:4%; text-indent:4%"><FONT STYLE="font-family:Times New Roman" SIZE="2">test text</FONT></P>
Result:
ValueError: could not convert string to float: '4%'
Full traceback:
File ~/home/.venv/lib/python3.9/site-packages/htmldocx/h2d.py:609, in HtmlToDocx.parse_html_file(self, filename_html, filename_docx)
607 html = infile.read()
608 self.set_initial_attrs()
--> 609 self.run_process(html)
610 if not filename_docx:
611 path, filename = os.path.split(filename_html)
File ~/home/.venv/lib/python3.9/site-packages/htmldocx/h2d.py:583, in HtmlToDocx.run_process(self, html)
581 if self.include_tables:
582 self.get_tables()
--> 583 self.feed(html)
File ~/opt/anaconda3/lib/python3.9/html/parser.py:110, in HTMLParser.feed(self, data)
104 r"""Feed data to the parser.
105
106 Call this as often as you want, with as little or as much text
107 as you want (may include '\n').
108 """
109 self.rawdata = self.rawdata + data
--> 110 self.goahead(0)
File ~/opt/anaconda3/lib/python3.9/html/parser.py:170, in HTMLParser.goahead(self, end)
168 if startswith('<', i):
169 if starttagopen.match(rawdata, i): # < + letter
--> 170 k = self.parse_starttag(i)
171 elif startswith("</", i):
172 k = self.parse_endtag(i)
File ~/opt/anaconda3/lib/python3.9/html/parser.py:344, in HTMLParser.parse_starttag(self, i)
342 self.handle_startendtag(tag, attrs)
343 else:
--> 344 self.handle_starttag(tag, attrs)
345 if tag in self.CDATA_CONTENT_ELEMENTS:
346 self.set_cdata_mode(tag)
File ~/home/.venv/lib/python3.9/site-packages/htmldocx/h2d.py:465, in HtmlToDocx.handle_starttag(self, tag, attrs)
463 if 'style' in current_attrs and self.paragraph:
464 style = self.parse_dict_string(current_attrs['style'])
--> 465 self.add_styles_to_paragraph(style)
File ~/home/.venv/lib/python3.9/site-packages/htmldocx/h2d.py:218, in HtmlToDocx.add_styles_to_paragraph(self, style)
216 margin = style['margin-left']
217 units = re.sub(r'[0-9]+', '', margin)
--> 218 margin = int(float(re.sub(r'[a-z]+', '', margin)))
219 if units == 'px':
220 self.paragraph.paragraph_format.left_indent = Inches(min(margin // 10 * INDENT, MAX_INDENT))
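For reference, here is a minimal sketch of the failure and one possible way to parse the value defensively. `split_css_length` is a hypothetical helper, not part of htmldocx, and how a percentage should map to a paragraph indent is left open here.

```python
import re

# Hypothetical helper (not in htmldocx): split a CSS length into number and unit.
# Accepts letter units and '%'; anything else (e.g. 'auto') does not match.
_CSS_LENGTH = re.compile(r'^\s*(-?\d*\.?\d+)\s*([a-z%]*)\s*$', re.IGNORECASE)


def split_css_length(value):
    """Return (number, unit) for values like '6px' or '4%', or None if unparseable."""
    match = _CSS_LENGTH.match(value)
    if match is None:
        return None
    return float(match.group(1)), match.group(2).lower()


# What the substitution on line 218 does: only letters are removed,
# so '%' survives and float() raises ValueError.
stripped = re.sub(r'[a-z]+', '', '4%')
try:
    float(stripped)
    failed = False
except ValueError:
    failed = True

px_value = split_css_length('6px')
pct_value = split_css_length('4%')
```

With something like this, the `units == 'px'` branch could keep its current behaviour, and other units (including `%`) could be skipped or handled separately instead of raising.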