pqzx / html2docx

Convert html to docx
MIT License

Parser crash on an <a> tag without href #62

Open victorforerocabarcas opened 11 months ago

victorforerocabarcas commented 11 months ago

htmldocx 0.0.6: an <a> tag without an href attribute causes the following error:

  File "/home/victor/Proyectos/html2docx/./prueba.py", line 36, in <module>
    new_parser.add_html_to_document(txt2, document)
  File "/home/victor/Proyectos/html2docx/venv/lib/python3.10/site-packages/htmldocx/h2d.py", line 591, in add_html_to_document
    self.run_process(html)
  File "/home/victor/Proyectos/html2docx/venv/lib/python3.10/site-packages/htmldocx/h2d.py", line 583, in run_process
    self.feed(html)
  File "/usr/lib/python3.10/html/parser.py", line 110, in feed
    self.goahead(0)
  File "/usr/lib/python3.10/html/parser.py", line 162, in goahead
    self.handle_data(unescape(rawdata[i:j]))
  File "/home/victor/Proyectos/html2docx/venv/lib/python3.10/site-packages/htmldocx/h2d.py", line 514, in handle_data
    self.handle_link(link['href'], data)
KeyError: 'href'
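
The same failure mode can be reproduced with the standard library alone. This is a minimal sketch, not the htmldocx code: it assumes the parser keeps each open tag's attributes in a dict, and the names AnchorTracker, links and try_parse are made up for illustration.

```python
from html.parser import HTMLParser


class AnchorTracker(HTMLParser):
    """Hypothetical stand-in that remembers the attributes of open tags."""

    def __init__(self):
        super().__init__()
        self.tags = {}    # open tag name -> dict of its attributes
        self.links = []   # (href, text) pairs collected so far

    def handle_starttag(self, tag, attrs):
        self.tags[tag] = dict(attrs)

    def handle_endtag(self, tag):
        self.tags.pop(tag, None)

    def handle_data(self, data):
        link = self.tags.get('a')
        if link:
            # Unguarded lookup: raises KeyError when the <a> has no href.
            self.links.append((link['href'], data))


def try_parse(html):
    """Return the collected links, or the KeyError message if parsing fails."""
    parser = AnchorTracker()
    try:
        parser.feed(html)
        parser.close()
    except KeyError as exc:
        return f"KeyError: {exc}"
    return parser.links


print(try_parse('<a href="https://example.com">ok</a>'))
print(try_parse('<a name="top">Top</a>'))
```

In this sketch a bare <a> with no attributes at all does not fail, because its attribute dict is empty and the `if link:` test skips it. The error needs an anchor that has some attribute other than href, such as a named anchor.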

My suggestion: add an `if 'href' in link:` check before the call on line 514.

The patch could be, starting at line 512:

    link = self.tags.get('a')
    if link:
        if 'href' in link:
            self.handle_link(link['href'], data)
    else:
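
One possible side effect of nesting the check: if the else branch (cut off above) is the one that writes plain text, an anchor without href would pass `if link:`, skip handle_link, and never reach else, so its text would be lost. The variant below is a sketch under that assumption. emit_text, open_tags, handle_link and handle_plain are hypothetical stand-ins for the parser's handle_data, self.tags, self.handle_link and its plain-text branch.

```python
def emit_text(open_tags, data, handle_link, handle_plain):
    """Send text to the link handler only when the open <a> has an href."""
    link = open_tags.get('a')
    href = link.get('href') if link else None
    if href:
        handle_link(href, data)
    else:
        # No <a> at all, or an <a> without href (e.g. a named anchor):
        # keep the text as plain content instead of dropping it.
        handle_plain(data)


# Usage with simple lists standing in for the document.
links, plain = [], []


def record_link(href, text):
    links.append((href, text))


emit_text({'a': {'href': 'https://example.com'}}, 'site', record_link, plain.append)
emit_text({'a': {'name': 'top'}}, 'Top', record_link, plain.append)
emit_text({}, 'just text', record_link, plain.append)
print(links)
print(plain)
```

An empty href is also treated as plain text in this sketch. Whether that is the right behaviour for the library is a separate decision.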