jorisschellekens / borb

borb is a library for reading, creating and manipulating PDF files in python.
https://borbpdf.com/

BUG - Issues converting HTML to PDF #188

Closed benninkcorien closed 9 months ago

benninkcorien commented 9 months ago

Describe the bug

I'm trying to convert a local HTML file (a yearly planner, still a messy work in progress: Python-generated HTML files with CSS for styling and layout) to PDF.

I'm getting all kinds of errors, such as SVG being unsupported. If I delete all SVGs, I think there's a problem with the [...]. Is this something that should work with borb? I haven't found a good solution/package yet that can convert HTML to PDF with all the layout/styling in place.

To Reproduce

Steps to reproduce the behaviour:

1. Save the attached .txt as .html in a "testfiles" folder.
2. Create a "borb" folder on the same level.
3. Run this Python script:

import os
import glob

from borb.pdf import Document
from borb.pdf import PDF
from borb.toolkit.export.html_to_pdf.html_to_pdf import HTMLToPDF

html_dir = "testfiles"
pdf_dir = "borb"

os.makedirs(pdf_dir, exist_ok=True)

for html_file in glob.glob(os.path.join(html_dir, "*.html")):
    base_name = os.path.splitext(os.path.basename(html_file))[0]
    pdf_file = os.path.join(pdf_dir, f"{base_name}.pdf")
    print(f"Working on {base_name}")

    html_str: str = ""
    with open(html_file, "r", encoding="utf-8") as html_file_handle:
        html_str = html_file_handle.read()

    doc: Document = HTMLToPDF.convert_html_to_pdf(html_str)
    assert doc is not None

    with open(pdf_file, "wb") as pdf_file_handle:
        PDF.dumps(pdf_file_handle, doc)

Expected behaviour

A PDF file generated from the HTML file, preferably one that takes all the CSS into account as well (and maybe with SVG support, though I can easily replace the SVGs with JPG/PNG or something).

planner_full_result.txt

jorisschellekens commented 9 months ago

Hi,

The class HTMLToPDF simply isn't capable of handling complex HTML. It also doesn't handle CSS.
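To see in advance whether a given HTML file relies on the things mentioned here (SVG and CSS), a quick scan with the standard library can help. This is only a rough sketch: `FeatureScanner` and `scan_html` are hypothetical names, not part of borb, and the list of checked constructs covers only what this thread names as unsupported.

```python
from collections import Counter
from html.parser import HTMLParser


class FeatureScanner(HTMLParser):
    """Counts the constructs this thread reports as unsupported: SVG and CSS."""

    def __init__(self) -> None:
        super().__init__()
        self.found: Counter = Counter()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "svg":
            self.found["svg element"] += 1
        elif tag == "style":
            self.found["style element"] += 1
        elif tag == "link" and any(
            name == "rel" and value and "stylesheet" in value.lower()
            for name, value in attrs
        ):
            self.found["linked stylesheet"] += 1
        # Inline CSS can sit on any element, so check it independently.
        if any(name == "style" for name, _ in attrs):
            self.found["inline style attribute"] += 1


def scan_html(html_str: str) -> dict:
    """Return a mapping of construct name to number of occurrences."""
    scanner = FeatureScanner()
    scanner.feed(html_str)
    scanner.close()
    return dict(scanner.found)
```

Calling `scan_html(html_str)` on the file contents before conversion returns an empty dict when none of these constructs are present; anything else suggests the file is beyond what the converter is described as handling.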

In your use case I would opt for removing the middleman: simply generate the daily planner page from scratch in borb.

Kind regards, Joris Schellekens
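As an illustration of the "from scratch" suggestion, here is a minimal sketch of a one-month planner page. The calendar data comes from the standard library. The borb calls (`Document`, `Page`, `SingleColumnLayout`, `Paragraph`, `FixedColumnWidthTable`, `PDF.dumps`) are assumed from the borb 2.x API, the same generation as the `PDF.dumps` call in the script above, and may differ in other versions. The names `month_grid` and `write_month_pdf` are hypothetical.

```python
import calendar


def month_grid(year: int, month: int) -> list[list[str]]:
    """Return a header row plus one row per week; padding days become ''."""
    header = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
    weeks = calendar.Calendar(firstweekday=0).monthdayscalendar(year, month)
    return [header] + [[str(day) if day else "" for day in week] for week in weeks]


def write_month_pdf(year: int, month: int, path: str) -> None:
    """Lay out the month as a table on a single PDF page."""
    # Assumption: borb 2.x layout API. Imported lazily so that month_grid
    # can be used and tested without borb installed.
    from borb.pdf import (
        PDF,
        Document,
        FixedColumnWidthTable,
        Page,
        Paragraph,
        SingleColumnLayout,
    )

    grid = month_grid(year, month)

    doc = Document()
    page = Page()
    doc.add_page(page)
    layout = SingleColumnLayout(page)

    layout.add(Paragraph(f"{calendar.month_name[month]} {year}"))

    table = FixedColumnWidthTable(number_of_rows=len(grid), number_of_columns=7)
    for row in grid:
        for cell in row:
            # Every cell must be filled; use a single space for padding days.
            table.add(Paragraph(cell or " "))
    layout.add(table)

    with open(path, "wb") as pdf_file_handle:
        PDF.dumps(pdf_file_handle, doc)
```

For example, `write_month_pdf(2024, 2, "borb/planner_2024_02.pdf")` would write a single page for February 2024, assuming the "borb" output folder already exists. Styling that the HTML version expressed through CSS would have to be set on the layout elements themselves.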