loftylabs / django-hardcopy

Render PDFs from HTML in Python/Django using Headless Chrome
http://hirelofty.com

django-hardcopy: Render PDFs and PNGs in Django with headless Chrome

Chrome introduced headless mode in v59, opening the possibility of using Chrome as a fast and elegant way to generate PDF data or PNG screenshots programmatically from HTML. django-hardcopy is an alternative to other projects that leverage wkhtmltopdf, a great tool, but one that lacks the portability, ease of installation, performance, and reliability of Chrome.

Requirements

Installation

Install the library:

pip install django-hardcopy

Install Chrome or a derivative:

apt-get install chromium-browser

Set your Chrome path (optional):

# settings.py

CHROME_PATH = '/path/to/chrome-or-chromium'

This can be useful if you want to use chrome-canary or chromium-browser (available by default in Ubuntu). django-hardcopy will attempt to pick a sensible default Chrome path for your OS. If you're on macOS, just upgrade to the latest Chrome and you're good to go!
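If the default does not work on your machine, a small script can help you find a value for CHROME_PATH. The following is a minimal sketch using only the standard library; the function name `find_chrome` and the list of candidate executable names are assumptions made for illustration and are not the lookup that django-hardcopy itself performs.

```python
import shutil

# Candidate executable names. These are illustrative guesses, not the
# list that django-hardcopy consults internally.
CANDIDATES = (
    "chromium-browser",
    "chromium",
    "google-chrome",
    "google-chrome-stable",
    "chrome",
)


def find_chrome(candidates=CANDIDATES, which=shutil.which):
    """Return the full path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = which(name)
        if path:
            return path
    return None


if __name__ == "__main__":
    # Print the result so it can be copied into settings.py by hand.
    print(find_chrome() or "No Chrome-like binary found on PATH")
```

Run it once and paste the printed path into CHROME_PATH. Binaries installed outside PATH (for example, application bundles) will not be found this way and need to be set manually.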

Set the rendering window size (optional, default: 1280,720):

# settings.py

CHROME_WINDOW_SIZE = '800,600'

Usage

The easiest way to use django-hardcopy is through its CBV mixins:

from django.views.generic import TemplateView
from hardcopy.views import PDFViewMixin, PNGViewMixin

class MyPDFView(PDFViewMixin, TemplateView):
    template_name = "pdf_me.html"

class MyPNGView(PNGViewMixin, TemplateView):
    template_name = "png_me.html"
    height = '1080'
    width = '1920'

It works with any Django class-based view and implements PDF or PNG rendering on the GET HTTP method. In addition, if the ?html querystring variable is provided, the mixin renders the view normally, which is useful for designing and debugging the raw HTML. The CBV mixin supports several options for extension and customization, covered in the FAQ section.
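To make the ?html switch concrete, here is a rough sketch of the kind of check involved, written without Django so it runs on its own. The function name `wants_raw_html` is hypothetical, and the mixin's actual implementation may differ; the sketch only shows that a bare `?html` and `?html=1` can both be treated as "render the plain HTML".

```python
from urllib.parse import parse_qs, urlsplit


def wants_raw_html(url):
    """Return True when the URL's query string contains an `html` key."""
    query = urlsplit(url).query
    # keep_blank_values=True so a bare "?html" (no "=value") still counts.
    return "html" in parse_qs(query, keep_blank_values=True)


print(wants_raw_html("/report/?html"))    # True
print(wants_raw_html("/report/?page=2"))  # False
```

In practice this means you can open the same URL with `?html` appended in a regular browser and use its developer tools on the template before rendering it to PDF or PNG.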

Two functions implement a lower-level API that can be used directly for PDFs:

file_to_pdf(input_file, output_file, **extra_args)

Arguments:

This function reads the contents of input_file (a file containing HTML bytes), renders it with Chrome, and stores the binary PDF data in output_file. Any kwargs are translated into command-line arguments to Chrome when starting the headless browser for rendering, e.g.:

from hardcopy import file_to_pdf

extra_args = {
    'virtual-time-budget': 6000
}

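# Open the output for binary writing; a plain open() is read-only.
with open('myfile.html', 'rb') as html_file, open('myfile.pdf', 'wb') as pdf_file:
    file_to_pdf(html_file, pdf_file, **extra_args)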
# translates to --virtual-time-budget=6000 when starting chrome

extra_args = {
    'disable-gpu': None
}

with open('myfile.html', 'rb') as html_file, open('myfile.pdf', 'wb') as pdf_file:
    file_to_pdf(html_file, pdf_file, **extra_args)
# translates to --disable-gpu when starting chrome (currently on by default and required by Chrome)
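The translation rule described above (a key with a value becomes `--key=value`, a key mapped to None becomes a bare `--key`) can be sketched in a few lines. The helper name `to_chrome_flags` is made up for this illustration; it mirrors the behaviour documented here and is not django-hardcopy's internal code.

```python
def to_chrome_flags(**extra_args):
    """Turn keyword arguments into a list of Chrome command-line flags."""
    flags = []
    for name, value in extra_args.items():
        if value is None:
            # A None value means the flag takes no argument.
            flags.append(f"--{name}")
        else:
            flags.append(f"--{name}={value}")
    return flags


# Hyphenated names are not valid Python identifiers, so they are passed
# by unpacking a dict, exactly as in the examples above.
print(to_chrome_flags(**{"virtual-time-budget": 6000, "disable-gpu": None}))
# ['--virtual-time-budget=6000', '--disable-gpu']
```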

bytestring_to_pdf(html_data, output_file, **extra_args)

Arguments:

This function renders the contents of html_data with Chrome and stores the binary PDF data in output_file. Any kwargs are translated into command-line arguments to Chrome when starting the headless browser for rendering, e.g.:

from hardcopy import bytestring_to_pdf

extra_args = {
    'virtual-time-budget': 6000
}

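# Open the output for binary writing; a plain open() is read-only.
with open('myfile.pdf', 'wb') as pdf_file:
    bytestring_to_pdf(b"<html><h1>Hello Chrome!</h1></html>", pdf_file, **extra_args)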
# translates to --virtual-time-budget=6000 when starting chrome

extra_args = {
    'disable-gpu': None
}

with open('myfile.pdf', 'wb') as pdf_file:
    bytestring_to_pdf(b"<html><h1>Hello Chrome!</h1></html>", pdf_file, **extra_args)
# translates to --disable-gpu when starting chrome (currently on by default and required by Chrome)

Similar functions are available for PNG generation:

file_to_png(input_file, output_file, width, height, **extra_args)

Arguments:

bytestring_to_png(html_data, output_file, width, height, **extra_args)

Arguments:

FAQ

In local development, however, Chrome will receive a connection refused error when it tries to load static files that templates include with tags like <link href="{% static 'style.css' %}" rel="stylesheet">. The best workaround is to include static assets inline in PDF/PNG templates.

A nice feature for the roadmap of django-hardcopy would be dynamic parsing of templates to convert linked static assets to inline assets automatically. (PRs welcome :))
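Until something like that exists, the inlining can be done as a post-processing step on the rendered HTML. The sketch below is one possible approach, not part of django-hardcopy: the function `inline_stylesheets` and the `load_css` callback are hypothetical names, and the regular expressions only handle simple, well-formed <link> tags rather than arbitrary HTML.

```python
import re

# Matches a whole <link ...> tag; its attributes are inspected separately.
LINK_TAG = re.compile(r"<link\b[^>]*>", re.IGNORECASE)
HREF_ATTR = re.compile(r"""href\s*=\s*["']([^"']+)["']""", re.IGNORECASE)
STYLESHEET_REL = re.compile(r"""rel\s*=\s*["']stylesheet["']""", re.IGNORECASE)


def inline_stylesheets(html, load_css):
    """Replace <link rel="stylesheet"> tags with inline <style> blocks.

    `load_css` is a caller-supplied function mapping an href to CSS text,
    or to None when the stylesheet cannot be resolved.
    """
    def replace(match):
        tag = match.group(0)
        href = HREF_ATTR.search(tag)
        if not STYLESHEET_REL.search(tag) or href is None:
            return tag  # not a stylesheet link; leave it alone
        css = load_css(href.group(1))
        if css is None:
            return tag  # unresolved; keep the original tag
        return f"<style>{css}</style>"

    return LINK_TAG.sub(replace, html)


# Example with an in-memory mapping standing in for files on disk.
styles = {"/static/style.css": "h1 { color: red; }"}
page = '<head><link href="/static/style.css" rel="stylesheet"></head>'
print(inline_stylesheets(page, styles.get))
# <head><style>h1 { color: red; }</style></head>
```

The result could then be encoded to bytes and passed to bytestring_to_pdf. In a real project, `load_css` would need to map static URLs to files on disk, which depends on your static files configuration.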