incuna / django-wkhtmltopdf

Django Wrapper to the PDF Renderer: wkhtmltopdf
BSD 2-Clause "Simplified" License

Calling a wkhtmltopdf microservice instead of a subprocess? #178

Open mrenoch opened 3 years ago

mrenoch commented 3 years ago

Hi,

Thanks for all your hard work on this django plugin! We use it heavily and it always delivers.

I am wondering whether you have ever considered calling a microservice instead of shelling out to a subprocess on the web server. I recently came across https://imti.co/webpage-to-pdf-microservice/, which is an open source packaged version of wkhtmltopdf. We want to remove all of the PDF rendering dependencies from the web server, and this approach looks viable.

Has anyone ever attempted to modify django-wkhtmltopdf to use a microservice instead of a local binary? Would you take a PR if we attempted that?

PS - we're also interested in exploring caching strategies.

best

mrenoch commented 3 years ago

Hi @maxpeterson - Do you know if this project is still active? Are you accepting PRs or planning new releases? Thanks!

maxpeterson commented 3 years ago

Hi @mrenoch I am not actively maintaining this. I am no longer using it for any projects so I am afraid it doesn’t get much attention.

As to your original questions, as far as I know no one has attempted to use a microservice rather than calling the binary.

It sounds like an interesting idea, but without proper investigation I am not sure whether this library is a good starting point or not.

If you can leverage the existing "wrapping code" and provide an option to call a microservice in place of the binary, without breaking compatibility with the binary, then there could be value in using this library.

mrenoch commented 3 years ago

Thanks @maxpeterson!

Do you know if this project is being actively maintained? There hasn't been much activity and I wonder about its future. It seems like there are many forks, and maybe it "just works", but I am curious about where it is heading.

Apart from the microservice idea, it seems like this library may also be a good place to introduce a caching layer. I am trying to judge whether that work is also interesting to anyone here.

maxpeterson commented 3 years ago

This project is not actively maintained. Most of the development was done almost a decade ago by the team at Incuna; since then it has received steady support and maintenance from the wider community.

There is no longer an Incuna team to maintain it and my time is limited.

I am not sure how actively it is used, but https://pepy.tech/project/django-wkhtmltopdf suggests it is still fairly widely used, so it would be a shame not to find a way to keep it going.

If you are willing to help maintain it then I would be grateful for the help. Likewise, if you want to attempt to add your microservice and caching ideas then I would be happy to help get them merged and released.

I may be a bit delayed in responding, but always feel free to @ me.

maxpeterson commented 3 years ago

I should also mention that @johnraz has helped a lot with maintenance

pinoatrome commented 3 years ago

Hi all, I lead a team that uses this library on a number of projects with great satisfaction.

I've just finished getting wkhtmltopdf to run in a separate container (remote from Django), invoked via gRPC. The initial driver for this implementation was to reduce the size of the Django image by moving all the binaries related to PDF creation to a different container (exposed as a microservice).

This is the (simple) idea: the PDFTemplateView uses a custom version of PDFTemplateResponse that invokes the endpoint if one is defined in the Django settings (remote invocation), and otherwise takes the usual path (local invocation).

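import logging

from django.conf import settings
from wkhtmltopdf.views import PDFTemplateResponse, PDFTemplateView

logger = logging.getLogger(__name__)


class SPDFTemplateResponse(PDFTemplateResponse):

    @property
    def rendered_content(self):
        # Use the remote service only when an endpoint is configured.
        endpoint = getattr(settings, 'WKHTMLTOPDF_ENDPOINT', None)
        if not endpoint:
            logger.debug('grpc service endpoint missing in settings: will create PDF with local wkhtmltopdf binary')
            return super().rendered_content
        logger.debug(f'rendering content via grpc service @ {endpoint}')
        ....


# Defined after the response class so that the name it references exists.
class SPDFTemplateView(PDFTemplateView):
    response_class = SPDFTemplateResponse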

Then, after the templates are rendered (in RenderedFile), the file contents are passed to a client that invokes the remote service and obtains the bytes of the PDF file.

        ...
        input_file = RenderedFile(
            template=input_template,
            context=context,
            request=request
        )
        ...
        output = self.cmd_options.pop('output', None)
        try:
            content = client.transform(endpoint, input_file.filename, cmd_options=cmd_options,
                                       header=header_filename, footer=footer_filename, cover=cover_filename)
            if output:
                with open(output, 'wb') as pdf_fp:
                    pdf_fp.write(content)
            return bytes(content)
        except exceptions.SDPFRenderClientException as e:
            logger.error(f'error from PDF endpoint: {e}')
            raise

There is a problem with RenderedFile: the rendered template contains local references to static and media files. For instance, href="/static/css/project.css" is transformed to href="file:///home/user/project/root_static_dir/css/project.css".

This is done when render_to_temporary_file() calls make_absolute_paths().

It is unlikely (or at least too limiting to require) that the Django container and the microservice container share the same absolute paths. I guess it is a matter of passing a flag to RenderedFile that controls whether render_to_temporary_file calls make_absolute_paths (local invocation) or not (remote invocation):

    content = smart_text(content)
    # This should happen on the remote server creating the PDF,
    # not on the client side when rendering the templates.
    content = make_absolute_paths(content)
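
As a rough sketch of the flag idea (not the library's code): the `make_absolute_paths` below is a simplified stand-in that only rewrites one static prefix, and `STATIC_URL`, `STATIC_ROOT`, `prepare_content`, and the `local` flag are placeholder names chosen for illustration.

```python
import re

# Placeholder values standing in for Django's STATIC_URL / STATIC_ROOT settings.
STATIC_URL = "/static/"
STATIC_ROOT = "/home/user/project/root_static_dir/"


def make_absolute_paths(content: str) -> str:
    """Simplified stand-in: rewrite quoted static URLs to file:// URLs."""
    pattern = re.compile(r'(["\'])' + re.escape(STATIC_URL))
    # A function replacement avoids backslash-escape surprises in the path.
    return pattern.sub(lambda m: m.group(1) + "file://" + STATIC_ROOT, content)


def prepare_content(content: str, local: bool = True) -> str:
    """Rewrite paths only when wkhtmltopdf shares this filesystem.

    With local=False the HTML is left untouched, so the remote service
    can resolve static and media references against its own paths.
    """
    return make_absolute_paths(content) if local else content
```

With `local=True` the example href from above becomes the `file://` form; with `local=False` the markup is passed through unchanged for the remote side to handle.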

Caching the common parts (header, footer, cover) could be straightforward: once they are cached, the service could download their content when cache keys are passed instead of the actual binary content.
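
One way this could look, as a sketch only: the parts are keyed by a hash of their content, the Django side publishes them to a shared cache and sends just the keys, and the service resolves the keys back to content. `PartCache` is an in-memory stand-in for whatever shared cache both containers can reach, and every name here is hypothetical.

```python
import hashlib
from typing import Dict, Optional


def cache_key(content: bytes) -> str:
    """Derive a stable key from the content, so identical parts share a key."""
    return "wkpdf:" + hashlib.sha256(content).hexdigest()


class PartCache:
    """In-memory stand-in for a cache shared by Django and the PDF service."""

    def __init__(self) -> None:
        self._store: Dict[str, bytes] = {}

    def put(self, content: bytes) -> str:
        key = cache_key(content)
        self._store.setdefault(key, content)
        return key

    def get(self, key: str) -> Optional[bytes]:
        return self._store.get(key)


def publish_parts(parts: Dict[str, bytes], cache: PartCache) -> Dict[str, str]:
    """Client side: store each part and return only its cache key."""
    return {name: cache.put(content) for name, content in parts.items()}


def resolve_parts(keys: Dict[str, str], cache: PartCache) -> Dict[str, bytes]:
    """Service side: fetch the content behind each key."""
    resolved = {}
    for name, key in keys.items():
        content = cache.get(key)
        if content is None:
            raise KeyError(f"part {name!r} is not in the cache: {key}")
        resolved[name] = content
    return resolved
```

Because the key depends only on the content, a header or footer that is identical across requests is stored once and referenced by the same key each time.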

Sorry for the long post.

Feel free to contact me for any follow up. Ciao

mrenoch commented 3 years ago

Hi @pinoatrome!

Thanks for reaching out. I am leading a team that also relies extensively on wk for our main product line. I would love to talk more sometime and learn about your roadmap. Perhaps we can team up and revive this project, under a proper project account.

cheers! /Jonah

pinoatrome commented 3 years ago

Hi Jonah, thanks for your interest. Your proposal sounds great: feel free to contact me, my personal email should be visible on GitHub.

About the microservice implementation: it goes into beta testing tonight, now that the absolute path issue is solved. For a remote call, the content of the rendered template (plus header, footer and cover) is not saved to any temporary file but streamed to the service, so no change to RenderedFile is needed (it is simply not used).
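
A minimal sketch of the streaming side, under assumptions: the rendered parts are already held in memory as strings, they are sent as named chunks, and the receiver reassembles them. The actual gRPC message types are not shown in this thread, so `stream_parts`, `reassemble`, and the `(name, chunk)` tuple shape are placeholders for illustration.

```python
from typing import Dict, Iterable, Iterator, Tuple

CHUNK_SIZE = 64 * 1024  # arbitrary chunk size chosen for this sketch


def stream_parts(
    parts: Dict[str, str], chunk_size: int = CHUNK_SIZE
) -> Iterator[Tuple[str, bytes]]:
    """Yield (part name, chunk) pairs for rendered HTML held in memory.

    Nothing is written to disk; an empty part yields no chunks.
    """
    for name, html in parts.items():
        data = html.encode("utf-8")
        for start in range(0, len(data), chunk_size):
            yield name, data[start:start + chunk_size]


def reassemble(chunks: Iterable[Tuple[str, bytes]]) -> Dict[str, bytes]:
    """What a receiving service might do: rebuild each part from its chunks."""
    buffers: Dict[str, bytearray] = {}
    for name, chunk in chunks:
        buffers.setdefault(name, bytearray()).extend(chunk)
    return {name: bytes(buf) for name, buf in buffers.items()}
```

A generator like this is the kind of iterator a client-streaming RPC can consume, with each tuple mapped onto whatever request message the service defines.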

Ciao Giuseppe