iulica / docx-mailmerge

Mail merge for Office Open XML (docx) files without the need for Microsoft Office Word.
MIT License

Possibility to add dynamic Images. #11

Closed KapilPatel-KP closed 1 year ago

KapilPatel-KP commented 1 year ago

Expected Behaviour

Possibility to add dynamic images based on a URL or base64 data.

Current Behaviour

Currently it is not possible to add images (dynamic images based on a URL or base64 data).

Context

We are using mailmerge with a sample docx file to create reports with multiple variables. This works well for other data types, but we could not find a way to populate images from a server URL or base64 data.

PS: It works well for rendering local images (although that also requires manually refreshing the images every time, or pressing F9).

Image local path: [screenshot: Selection_009]

Images not populating: [screenshot: image (1)]

Your Environment

iulica commented 1 year ago

Answer to PS: For automatically updating fields on open you can check the documentation and set the auto_update_fields_on_open parameter to "auto". This way you don't need to update the fields manually.

As for the main issue, does MS Word know how to read images from base64 or URL ? If yes, then it's a BUG and I will look into it.

If no, then you're out of luck as the only way would be for the docx-mailmerge to download/save the image to a temporary file and use that as the value for the field. And this can also be done before calling the mailmerge method. It would be out of the scope of the library to include this feature.
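A minimal sketch of that pre-processing step, assuming the images are reachable by URL: download each one to a temporary file and use the resulting path as the merge field value. The helper name `fetch_image_to_tempfile` is hypothetical and not part of docx-mailmerge; only the Python standard library is used.

```python
import os
import shutil
import tempfile
import urllib.request
from urllib.parse import urlparse


def fetch_image_to_tempfile(url, default_suffix=".png"):
    """Download `url` to a temporary file and return the local path.

    Hypothetical helper, not part of docx-mailmerge. The caller is
    responsible for deleting the file once the document is written.
    """
    # Reuse the extension from the URL path if there is one.
    suffix = os.path.splitext(urlparse(url).path)[1] or default_suffix
    with urllib.request.urlopen(url) as response:
        with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as fh:
            # Stream the response body into the temporary file.
            shutil.copyfileobj(response, fh)
            return fh.name
```

The returned path can then be passed as the value of the image merge field before calling the merge method.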

Yenthe666 commented 1 year ago

@iulica hey! 👋 Thanks a lot for your very fast feedback, I really appreciate this! We will take a look at the auto_update_fields_on_open parameter, I think we might have missed this option.

The issue that we have with images is that we store them in a postgreSQL database or in a local file path from which we want to insert the image in the Word document with placeholders. A sample is where we create a quotation document in which we will loop over lines, add them and then want to show an image of the product within the document.

I took a bit of a dive into the internet before and figured out that the library docx in some way allows it by using the add_picture function. See https://stackoverflow.com/a/32932578/2262409 What we really would love to have is an option to use Merge Fields, for example with a placeholder product_image in which we then wrap the dictionary of values so it inserts the image into the document. However we don't seem to find any way to combine something such as add_picture with merge_templates and to get the images injected (into the right place). Neither could we find a native way in this/your library.

I've learned myself, as I maintain open source work, that generally you do this type of work for free and it is cumbersome. I personally think that is not fair / a great way to work as it adds value for us alone.. so, if you could look into if this can be supported and how I'm happy to compensate you in a way for an effort so it is a win-win-win (we have a solution, you get paid for your spent time and the community gets (documented) support for a new feature). You can always directly contact me if you'd rather not discuss this in the open on yenthe [AT] mainframemonkey -dot- com

I'm looking forward to your feedback :)

iulica commented 1 year ago

The issue that we have with images is that we store them in a postgreSQL database or in a local file path from which we want to insert the image in the Word document with placeholders.

Is it PostgreSQL or a local file? Local files should work fine, and URLs should also work fine. I just checked; I only need to test it thoroughly.

Later: I checked and the local and URL work just fine.

You can check the template and the generated docx.

# Excerpt from the test suite; it runs inside a unittest.TestCase method (hence `self`).
from os import path
from mailmerge import MailMerge

with MailMerge(path.join(path.dirname(__file__), 'test_includepicture.docx'),
               auto_update_fields_on_open="auto") as document:
    self.assertEqual(document.get_merge_fields(), {'rowno', 'url'})

    document.merge_templates(
        [
            dict(rowno="1", url="test_includepicture_local.png"),
            dict(rowno="2", url="https://www.pngall.com/wp-content/uploads/8/Sample-Watermark-PNG-Image.png"),
            dict(rowno="3", url=""),
        ],
        "nextPage_section",
    )
    document.write("tests/output/test_includepicture_1.docx")

So, for the base64 case, you will have to do some simple preparation before calling merge_templates:

import base64

# ... (database query omitted)
# You have the data from the PGSQL database; let's say the field `image` holds the base64 data.
mergedata_rows = []
for row in cursor.result():  # placeholder for however you iterate over your query results
    image_data = row['image']
    image_name = row['image_name']  # or any unique id of the image
    image_filename = "temp_image_{}.png".format(image_name)
    with open(image_filename, "wb") as fh:
        # b64decode accepts both str and bytes input
        fh.write(base64.b64decode(image_data))
    row['image_filename'] = image_filename
    mergedata_rows.append(row)

document.merge_templates(mergedata_rows, ...)

This should then work fine.

test_includepicture.docx

test_includepicture_1.docx test_includepicture_local

iulica commented 1 year ago

Did the previous comment help?

Yenthe666 commented 1 year ago

@iulica thanks for your feedback and the insight! We did some testing/development today and no matter what we try the images do not show up automatically. If we add images from a URL/path and generate a Word document, it opens with a warning: [screenshot]

If we then click "Yes" we still do not see the image: [screenshot]

When we then click on "Enable editing" at the top we get (another) pop-up: [screenshot]

When we then click on "Yes" we finally get the images through: [screenshot]

Our code, which sets auto_update_fields_on_open="auto":

    with MailMerge(word_template, auto_update_fields_on_open="auto") as document:
        template_fields = document.get_merge_fields()

Are we missing something or doing something wrong? Furthermore, on LibreOffice it never seems to work 🤔

iulica commented 1 year ago

Unfortunately that is Microsoft's way of handling security in Word Documents and I don't think there is any way around it. You have to manually confirm you want to get your images in your document. But it works and once you save the document, it will be fine and won't ask you anymore afterwards.

So, you can consider the process as a 2-step one:

  1. Use Python docx-mailmerge to add data and image URLs to the Word template and create a "filled" template.
  2. Open the filled template with Word to complete the process and pull the images from the URLs into the final Word document. Save the Word document (either as docx or as PDF).

LibreOffice has limited support for this, so you shouldn't even try. Mail merge fields are a Microsoft-only feature.

Yenthe666 commented 1 year ago

@iulica okay we'll take the LibreOffice as a known limit for now then. As for the warnings/how Word works.. so it is completely expected behaviour to get 2 pop-ups on which we have to accept? Is there also a way to trigger this through code instead of 'bothering' the end user with it? We basically have two flows:

  1. We pre-fill the variables of a Word document with data from our ERP and then directly convert the Word document to PDF in memory. Because we do this, the macros do not run and the images do not appear in the PDF.
  2. We pre-fill the variables of a Word document with data from our ERP and then export the Word document so that it is downloaded right away by the end user. However, we now have to 'bother' the user with some pop-ups, and the images are not shown automatically.

iulica commented 1 year ago

As for the warnings/how Word works.. so it is completely expected behaviour to get 2 pop-ups on which we have to accept?

I have tried in the past to find a way to make Word update the fields through code. I didn't find any way except this flag, which tells Word to update the fields on open. But that triggers the pop-ups. I haven't used images, but I have used other fields like { IF ... }. Those must also be handled by Word and not by docx-mailmerge, as they are far too complicated. Embedding images is also very complicated and not worth "reimplementing". That's why { INCLUDEPICTURE ... }, with all its inconveniences, is the only way to go.

  1. We pre-fill the variables of a Word document with data from our ERP and then directly convert the Word document to PDF in memory. Because we do this, the macros do not run and the images do not appear in the PDF.

I get it now, you are doing it automatically on a website I presume. The second flow is not very appealing as the users will not want to go through the pop-ups.

As for the first, how do you transform the docx into PDF?

I wrote another package, named doc-workflow, that has a plugin for transforming a docx into PDF using the package docx2pdf. It works on Mac/Windows and it needs Word installed. On Mac it asks for permission when it opens Word, but on Windows I noticed it doesn't ask anything. It just takes a few seconds and then writes the PDF. However, I have not tested it with images, and I have no Windows machine to test it on.

But, if you want an automated solution that works headless and without Microsoft Windows, I would steer away from the docx and docx-mailmerge as for these advanced uses it looks more and more like a hack. I would move to another template system for generating documents, like for example html and html2pdf. That would be future proof and not require proprietary tools.

iulica commented 1 year ago

Another solution, beautiful and very powerful, would be to use LaTeX templates and then pdflatex to generate the PDF from LaTeX. Since LaTeX files are just text files, you can use various template systems to update the data and URLs. If the image is a URL, one might need to download it locally. But since you're using Python, that is a trivial task. Some resources:

http://eosrei.net/articles/2015/11/latex-templates-python-and-jinja2-generate-pdfs (HTTP website, so your browser will show a warning)

WeasyPrint HTML -> PDF

The next one might be a good one. For a solution to the image problem, also check the comments. https://pbpython.com/pdf-reports.html

I hope this helps. It may require more upfront investment, as it is a new solution, but in the long term it is worth it. It is better than trying to figure out how to insert images into the proprietary (even if they call it open document format) Microsoft documents.

Yenthe666 commented 1 year ago

I get it now, you are doing it automatically on a website I presume. The second flow is not very appealing as the users will not want to go through the pop-ups.

We actually use it to print reports like picking lists, order PDFs, invoices, etc. throughout the whole ERP :)

As for the first, how do you transform the docx into pdf ?

We use LibreOffice headless with --convert-to pdf, as we run our server on Linux.
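For reference, a headless LibreOffice conversion like the one described can be sketched from Python as below. This is an illustration under assumptions, not the code used in the ERP: the function names are hypothetical, and the binary name `soffice` is assumed (some installations expose it as `libreoffice`). As discussed earlier in the thread, INCLUDEPICTURE fields are not expected to be refreshed on this route.

```python
import subprocess
from pathlib import Path


def build_convert_command(docx_path, out_dir, binary="soffice"):
    """Return the LibreOffice command line that converts a docx to PDF."""
    return [
        binary,
        "--headless",
        "--convert-to", "pdf",
        "--outdir", str(out_dir),
        str(docx_path),
    ]


def convert_to_pdf(docx_path, out_dir, binary="soffice", timeout=120):
    """Run the conversion and return the expected path of the PDF."""
    subprocess.run(
        build_convert_command(docx_path, out_dir, binary),
        check=True,       # raise CalledProcessError on a non-zero exit code
        timeout=timeout,  # do not hang forever if LibreOffice stalls
    )
    # LibreOffice keeps the base name and swaps the extension.
    return Path(out_dir) / (Path(docx_path).stem + ".pdf")
```

Keeping the command construction separate from the `subprocess.run` call makes it possible to check the arguments without LibreOffice installed.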

The reason we can't really use things like LaTeX or WeasyPrint is that we don't actually want to start from HTML. We want to give functional people an easy way to 'draw'/build reports in something like Word instead of needing a developer or somebody with a lot of experience. This is how I came to this package and idea, but I currently have a few limitations. Perhaps I should consider broadening the scope and also adding a full HTML editor so people can 'design' the document with an editor itself and then convert HTML to PDF.. 🤔

A very big thank you for all the feedback, the insights and the shared knowledge :)

iulica commented 1 year ago

I looked a bit at python-docx and the add_picture implementation, and I will be thinking about a way to replace the { INCLUDEPICTURE } fields with this. I will give it some thought during this Easter vacation and let you know. Still, I think that an HTML editor with {{field_name}} style templating would be better in the long run.
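To illustrate what {{field_name}} style templating means, here is a minimal standard-library sketch. The `render` function and the sample field names are hypothetical; a real template engine such as Jinja2 would also handle escaping, loops and conditionals, which this sketch does not.

```python
import re

# Matches {{field_name}} with optional whitespace inside the braces.
_FIELD = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template, values):
    """Replace each {{field_name}} with its value; leave unknown fields as-is."""
    def substitute(match):
        name = match.group(1)
        return str(values[name]) if name in values else match.group(0)
    return _FIELD.sub(substitute, template)


html = '<p>{{ product_name }}</p><img src="{{product_image}}">'
print(render(html, {"product_name": "Widget", "product_image": "widget.png"}))
# prints: <p>Widget</p><img src="widget.png">
```

Because the image is just an `src` attribute in the HTML, inserting a picture becomes a plain text substitution, with no field update step.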

iulica commented 1 year ago

python-docx seems unmaintained, there is a fork that looks ok, python-docx-ng. I'll be looking there for the implementation.

Yenthe666 commented 1 year ago

Thanks a lot @iulica! 🙏 If you need some funding for adding a solution in this package then do let me know, maybe we can help that way. Enjoy the holidays.

iulica commented 1 year ago

adding a solution in this package

That seems more and more unlikely. It doesn't fit this package at all, especially as it is implemented like now. For the future, I might think of refactoring this package to make use of the python-docx-ng for the underlying docx manipulation, and that way this could become more like a plugin for the python-docx-ng for mailmerge support.

What I would however consider, is add support for this as a plugin in my other package doc-workflow as another step in a multi-step workflow following mailmerge. As one first needs to fill in the locations and then add the pictures instead of the INCLUDEPICTURE fields.

Also, I only now noticed your answer about docx-to-PDF conversion using LibreOffice, and I will add that option as well. For the moment, makepdf from docx uses MS Word and works on Windows and Mac. It has the advantage of compatibility and of being able to update fields like IF and INCLUDEPICTURE natively (with pop-ups and manual confirmation), but the disadvantage, of course, of requiring a Microsoft Word installation.

About funding: that's less of an issue. As this repository is a fork of the original one, which went unmaintained, I feel the responsibility to continue the work and keep the package clean and close to the original idea. So a feature will only be added if it fits the goal of this project.

In MS Word one would still need to manually update all fields after doing a mailmerge filling in the INCLUDEPICTURE fields as a second step. So this behaviour is expected. But it will fit well in the other project which makes document workflows automatic.

iulica commented 1 year ago

I have added it to my todo list for the doc-workflow project, I'll leave it open for the moment and let you know when/if it is done.

iulica commented 5 months ago

Hi everyone. I had some free time and created a new package to update docx INCLUDEPICTURE fields. @KapilPatel-KP It has support for base64 images as well as local and URL images. It also allows the user to resize the images by x or y or both. @Yenthe666 I hope it helps.

I updated the documentation and added a section for it. The solution is straightforward: after saving the mailmerge docx, you can use this code:

    # Install first with: pip install docx-mergefields
    from mergefields import MergeFieldsDocument

    with MergeFieldsDocument('mailmerge.docx') as mailmerge_doc:
        mailmerge_doc.transform_fields()
        mailmerge_doc.doc.save('mailmerge_with_images.docx')

Yenthe666 commented 5 months ago

Thanks! 🙏 We have postponed this task after we didn't find any good solutions but we'll sure give this a try in the next weeks or so :)

iulica commented 5 months ago

Set auto_update_fields_on_open="no" if you don't have other fields that need updating. Otherwise you'll get the warning when you open the document.