open-xml-templating / docxtemplater

Generate docx, pptx, and xlsx from templates (Word, Powerpoint and Excel documents), from Node.js or the browser. Demo: https://www.docxtemplater.com/demo. #docx #office #generator #templating #report #json #generate #generation #template #create #pptx #docx #xlsx #react #vuejs #angularjs #browser #typescript #image #html #table #chart
https://www.docxtemplater.com
Other

Generate multiple invoices in one docx, one invoice per page #182

Open aetna-softwares opened 8 years ago

aetna-softwares commented 8 years ago

The current features of this library are great for generating a "single item" output file, for example an invoice.

I am looking for an enhancement that would let us use the same template but pass it an array of data, and get back a single document containing all the items (like a Word mail merge driven by an XLS file).

The idea is to create each individual file and concatenate them into a final document, with a page break between items.

I tried to get this result by using the loop syntax with a page break inside:

 {#invoices} 
... contents ...
[page break here]
{/invoices}

But it has two drawbacks:
1/ from the user's point of view, the {#invoices} tag is confusing, since they are writing a template for a single invoice;
2/ since I force a page break at the end of the content, there is a blank page at the end of the generated document.

A workaround for these two points would be to automatically add the {#invoices} tags to the user's template before starting the merge, and to remove the last page break from the generated file, but I would prefer a clean feature to this kind of quick hack ;)

Do you think this could be a good feature for your library?

Zulus88 commented 8 years ago

Same problem for me: multiplying invoices ):. As far as I understand, in your syntax '...contents...' holds all the remaining tags with replacement data, am I correct? Do you put the {#invoices} tag at the top of page 1 and {/invoices} at the top of page 2 (after the page break)? Or do you give Word some command when generating the output docx?

aetna-softwares commented 8 years ago

Hi,

Yes, '...contents...' is my invoice template, with the tags for my invoice data.

The {/invoices} tag is indeed on page 2, after the page break.

In this early test, I use the lib purely "as is", and it works quite well if you write the templates yourself and don't mind the blank page at the end.

aetna-softwares commented 8 years ago

For your information, here is a quick hack to automatically add the "array" tags along with the page break:

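    var docx = new Docxtemplater(contentsDocx);

    if (Array.isArray(data)) {
        // Only the main part word/document.xml contains <w:body>
        var f = "word/document.xml";
        var asTextOrig = docx.zip.files[f].asText;
        docx.zip.files[f].asText = function () {
            var text = asTextOrig.apply(docx.zip.files[f]);

            // Open the loop in its own paragraph (<w:t> must sit inside <w:r> inside <w:p>)
            text = text.replace("<w:body>", "<w:body><w:p><w:r><w:t>{#pages}</w:t></w:r></w:p>");

            // Add a page break and close the loop before the body-level <w:sectPr>,
            // which has to stay the last child of <w:body>
            var end = text.lastIndexOf("<w:sectPr");
            if (end === -1) {
                end = text.lastIndexOf("</w:body>");
            }
            text = text.slice(0, end) +
                '<w:p><w:r><w:br w:type="page"/></w:r></w:p>' +
                '<w:p><w:r><w:t>{/pages}</w:t></w:r></w:p>' +
                text.slice(end);

            return text;
        };

        // The template now loops over "pages", so wrap the array accordingly
        data = {pages: data};
    }

    docx.setData(data);
    docx.render();

    var buf = docx.getZip().generate({type: "nodebuffer"});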

Please note that I am not an OpenXML guru, so the page break syntax could probably be improved.

The last part of the trick (removing the last page break from the resulting file) is not done here.

Disclaimer: this is only a quick hack and it uses undocumented parts of docxtemplater, so it may break at any moment on a version update!
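The missing step could look like the following minimal sketch, which works on the rendered word/document.xml string. It assumes the breaks were emitted as <w:br w:type="page"/> (attribute spacing may vary) and only removes the break element itself, leaving its now-empty run and paragraph in place; removeLastPageBreak is a hypothetical helper, not part of docxtemplater.

```javascript
// Hypothetical helper: remove the last explicit page break from a rendered
// word/document.xml string. Only <w:br w:type="page"/> elements are matched;
// the surrounding (now empty) run and paragraph are left untouched.
function removeLastPageBreak(xml) {
  const pageBreak = /<w:br\s+w:type="page"\s*\/>/g;
  let last = null;
  let match;
  // Walk through every match and remember the final one
  while ((match = pageBreak.exec(xml)) !== null) {
    last = match;
  }
  if (last === null) {
    return xml; // no page break to remove
  }
  return xml.slice(0, last.index) + xml.slice(last.index + last[0].length);
}
```

With the JSZip 2 API used above, it could be applied after docx.render() and before generate(), e.g. with zip = docx.getZip(): zip.file("word/document.xml", removeLastPageBreak(zip.file("word/document.xml").asText())).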

edi9999 commented 8 years ago

Hi,

I am quite nitpicky about integrating new features into docxtemplater. For example, in the past (v0.x), the image module was integrated into this repository, but it made the code base more complex and less lightweight (some people use docxtemplater in the browser). Also, it was not easy to add functionality without integrating your code into the repository (or maintaining a fork).

That's why I created the concept of modules. Modules make it possible to hook into events triggered by docxtemplater.

Here's the code of the image module:

https://github.com/open-xml-templating/docxtemplater-image-module

It would be possible to integrate the feature you want into a new module, for example with the following syntax:

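// Proposed API only: docxtemplater-multi-doc-module does not exist yet
var fs = require('fs')
var DocxGen = require('docxtemplater')
var MultiDocModule = require('docxtemplater-multi-doc-module')

var opts = {}
opts.loopOver = "invoices";
var multiDocModule = new MultiDocModule(opts);

// content: the template file read as binary, e.g. fs.readFileSync(templatePath, "binary")
var docx = new DocxGen()
    .attachModule(multiDocModule)
    .load(content)
    .setData({invoices: [{"customer": "John Doe", price: "10 $"}, {"customer": "Jane Doe", price: "20 $"}, {"customer": "John Doe", price: "10 $"}]})
    .render()

var buffer = docx
        .getZip()
        .generate({type: "nodebuffer"})

// Synchronous write: fs.writeFile would require a callback
fs.writeFileSync("test.docx", buffer);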

You can either write the module yourself (I can help with specific questions if you make it open source), or we could agree on a contract (in that case, please contact me by email).

aetna-softwares commented 8 years ago

This is a good concept!

I will certainly try to write this module when I have some spare time.

andrest commented 8 years ago

@aetna-softwares's solution provides what I need; however, it looks like the footer and header are not kept in the same loop scope. Any ideas on how to pass the scope along?

edi9999 commented 8 years ago

I think this should be done in Word, in your template: as far as I can tell, you should select something like "apply header to whole document". There is also the possibility of using one header for odd pages and another for even pages.

You could also do it programmatically by copying the content of header.xml and creating the right relationship (rel).
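As a rough sketch of the "right rel" part, assuming the header has already been copied to a new part such as word/header2.xml (a hypothetical name), its relationship could be appended to word/_rels/document.xml.rels as below. addHeaderRelationship is a hypothetical helper; a complete solution would also need an Override entry with content type application/vnd.openxmlformats-officedocument.wordprocessingml.header+xml in [Content_Types].xml and a <w:headerReference> pointing at the new id in the relevant <w:sectPr>, which are not shown.

```javascript
// Relationship type used by Word for header parts
const HEADER_REL_TYPE =
  "http://schemas.openxmlformats.org/officeDocument/2006/relationships/header";

// Hypothetical helper: register a copied header part in document.xml.rels
// and return the updated XML together with the new relationship id.
function addHeaderRelationship(relsXml, target) {
  // Collect the numeric part of every existing rIdN to pick an unused id
  const ids = [...relsXml.matchAll(/Id="rId(\d+)"/g)].map((m) => Number(m[1]));
  const nextId = "rId" + (ids.length ? Math.max(...ids) + 1 : 1);
  const rel = `<Relationship Id="${nextId}" Type="${HEADER_REL_TYPE}" Target="${target}"/>`;
  // Append the new relationship just before the closing root tag
  const xml = relsXml.replace("</Relationships>", rel + "</Relationships>");
  return { xml, id: nextId };
}
```

The returned id is what the section's header reference would use as its r:id.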

motleydev commented 8 years ago

@aetna-softwares, did you ever get this turned into a module?

aetna-softwares commented 8 years ago

Hi,

In the end I didn't need it anymore, so I didn't take the time to do it.

We ended up generating individual documents, and when we need to aggregate documents we do it on the PDFs rather than at docx generation time.

motleydev commented 8 years ago

Cool. Not a bad approach.

andrest commented 8 years ago

We're also about to move to what @aetna-softwares described. We've found that with larger documents, memory leaks push us past our environment's limits: 100-200 pages need more than 1.5 GB of RAM.

edi9999 commented 7 years ago

I think that with the newest version, 3.0.2, there shouldn't be any more memory leaks.

awerlang commented 7 years ago

@aetna-softwares Are you converting from docx to pdf? Are you using LibreOffice or another tool?

aetna-softwares commented 7 years ago

@awerlang Yes, LibreOffice gives me the best results with the best performance (performance is not great, but the other tools are worse).
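For reference, a headless LibreOffice conversion can be driven from Node.js roughly as below. This is a sketch, not necessarily the setup used here: it assumes the soffice executable is on the PATH and relies on LibreOffice's --headless, --convert-to and --outdir command-line options; libreOfficePdfArgs and convertToPdf are hypothetical names.

```javascript
const { execFile } = require("child_process");
const path = require("path");

// Build the soffice arguments for a headless DOCX -> PDF conversion.
// LibreOffice writes the PDF into outDir, keeping the input's base name.
function libreOfficePdfArgs(inputPath, outDir) {
  return ["--headless", "--convert-to", "pdf", "--outdir", outDir, inputPath];
}

// Run the conversion; assumes the "soffice" executable is on the PATH.
function convertToPdf(inputPath, outDir, callback) {
  execFile("soffice", libreOfficePdfArgs(inputPath, outDir), (err) => {
    if (err) {
      callback(err);
      return;
    }
    const pdfName = path.basename(inputPath, path.extname(inputPath)) + ".pdf";
    callback(null, path.join(outDir, pdfName));
  });
}
```

For example, convertToPdf("invoice-1.docx", "out", (err, pdfPath) => { ... }) would report out/invoice-1.pdf on success.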

awerlang commented 7 years ago

@aetna-softwares Cool! Thanks for sharing! I was working on a Docker image forked from https://hub.docker.com/r/xcgd/py3o.fusion/ and I hope to add a Node server as well.

edi9999 commented 6 years ago

There is also the idea of a join, which would make it possible to add something between the iterations of a loop instead of after each item.

{:join (users,pagebreak)} {name} {/join}

This would give the desired output (each user's information on its own page, without a blank page at the end).
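To make the difference with a plain loop concrete, here is a small string-level model of the proposed join semantics. It is only an illustration with hypothetical function names, not how docxtemplater would implement {:join}: a loop emits the separator after every item, while a join emits it only between items, so n items produce n - 1 page breaks.

```javascript
// A page break paragraph, used as the separator in this illustration
const PAGE_BREAK = '<w:p><w:r><w:br w:type="page"/></w:r></w:p>';

// Plain loop: the separator follows every item, including the last one
function renderLoop(items, renderItem, separator) {
  return items.map((item) => renderItem(item) + separator).join("");
}

// Join: the separator appears only between items, so there is no trailing break
function renderJoin(items, renderItem, separator) {
  return items.map(renderItem).join(separator);
}
```

Calling renderJoin(users, renderUser, PAGE_BREAK) would therefore never leave a break after the last user.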

jdcrecur commented 6 years ago

Does this actually work already, or is it a proposal for a solution?

Zulus88 commented 6 years ago

Alive & kicking

On Feb 3, 2018 3:41 PM, "John" notifications@github.com wrote:

Does this actually work already?


jdcrecur commented 6 years ago

Oh wow nice one!

Is there a page for the documentation on this join logic?

edi9999 commented 6 years ago

The join syntax is a proposal; it is not implemented yet. I posted it here for discussion.

genachka commented 6 years ago

@edi9999 I'm working with a table loop that contains details about the properties of 5 files I'm documenting, and I need to put the filename in the footer: as the filename changes, the footer of that page needs to show the matching one. If this :join concept would do it, I'm +1 for it! If not, any suggestion on how?

edi9999 commented 6 years ago

Hello @genachka, can you open a new issue for this, including screenshots or documents showing your data and the output you want? It doesn't seem to be solvable with the join module.

genachka commented 6 years ago

@edi9999 opened as https://github.com/open-xml-templating/docxtemplater/issues/378

jdcrecur commented 6 years ago

Thought I would give this a go this evening, but I'm not making much headway. I can easily inject a page break symbol, but this still leaves me with a trailing page. I added an index to each page break; now I only need to remove the last one, but I'm not sure how to access the rendered content to do so.

    //Load the docx file as a binary
    let content = fs.readFileSync(path.resolve(__dirname, sourceFile), 'binary')
    let zip = new JSZip(content)
    let doc = new Docxtemplater()

    // Load the zip container and inject the merge variables
    doc.loadZip(zip)
    doc.setOptions({paragraphLoop: true})

    let data = Object.assign({}, jobData, {subject_loop: reports})
    doc.setData(data)

    // Hack to inject page breaks
    Object.keys(doc.zip.files).forEach((f, index) => {
      let asTextOrig = doc.zip.files[f].asText
      doc.zip.files[f].asText = () => {
        let text = asTextOrig.apply(doc.zip.files[f])
        text = text.replace('<w:t index="'+index+'">{pagebreak}</w:t>', `<w:br w:type="page"/>`)
        return text
      }
    })

    // render the document
    doc.render()

    // Now remove the page with the highest index?
    // Object.keys(doc.zip.files).forEach((f) => {
    //   let asTextOrig = doc.zip.files[f].asText
    //   console.log(asTextOrig())
    // })

    let buf = doc.getZip().generate({type: 'nodebuffer'})
    fs.writeFileSync(path.resolve(__dirname, targetFile), buf)

    return targetFile

edi9999 commented 6 years ago

Hello, the pages of a docx are not stored as separate documents: everything is inside the file /word/document.xml. Paging is not explicit in the document; it is calculated by the rendering engine.

jdcrecur commented 6 years ago

Thanks, I gave this another go this evening; it's not too difficult when you know where to look. Here is the code for anyone else in the same boat:

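    const fs = require('fs')
    const path = require('path')
    const JSZip = require('jszip') // JSZip 2.x, whose files expose asText()
    const Docxtemplater = require('docxtemplater')
    const cheerio = require('cheerio')

    // sourceFile, targetFile, jobData and reports come from the surrounding code

    //Load the docx file as a binary
    let content = fs.readFileSync(path.resolve(__dirname, sourceFile), 'binary')
    let zip = new JSZip(content)
    let doc = new Docxtemplater()

    // Load the zip container and enable paragraph loops
    doc.loadZip(zip)
    doc.setOptions({paragraphLoop: true})

    let data = Object.assign({}, jobData, {subject_loop: reports})

    // Set the data to use in the replacement
    doc.setData(data)

    // Hack to inject page breaks by replacing a custom {loop_pagebreak} placeholder;
    // the break carries a marker attribute so it can be found after rendering
    Object.keys(doc.zip.files).forEach((f) => {
      let asTextOrig = doc.zip.files[f].asText
      doc.zip.files[f].asText = () => {
        let text = asTextOrig.apply(doc.zip.files[f])
        text = text.replace('<w:t>{loop_pagebreak}</w:t>', '<w:br loop-pagebreak="true" w:type="page"/>')
        return text
      }
    })
    doc.render()

    // Remove the last page break via cheerio, reading and writing the part
    // through the public zip.file() API instead of the private _data field
    const docXml = doc.getZip().file('word/document.xml').asText()
    const $1 = cheerio.load(docXml, {
      xml: {
        withDomLvl1: true,
        normalizeWhitespace: false,
        xmlMode: true,
        decodeEntities: true
      }
    });
    $1("*[loop-pagebreak]").last().remove()
    // The marker attribute is not part of WordprocessingML, so strip it from the remaining breaks
    $1("*[loop-pagebreak]").removeAttr("loop-pagebreak")
    doc.getZip().file('word/document.xml', $1.root().html())

    let buf = doc.getZip().generate({type: 'nodebuffer'})
    fs.writeFileSync(path.resolve(__dirname, targetFile), buf)

dracuten1 commented 4 years ago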

Regarding Object.assign({}, jobData, {subject_loop: reports}): can you show me the structure of jobData and reports?

Coronelpanter commented 3 years ago

Thank you @edi9999, this is something that could help me, but not specifically: as I said, I need to repeat the same template.

collaorodrigo7 commented 3 years ago

I tried the proposed solution from @jdcrecur but it did not work for me. It seems that the asText method is never executed, so nothing changes in my case (maybe because it's been more than 3 years since then 😅). Anyway, I found a solution and am posting it in case it helps anyone. In your docx you can do something like this:

{#dataLoop}
{name}
{@raw_loop_pagebreak}
{/dataLoop}

Then, in your doc.setData call:

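doc.setData({
  // A {@...} raw XML tag replaces its whole paragraph, so the value must be a complete <w:p>
  raw_loop_pagebreak: '<w:p><w:r><w:br w:type="page"/></w:r></w:p>',
  dataLoop: [
    {
      name: "hello",
    },
    {
      name: "hello2",
      raw_loop_pagebreak: "", // overwrite raw_loop_pagebreak here so that the last element does not add a page break
    },
  ],
});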
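If the data comes from an array of arbitrary length, the override on the last element can be added programmatically. A minimal sketch, where withPageBreaks is a hypothetical helper and the raw value is given as a complete paragraph, since a {@...} raw XML tag replaces the paragraph it sits in:

```javascript
// Hypothetical helper: copy the loop items and blank out the raw page break on
// the last one, so the generated document does not end with an empty page.
function withPageBreaks(items, breakKey = "raw_loop_pagebreak") {
  return items.map((item, i) =>
    i === items.length - 1 ? { ...item, [breakKey]: "" } : item
  );
}

const data = {
  // Inherited by every loop item that does not override it
  raw_loop_pagebreak: '<w:p><w:r><w:br w:type="page"/></w:r></w:p>',
  dataLoop: withPageBreaks([{ name: "hello" }, { name: "hello2" }]),
};
// data can then be passed to doc.setData(data)
```

The input array is not mutated; only the last item is replaced by a shallow copy.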


edi9999 commented 3 years ago

To avoid having to overwrite raw_loop_pagebreak in your data, you could also use the {$isLast} trick, documented here:

https://docxtemplater.readthedocs.io/en/latest/configuration.html?highlight=isLast#simple-parser-example-for-index-and-islast-inside-loops
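A sketch of such a parser, modeled on that documentation example; it assumes the context argument exposes scopePathItem and scopePathLength (the per-loop index and length arrays used there), which is worth checking against your docxtemplater version:

```javascript
// Custom parser adding a $isLast tag; any other tag is read from the scope.
function parser(tag) {
  return {
    get(scope, context) {
      if (tag === "$isLast") {
        // Innermost loop: its item count and the current item's index
        const totalLength = context.scopePathLength[context.scopePathLength.length - 1];
        const index = context.scopePathItem[context.scopePathItem.length - 1];
        return index === totalLength - 1;
      }
      if (tag === ".") {
        return scope;
      }
      // An undefined result lets docxtemplater fall back to parent scopes
      return scope[tag];
    },
  };
}
```

The template could then wrap the break in an inverted section, so it is skipped for the last item:

{#dataLoop}
{name}
{^$isLast}
{@raw_loop_pagebreak}
{/$isLast}
{/dataLoop}

with the parser passed as an option, e.g. new Docxtemplater(zip, { paragraphLoop: true, parser }) in recent versions.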