open-xml-templating / docxtemplater

Generate docx, pptx, and xlsx from templates (Word, PowerPoint and Excel documents), from Node.js or the browser. Demo: https://www.docxtemplater.com/demo. #docx #office #generator #templating #report #json #generate #generation #template #create #pptx #xlsx #react #vuejs #angularjs #browser #typescript #image #html #table #chart
https://www.docxtemplater.com

docxtemplater 4 roadmap #340

Open edi9999 opened 7 years ago

edi9999 commented 7 years ago


1. Remove setData(data) and resolveData()

It is now possible to do render(data) and renderAsync(data).

2. Multiple render calls [POSTPONED]

Make it possible to call render multiple times, each call returning a different JSZip instance:

```
const zip1 = doc.render({first_name: 1});
const zip2 = doc.render({first_name: 2});
```

Currently, calling render multiple times is not allowed, and results in an error since version 3.30.2.

Ideally, it would be possible to call render several times with different data.

To do this, we need to cache all compiled parts (this should already be done).

We would also need to cache all xmlDocuments parts before rendering.

We would also need to be able to revert all zip operations (for example, the image module does this.zip.file(newImagePath, imageContent)).

As this is quite complex to do, I'm really not sure that this will be included in docxtemplater 4.
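A minimal sketch of the caching idea described above, using a hypothetical `CachedTemplate` class (not part of docxtemplater). The template is compiled once into text and tag parts, and each `render` call starts from a copy of the untouched original files, so a file added during one render (like the image module's `this.zip.file(...)`) never leaks into the next one. Documents are modeled as plain `Map`s of path to string rather than real zips.

```javascript
// Hypothetical sketch: compile once, render many times without mutating shared state.
class CachedTemplate {
  constructor(files) {
    // Original files are kept untouched; each render starts from a copy.
    this.originals = new Map(files);
    // Compile every file once: "Hello {first_name}" -> ["Hello ", "{first_name}", ""].
    this.compiled = new Map();
    for (const [path, content] of this.originals) {
      this.compiled.set(path, content.split(/(\{[^}]+\})/));
    }
  }

  render(data, extraFiles = {}) {
    const out = new Map(this.originals); // fresh output for this call only
    for (const [path, parts] of this.compiled) {
      const text = parts
        .map((part) => {
          const m = /^\{([^}]+)\}$/.exec(part);
          return m ? String(data[m[1]] ?? "") : part;
        })
        .join("");
      out.set(path, text);
    }
    // Files created while rendering (e.g. images) only go into this output.
    for (const [path, content] of Object.entries(extraFiles)) {
      out.set(path, content);
    }
    return out;
  }
}
```

The design choice is that nothing created during `render` is written back to `this.originals`, which is exactly the "revert all zip operations" requirement turned into "never perform them on the shared zip".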

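3. Reorder zip files when creating the zip via render [POSTPONED]

```
// Assumes PizZip, the sync-only JSZip v2 fork (same file/generate API)
const PizZip = require("pizzip");

// In v3, render() does not return the zip: read it back with getZip()
doc.render(data); // doc and data are defined elsewhere
const files = doc.getZip().file(/./);
files.sort(function (a1, a2) {
    return a1.name < a2.name ? -1 : a1.name > a2.name ? 1 : 0;
});

// Copy the files into a new zip in sorted order
const zip1 = new PizZip();
files.forEach(function (file) {
    zip1.file(file.name, file.asUint8Array(), {createFolders: true});
});

const buffer = zip1.generate({type: "nodebuffer", compression: "DEFLATE"});
```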

4. Replace render by renderAsync

renderAsync would return a promise and allow the data to contain promises too. This has been done in 3.5.0 with resolveData.
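A sketch of what "data can contain promises" means in practice: every value is awaited first, then the substitution runs synchronously. `resolveData`, `renderText` and `renderAsync` here are hypothetical helpers written for illustration; they only handle top-level keys and plain `{tag}` placeholders, whereas docxtemplater resolves the tags it actually finds in the template.

```javascript
// Hypothetical sketch: await every (possibly promise) value, then render synchronously.
async function resolveData(data) {
  const entries = await Promise.all(
    Object.entries(data).map(async ([key, value]) => [key, await value])
  );
  return Object.fromEntries(entries);
}

// Synchronous, CPU-bound substitution of {tag} placeholders.
function renderText(template, data) {
  return template.replace(/\{(\w+)\}/g, (_, name) => String(data[name] ?? ""));
}

async function renderAsync(template, data) {
  const resolved = await resolveData(data); // only the data lookup is async
  return renderText(template, resolved);
}
```

Only the data resolution is asynchronous; the substitution itself stays a plain synchronous function.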

5. Use another test runner [POSTPONED]

Jest / ava? Finding a way to make the tests run faster would be nice. First, we would need to know for sure what takes the most time: is it IO for reading the expected/actual docx, or CPU for zipping/unzipping the docx?
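One way to answer the IO vs CPU question is to time each phase separately. This sketch uses Node's `perf_hooks.performance.now()`; `timePhase` and `profileOnce` are hypothetical helpers, and compressing the bytes with `zlib.deflateRawSync` is only a stand-in for the real zipping/unzipping work of the test suite.

```javascript
const { performance } = require("perf_hooks");
const fs = require("fs");
const zlib = require("zlib");

// Hypothetical helper: run fn and add its elapsed milliseconds under label.
function timePhase(timings, label, fn) {
  const start = performance.now();
  const result = fn();
  timings[label] = (timings[label] || 0) + (performance.now() - start);
  return result;
}

// Time reading a file (IO) separately from compressing it (CPU stand-in).
function profileOnce(filePath) {
  const timings = {};
  const bytes = timePhase(timings, "io", () => fs.readFileSync(filePath));
  timePhase(timings, "cpu", () => zlib.deflateRawSync(bytes));
  return timings;
}
```

Summing `profileOnce` results over all expected/actual fixtures would show which phase dominates before choosing a different runner.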

6. Make all modules optional [non-breaking-change][optional]

```
allowUnopenedTag?: "Hello }blabla",
allowUnclosedTag?: "Hello {foo",
changeDelimiterPrefix?: string | null,

disableRawXml: true,
disableLoops: true,
disableDelimiterChange: true,

rawXmlPrefix: "!!",
loopPrefix: ["if ", "end"],
dashPrefix: ["-", "/"],
```

This would make it possible to disable features such as loops and raw XML.

However, this won't result in a smaller build (for the browser).
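A sketch of how configurable prefixes and `disable*` flags could drive tag classification. The option names mirror the proposal above and the defaults match docxtemplater's usual `{@raw}`, `{#loop}` and `{/loop}` syntax, but `classifyTag` and its return values are hypothetical; this is not docxtemplater's lexer.

```javascript
// Hypothetical sketch: decide which feature handles a tag, based on proposed options.
function classifyTag(tag, options = {}) {
  const {
    disableRawXml = false,
    disableLoops = false,
    rawXmlPrefix = "@",
    loopPrefix = ["#", "/"],
  } = options;

  if (!disableRawXml && tag.startsWith(rawXmlPrefix)) {
    return { type: "rawxml", value: tag.slice(rawXmlPrefix.length) };
  }
  if (!disableLoops) {
    const [open, close] = loopPrefix;
    if (tag.startsWith(open)) return { type: "loop-open", value: tag.slice(open.length) };
    if (tag.startsWith(close)) return { type: "loop-close", value: tag.slice(close.length) };
  }
  // Anything else, including tags for disabled features, is a plain placeholder.
  return { type: "placeholder", value: tag };
}
```

With `{rawXmlPrefix: "!!", loopPrefix: ["if ", "end"]}`, the tag `if users` is classified as a loop opening on `users`.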

7. Add an official inspect module

that allows debugging the docx and provides utility functions like getTags().
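A sketch of a getTags-style utility that returns nested tags for loops, the kind of helper an inspect module could expose. `getTags` here is hypothetical and works on a plain string using only the `{#loop}`/`{/loop}` syntax, not on the docx XML.

```javascript
// Hypothetical sketch: build a nested tag tree, e.g. {users: {name: {}}}.
function getTags(text) {
  const root = {};
  const stack = [root];
  for (const [, tag] of text.matchAll(/\{([^{}]+)\}/g)) {
    const current = stack[stack.length - 1];
    if (tag.startsWith("#")) {
      // Loop opening: descend into a child object for the loop name.
      const name = tag.slice(1);
      current[name] = current[name] || {};
      stack.push(current[name]);
    } else if (tag.startsWith("/")) {
      // Loop closing: go back up one level (never above the root).
      if (stack.length > 1) stack.pop();
    } else {
      current[tag] = current[tag] || {};
    }
  }
  return root;
}
```

Returning a tree rather than a flat list keeps the information about which placeholders live inside which loop.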

8. Make the option {linebreaks: true} the default

9. Make the option {paragraphLoops: true} the default

10. Remove {tag: p} from the following call in postparse, or pass the same value in the scopeManager call.

```
try {
    this.parser(tag, { tag: p });
} catch (rootError) {
    errors.push(getScopeCompilationError({ tag, rootError }));
}
```

11. Remove .compile method

(since in v4 the constructor automatically compiles the doc).

12. Remove .attachModule method

and put it in the constructor of Docxtemplater (modules key). A question that needs to be solved with this approach is how to handle conditional modules depending on the file type, which are currently handled like this:

    if (doc.fileType === "pptx") {
        doc.attachModule(new TableModule.GridPptx());
        doc.attachModule(new SlidesModule());
    }

=> This has been implemented in https://github.com/open-xml-templating/docxtemplater/pull/501
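With modules passed to the constructor, one way to handle file-type-specific modules is to let each module declare which file types it supports, and to filter them once the file type is known. The `supportedFileTypes` field and the `selectModules` helper below are assumptions for illustration, not necessarily what the linked PR implements.

```javascript
// Hypothetical sketch: keep only the modules that apply to the detected file type.
function selectModules(modules, fileType) {
  return modules.filter(
    (mod) => !mod.supportedFileTypes || mod.supportedFileTypes.includes(fileType)
  );
}

// Modules without supportedFileTypes apply to every file type.
const tableGridPptx = { name: "TableGridPptx", supportedFileTypes: ["pptx"] };
const slides = { name: "Slides", supportedFileTypes: ["pptx"] };
const images = { name: "Images" };
```

The caller can then always pass the same modules array, and the `if (doc.fileType === "pptx")` branch disappears from user code.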

~~13. Require the use of the pizzip module~~

~~(jszip fork intended to be sync-only)~~

14. Remove outdated methods

Remove the attachModule, loadZip, setOptions and compile methods, since all of this is now done within the v4 constructor.

~~15. Add proofstate module by default [Added in v 3.17.2]~~

~~https://docxtemplater.readthedocs.io/en/latest/faq.html#remove-proofstate-tag ? To think about.~~

16. Remove unused events for modules

For example, module.set({compiled: compiled}) is currently called before the compilation, so compiled is always {}, which makes no sense.

17. Use <a:p> for rawTag instead of <p:sp>

see https://github.com/open-xml-templating/docxtemplater/issues/622

18. Remove the internal property "resolveOffset"

of the scope manager, which is no longer used.

19. Remove the getTraits API

which is probably overkill because it seems to be used only for the "expandPair" feature.

20. Remove the getFullText

method, which was only used as an internal utility function.

21. Use the fixDocPrCorruption module by default

Currently, one has to do:

```
const fixDocPrCorruption = require("docxtemplater/js/modules/fix-doc-pr-corruption.js");
const doc = new Docxtemplater(zip, { modules: [fixDocPrCorruption] });
```
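Roughly, the problem such a fix addresses: when a loop duplicates a drawing, several `<wp:docPr>` elements can end up sharing the same `id`, and Word may then report the document as corrupt. Renumbering them uniquely avoids that. `renumberDocPr` is a hypothetical, regex-based simplification on a raw XML string; it is not the module's actual implementation.

```javascript
// Hypothetical sketch: give every <wp:docPr> element a unique, increasing id.
function renumberDocPr(xml) {
  let nextId = 1;
  return xml.replace(/(<wp:docPr\b[^>]*?\bid=")\d+(")/g, (_, before, after) => {
    return `${before}${nextId++}${after}`;
  });
}
```

Running this after rendering (rather than on the template) matters, because the duplicates only appear once loops have been expanded.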

22. Verify the module API and add "docxtemplater" to each module.

23. Create a "lite-angular-parser" that is enabled by default.
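A sketch of what a lightweight parser could look like, using the shape of docxtemplater's `parser` option (a function that receives the tag and returns an object with a `get(scope)` method). The `liteParser` name and its supported syntax (dotted paths and `.` for the current scope) are assumptions for illustration; it is not the angular-parser.

```javascript
// Hypothetical "lite" parser: supports "a.b.c" paths and "." for the current scope.
function liteParser(tag) {
  const trimmed = tag.trim();
  return {
    get(scope) {
      if (trimmed === ".") return scope;
      // Walk the path, stopping at the first missing value instead of throwing.
      return trimmed.split(".").reduce(
        (value, key) => (value == null ? undefined : value[key]),
        scope
      );
    },
  };
}
```

It would be passed as `new Docxtemplater(zip, { parser: liteParser })`; anything beyond property access (filters, operators) would still need the full angular-parser.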

frederikbosch commented 7 years ago

I would skip renderAsync. Let render always return a promise. With the upcoming await syntax, people can make it behave synchronously themselves.

```
const zip1 = await doc.render({first_name: 1});
```

edi9999 commented 7 years ago

Yes, the idea was to have two methods, render for synchronous render and renderAsync.

I'm not 100% convinced that having only async methods is a good idea: it hurts performance, especially for CPU-intensive tasks (and docxtemplater is only CPU bound), because the JavaScript VM has to switch tasks very often and loses some optimizations.

See https://github.com/Stuk/jszip/issues/281 for a big discussion about the advantages of keeping a sync function.

bunnyvishal6 commented 7 years ago

Please consider getTags method in docxtemplater class.

edi9999 commented 7 years ago

I don't think I will be adding a method getTags to docxtemplater itself.

I would like to keep the core of docxtemplater as light as possible.

I think I could create an inspector / debugger module that would contain the logic to do inspectModule.getTags().

The same could apply to all modules that are included in the core, like the loop module and the rawxml module.

bunnyvishal6 commented 7 years ago

@edi9999 oh I got it.

dashcraft commented 7 years ago

Interestingly enough, I was able to make a little plugin/service with Angular 4 (updating to 5) that allowed me to generate multiple documents on the fly. I may create an ng-docxtemplater, if I have the time and it's alright with you.

edi9999 commented 6 years ago

It is now possible to get the tags with the builtin inspectModule :

http://docxtemplater.readthedocs.io/en/latest/faq.html#get-list-of-placeholders

edi9999 commented 6 years ago

cc @bunnyvishal6

edi9999 commented 6 years ago

It is now possible to resolve tags asynchronously : http://docxtemplater.readthedocs.io/en/latest/async.html

alonrbar commented 6 years ago

Hi,

First of all thanks for a very useful library!

I'm really looking forward to "8. Auto insert newlines when using \n in the input". Is there any chance it can happen sooner, in v3.* instead of v4?

I don't mind adding it myself if you can point me in the general direction. I have tried to add it myself but wasn't very successful in understanding where it should be done.

edi9999 commented 6 years ago

It is possible with v3, but it is dirty:

See this comment :

https://github.com/open-xml-templating/docxtemplater/issues/144#issuecomment-298208980

Edit :

You now can do this :

const doc = new Docxtemplater(zip, {linebreaks: true});
doc.render({text: "My text,\nmultiline"});

https://docxtemplater.readthedocs.io/en/latest/configuration.html#linebreaks
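In WordprocessingML, a line break inside a run is a `<w:br/>` element, so turning `\n` into line breaks means splitting the text and emitting one `<w:t>` per line with breaks in between. The hypothetical `toRunXml` helper below is a simplified sketch of that output, not docxtemplater's own code.

```javascript
// Escape the characters that would break the XML text node.
function escapeXml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical sketch: "a\nb" -> <w:t>a</w:t><w:br/><w:t>b</w:t> (with preserved spaces).
function toRunXml(text) {
  return text
    .split("\n")
    .map((line) => `<w:t xml:space="preserve">${escapeXml(line)}</w:t>`)
    .join("<w:br/>");
}
```

Escaping each line before inserting `<w:br/>` is what keeps user text such as `a < b` from being interpreted as markup.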

alonrbar commented 6 years ago

Thanks. I'll have to consider the pros and cons. Any estimation on v4 release?

edi9999 commented 6 years ago

I would say probably during 2019, but it is not decided yet.

manere commented 5 years ago

Please consider getTags method in docxtemplater class.

Just use something like var tags = String(docxInstance.getFullText()).match(/{[\w,.]{1,100}}/g)

Works like a charm

edi9999 commented 5 years ago

@manere , to get the list of tags it is recommended to use the following : https://docxtemplater.readthedocs.io/en/latest/faq.html#get-list-of-placeholders
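The regex shortcut quoted above only matches tags made of word characters, commas and dots, so loop, condition and raw XML tags are silently skipped. A quick check on a plain string standing in for the output of `getFullText()`:

```javascript
// The regex from the comment above, applied to a plain string.
const tagRegex = /{[\w,.]{1,100}}/g;
const fullText = "Hello {name}, {#users}{first}{/users} {@raw}";
const found = fullText.match(tagRegex);
console.log(found); // [ '{name}', '{first}' ]
```

`{#users}`, `{/users}` and `{@raw}` are missing from the result, and `{first}` appears without the information that it lives inside the `users` loop.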

henrihietala commented 4 years ago

Is it possible to remove complete slides from pptx using conditions? For example, if I want to include certain slides only for a specific group of people.

edi9999 commented 4 years ago

Yes, it is possible with the slides module, see https://docxtemplater.com/modules/slides/

The syntax {:users} means to duplicate a given slide for each element in an iterable.

It can also be used with boolean values to simply keep the slide or remove it.
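The behavior described above (duplicate a slide per element, or keep/remove it for a boolean) can be sketched on an array of slide objects. `expandSlides` and the slide shape are hypothetical and only model the logic, not the pptx files or the slides module's API.

```javascript
// Hypothetical sketch: slides tagged with a loop key ({:key}) are repeated per
// array element, kept once for truthy non-array values, and removed for falsy ones.
function expandSlides(slides, data) {
  const result = [];
  for (const slide of slides) {
    if (!slide.loop) {
      result.push({ ...slide, scope: data });
      continue;
    }
    const value = data[slide.loop];
    if (Array.isArray(value)) {
      for (const item of value) result.push({ ...slide, scope: item });
    } else if (value) {
      result.push({ ...slide, scope: data });
    }
  }
  return result;
}
```

With `{isAdmin: false}`, a slide tagged `{:isAdmin}` disappears, which is the "only for a specific group of people" case from the question.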

wcordelo commented 1 year ago

Are there updates on the docxtemplater 4 roadmap ?

edi9999 commented 1 year ago

Hello @wcordelo, there is currently no set date for this. Are you waiting for anything in particular in the next version?

The major version bump is mostly meant to simplify the API, which requires some breaking changes.

wcordelo commented 1 year ago

@edi9999 I'm wondering if there are limitations with using docxtemplater with cloud functions (AWS Lambda, GCP Cloud Functions, Azure Functions, etc.) regarding memory/CPU usage. The memory/CPU storage can be increased for cloud functions, so I'd like to know if there are limitations we should be aware of (e.g. memory should be at least 256 MB). In addition, cloud functions usually run asynchronously, so I'd like to know if there are limitations that require adopting synchronous processes.

edi9999 commented 1 year ago

Hello @wcordelo, please create another issue next time, I forgot to respond here.

The memory usage depends on the size of the documents.

The rule of thumb is to use twice as much RAM as the size of the documents you process, plus a little extra, so I would use a factor of about 2.3. For example, for a 40 MB document, use at least 2.3 * 40 = 92 MB of RAM, so 256 MB should be plenty.
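The rule of thumb above as a small calculation. `estimateRamMb` and `fitsInMemory` are hypothetical helpers, and the 2.3 factor is the estimate from this comment, not a guarantee.

```javascript
// Rule of thumb from this thread: RAM needed is about 2.3x the document size.
function estimateRamMb(documentSizeMb, factor = 2.3) {
  return documentSizeMb * factor;
}

// Does the estimate fit within a cloud function's memory limit (in MB)?
function fitsInMemory(documentSizeMb, memoryLimitMb) {
  return estimateRamMb(documentSizeMb) <= memoryLimitMb;
}
```

For a 40 MB document the estimate is about 92 MB, which fits in 256 MB; a 120 MB document (about 276 MB) would not.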

As for CPU usage, docxtemplater is mostly CPU bound, so it will still work on a slow CPU, but the generation will be slower: the faster the CPU, the faster the generation.

As for asynchronous execution, docxtemplater is mostly CPU bound (first the unzipping, which is mostly decoding, then the strings are split, parsed, replaced and concatenated), so it runs almost entirely synchronously. The only part that can be made async is the resolving of the data, see https://docxtemplater.com/docs/async/

In any case, users of docxtemplater (including the paid versions) are running it on AWS Lambda or Azure Functions in production without any issues.