rkusa / pdfjs

A Portable Document Format (PDF) generation library targeting both the server- and client-side.
MIT License
773 stars 143 forks source link

Specify page dimension / orientation when merging external docs #111

Open jpbourgeon opened 6 years ago

jpbourgeon commented 6 years ago

Hello

When I merge a PDF file with different page size, pages that are not in the original orientation (paysage/portrait) are truncated.

Do you have a suggestion on how I could get the template page dimensions and use them for a specific page ?

rkusa commented 6 years ago

Good point. It should be fairly easy to implement to always add pages from external documents in their original size.

In https://github.com/rkusa/pdfjs/blob/master/lib/external.js#L37

However, I am now not sure what the default behavior should be and whether there should be an option for it (and if so, what to use as the default behavior). TBH, for my own use cases, I mostly set external PDFs as page page templates and I am expecting them to be truncated to the size of my document. But I see that this does not make much sense when appending external PDF pages...

My initial guess would be

What would be your opinion?

jpbourgeon commented 6 years ago

In my opinion, you should focus on the consistency of your API. You should stick with the current behaviour for all methods, and add an optional parameter to specify explicitly that the page format of the original page should be preserved. This is a minor patch, whereas changing the behaviour of the methods would be a breaking change that could impact other developer's code.

Besides that, I personnaly use setTemplate in my code since I still want to add content on top of the imported page (footer, TOC targets, etc.) and want to preserve the aspect ratio. I cannot do that with addPageOf which skips to the next page. Your proposition wouldn't work in my specific case.

jpbourgeon commented 5 years ago

Hi I'm curious to know if and what you decided to do on that topic ? ;-)

rkusa commented 5 years ago

I did some testing and it seems most PDFs are already added in their respective size, and only some are not. It depends on whether the Page itself specifies a MediaBox or not. So I would consider it a bug, when pages added via addPageOf have a wrong size. That is, I think we should

jpbourgeon commented 5 years ago

Brilliant

This way feels very coherent with the current behaviour of the library

rkusa commented 5 years ago

Pages added with addPageOf should now (on master) keep their original page size.

Though, I am not sure whether I am going to implement

add an option to setTemplate to adopt the page size of the external document

Because this leads to a lot of edge cases that have to be handled. E.g. what if the template size is so small that there is no space for content with the current document's padding setting.

How do you use setTemplate? In my case, I always want to output A4 pages no matter the template size a user provides.

jpbourgeon commented 5 years ago

How does the library already handle the case of putting over sized content in under sized pages ? In other words, I think that it is not the responsibility of setTemplate to handle the content that will be imported into the page after it successfully created the page from its input.

If you consider the case without setTemplate: when you have too much content for a standard A4 page, or a font size so big that it doesn't fit into the page, how does the library handle it right now ? I guess it truncates the content or fails. The same will apply with a too small page from setTemplate. Which is a coherent response of the system.

Generally speaking, since you will add the feature to adopt the page size of the external document as an option to setTemplate, I think it is only natural to assume that the developer that activates this option handles its constraints properly. That is, the dev makes sure that the content fits into the page, or handles the failure when it's not the case.


In my own case, I build a PDF aggregator. The aggregator merges PDF files from a folders tree and can optionally : set a cover page from an external document, add a footer and/or page numbers, generate a TOC, generate a changelog based on the merged PDF files creation date.

Since I need to add content on each imported page (footer, page numbers, outline entries for the TOC and the changelog, etc.), I cannot use addPageOf which skips to the next page after merging. I use setTemplate instead.

My tool is mainly used in an administrative context, so I guess it will mostly work with A5-A4-A3 pages. However, I expect it to fail in the case of over sized content in under sized pages.

rkusa commented 5 years ago

Hi @jpbourgeon,

in your case setTemplate will "fail" (in terms of not giving the expected result) for both under sized and over sized content. Though I think setTemplate is maybe not the best for your use case, since it only takes the first page of an document and also does not automatically insert page breaks. That is, you would have to manually call setTemplate and a pageBreak for each page in an extern document.

Your use case would require a mix of setTemplate and addPagesOf - Not sure yet, how the API could look like for that 🤔