Streaming generation of large PDFs

itimofeev commented 4 months ago

Is your feature request related to a problem? Please describe.

Hello,

I'm currently working at a company that provides trading services to clients. One of our essential requirements is to generate monthly reports containing all the deals for each client. In some cases, a single client can have as many as 100,000 deals within a month. The issue we are facing is that the Maroto library can only render the entire PDF with all the data at once. Consequently, our services consume a significant amount of memory as they need to load all the deals into memory, build the PDF in memory, and only then generate a compressed PDF, which consumes far less memory than all the temporary data structures combined.

Describe the solution you'd like

Upon reviewing the current code, I couldn't find support for streaming generation. Could you please advise if there are any options or methods to optimize memory consumption in this scenario?

Describe alternatives you've considered

One approach we have considered is rendering each page separately and then using a third-party Go library to merge these individual pages into a single PDF file. We would greatly appreciate any guidance or suggestions on how to address this memory consumption issue effectively. Thank you.

johnfercher commented 4 months ago

Hello @itimofeev, how are you? I have some questions about your issue.

Are you using v2?
Maroto now has a feature to merge PDFs.
- You could generate the files separately and merge them, but I see a problem with page numbering that we would have to deal with.
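
To make the page-numbering concern concrete, here is a minimal sketch in plain Go (no Maroto calls) of the arithmetic involved when parts are generated separately. The function name `pageOffsets` and the input `pageCounts` are hypothetical; how the page count of each part is obtained depends on the rendering library and is assumed to be known here.

```go
package main

import "fmt"

// pageOffsets returns, for each separately generated part, the page number
// its first page should carry in the merged document, plus the total page
// count. pageCounts[i] is the number of pages in part i (assumed known).
func pageOffsets(pageCounts []int) (firstPages []int, total int) {
	firstPages = make([]int, len(pageCounts))
	next := 1 // page numbers are 1-based
	for i, n := range pageCounts {
		firstPages[i] = next
		next += n
	}
	return firstPages, next - 1
}

func main() {
	firstPages, total := pageOffsets([]int{12, 7, 30})
	for i, p := range firstPages {
		fmt.Printf("part %d starts at page %d of %d\n", i, p, total)
	}
}
```

Note that a "page X of Y" footer needs the total before any part is rendered, so this approach implies either a counting pass first or footers that omit the total.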

With that said, I will look for a way to improve this. @F-Amaral, such a nice challenge here.

johnfercher commented 4 months ago

I've been thinking about this. Maybe you could try the parallelism feature: with it, small PDFs are created and then merged. Since Maroto now has a clear division between the declaration phase and the computing phase, I think this can help you.

itimofeev commented 4 months ago

Hello @johnfercher,

Thank you for your prompt response! We have recently upgraded and are now using Maroto v2. I appreciate your pointing out the PDF merge feature; we hadn’t noticed it before. We will look into the page numbering issue more closely as we explore this feature.

Regarding the parallelism feature, I must admit I'm not entirely clear on how to implement it effectively. It seems we might need to put in some additional effort to understand and utilize this feature properly.

I want to take a moment to express my gratitude for your work on Maroto. It’s an excellent library that demonstrates high coding standards, and it has been instrumental in our projects.

Also, if you're thinking about adding page streaming to Maroto, I'd love to be a part of that. I'm ready and willing to help out with the coding if you need it.

Thanks again for your dedication and support in maintaining Maroto.

Best regards, Ilia

johnfercher commented 4 months ago

First of all, thank you :D

To use parallel generation, you only need to set WithWorkerPoolSize() in the builder. It is possible that it will use less memory; if not, you could try generating separate documents and merging them.
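
As a rough illustration of what a worker pool size controls, here is a generic sketch in plain Go. It is not Maroto's internal code and calls no Maroto API: `renderChunk` and `renderConcurrently` are hypothetical names, and the "rendering" just produces a label so the example runs on its own. The point is that at most `poolSize` chunks are processed at once, and the results come back in the original order so they can be merged.

```go
package main

import (
	"fmt"
	"sync"
)

// renderChunk is a hypothetical stand-in for rendering one small PDF.
// It only returns a label so the example runs without a PDF library.
func renderChunk(id int, rows []int) string {
	return fmt.Sprintf("chunk-%d(%d rows)", id, len(rows))
}

// renderConcurrently renders the chunks with at most poolSize workers and
// returns the results in the original chunk order, ready to be merged.
func renderConcurrently(chunks [][]int, poolSize int) []string {
	if poolSize < 1 {
		poolSize = 1 // without at least one worker the send below would block forever
	}
	results := make([]string, len(chunks))
	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < poolSize; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				// Each index is handled by exactly one worker, so no lock is needed.
				results[i] = renderChunk(i, chunks[i])
			}
		}()
	}
	for i := range chunks {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return results
}

func main() {
	chunks := [][]int{{1, 2, 3}, {4, 5}, {6}}
	fmt.Println(renderConcurrently(chunks, 2))
}
```

In a pattern like this, peak memory grows with the pool size, since up to `poolSize` chunks are in flight at the same time. Whether that is lower than rendering everything at once depends on the workload.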

If you go down the path of generating separate documents and merging them, please let me know. This may be an easy way to implement an algorithm that consumes less memory. To achieve this, we would only need to run this part sequentially instead of concurrently.
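
The sequential variant could look roughly like the sketch below, written in plain Go with no PDF library. Everything here is an assumption for illustration: the `Deal` type, the `renderPart` placeholder (which writes a line of text instead of a PDF), and the `generateInParts` helper. It shows only the memory pattern: fetch one batch, render it, release it, then fetch the next.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// Deal is a hypothetical record type standing in for one row of the report.
type Deal struct {
	ID     int
	Amount float64
}

// renderPart is a hypothetical placeholder for "build one small document".
// It writes plain text so the example runs without a PDF library.
func renderPart(w io.Writer, part int, deals []Deal) error {
	_, err := fmt.Fprintf(w, "part %d: %d deals\n", part, len(deals))
	return err
}

// generateInParts pulls deals from next in batches of batchSize and renders
// each batch before fetching more, so only one batch is held in memory.
// It returns the number of parts written.
func generateInParts(w io.Writer, next func() (Deal, bool), batchSize int) (int, error) {
	if batchSize < 1 {
		batchSize = 1
	}
	batch := make([]Deal, 0, batchSize)
	parts := 0
	flush := func() error {
		if len(batch) == 0 {
			return nil
		}
		if err := renderPart(w, parts, batch); err != nil {
			return err
		}
		parts++
		batch = batch[:0] // reuse the backing array for the next batch
		return nil
	}
	for {
		d, ok := next()
		if !ok {
			break
		}
		batch = append(batch, d)
		if len(batch) == batchSize {
			if err := flush(); err != nil {
				return parts, err
			}
		}
	}
	err := flush() // render the final, possibly shorter, batch
	return parts, err
}

// exampleRun feeds 10 generated deals through generateInParts in batches of 4.
func exampleRun() (int, string) {
	var out bytes.Buffer
	i := 0
	next := func() (Deal, bool) {
		if i >= 10 {
			return Deal{}, false
		}
		i++
		return Deal{ID: i, Amount: float64(i) * 1.5}, true
	}
	parts, err := generateInParts(&out, next, 4)
	if err != nil {
		panic(err)
	}
	return parts, out.String()
}

func main() {
	parts, report := exampleRun()
	fmt.Printf("%d parts\n%s", parts, report)
}
```

In a real service, `next` would wrap a database cursor or similar source, and `renderPart` would produce a small PDF that is later merged with the others.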

lordofscripts commented 2 weeks ago

When I read the Maroto docs I realized it wouldn't scale well memory-wise, especially for large data. In my project I deal with a large JSON file. First I pre-parse it to create & resolve relationships, and in the 2nd pass I process it piecewise, generating it on the go.
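
For readers who want to try the piecewise pass, here is a minimal sketch using Go's standard `encoding/json` decoder to read a top-level JSON array one element at a time. The `Deal` type, its field names, and the helpers `streamDeals` and `sumAmounts` are hypothetical and only illustrate the pattern; the actual project mentioned above may work differently.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// Deal is a hypothetical record type; the field names are assumptions.
type Deal struct {
	ID     int     `json:"id"`
	Amount float64 `json:"amount"`
}

// streamDeals decodes a top-level JSON array element by element and passes
// each one to handle, so the whole array is never held in memory at once.
func streamDeals(r io.Reader, handle func(Deal) error) error {
	dec := json.NewDecoder(r)
	tok, err := dec.Token() // consume the opening '['
	if err != nil {
		return err
	}
	if delim, ok := tok.(json.Delim); !ok || delim != '[' {
		return fmt.Errorf("expected a JSON array, got %v", tok)
	}
	for dec.More() {
		var d Deal
		if err := dec.Decode(&d); err != nil {
			return err
		}
		if err := handle(d); err != nil {
			return err
		}
	}
	_, err = dec.Token() // consume the closing ']'
	return err
}

// sumAmounts counts the deals in input and adds up their amounts.
func sumAmounts(input string) (int, float64, error) {
	count, total := 0, 0.0
	err := streamDeals(strings.NewReader(input), func(d Deal) error {
		count++
		total += d.Amount
		return nil
	})
	return count, total, err
}

func main() {
	n, total, err := sumAmounts(`[{"id":1,"amount":2.5},{"id":2,"amount":4}]`)
	if err != nil {
		panic(err)
	}
	fmt.Println(n, total)
}
```

The handler is the place where each record would be turned into a row and handed to the renderer, one piece at a time.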

johnfercher commented 1 week ago

We have implemented a low-memory mode that keeps memory allocation lower: it results in 13% fewer allocations, and they don't increase over time. However, we should focus more on this.

johnfercher / maroto

Streaming generation of large PDFs #392