unidoc / unipdf

Golang PDF library for creating and processing PDF files (pure go)
https://unidoc.io

[BUG] Huge ram usage when splitting a 2.5MB pdf #380

Closed Kliton closed 4 years ago

Kliton commented 4 years ago

Description

When I split a 166-page PDF (2.5 MB) into one PDF per page, the process uses 8 GB of RAM.

Is this normal?

Attachments

[Screenshot 2020-06-22 at 18:26:47]

github-actions[bot] commented 4 years ago

Welcome! Thanks for posting your first issue. The way things work here is that while customer issues are prioritized, other issues go into our backlog where they are assessed and fitted into the roadmap when suitable. If you need to get this done, consider buying a license which also enables you to use it in your commercial products. More information can be found on https://unidoc.io/

gunnsth commented 4 years ago

@Kliton Can you contribute a self-contained repro? I.e. the code and files needed to reproduce the issue?

Kliton commented 4 years ago

@gunnsth Hi, I can't post the PDF I used because it contains sensitive company data. I can tell you that it is a PDF with a lot of tables.

The code is the same as the split example of this repo.

I have tried my function with 3 different PDFs (including large ones, around 20 MB) and RAM usage peaked at about 100 MB.

The problem is with this particular PDF (I think it has a lot of "objects" inside, such as table cells, etc.).


gunnsth commented 4 years ago

OK. That information is not specific enough; we need something to reproduce and work with.

Make sure you use NewPdfReaderLazy when splitting to avoid loading the entire file into memory, which you probably don't want when splitting out only certain pages.

Kliton commented 4 years ago

Yeah, I've already tried with the lazy reader.

The main problem is the PDF itself: each page contains around 1,800 objects. I exported the PDF with Acrobat (premium) as "optimized" and now it has only 22 objects.

Is there any way for unipdf to handle a PDF with such a large number of objects?

Kliton commented 4 years ago

The problem was in the PDF and not in the library (the PDF has 1.8k objects).