unidoc / unipdf

Golang PDF library for creating and processing PDF files (pure go)
https://unidoc.io

[BUG] Huge memory consumption when writing images to PDF #542

Open zenyui opened 6 months ago

zenyui commented 6 months ago

Description

I am trying to create a PDF from a slice of Go image.Image objects. The images total about 30MB, and when I write them to the PDF, I observe the Docker container's memory usage spike to 1.4GB.

In production, this is causing my container to OOM and exit.

See implementation below.

Expected Behavior

I would expect the memory usage to be close to the size of the images (or perhaps 2x to 3x that), not 1.4GB! I also don't see a way to build and finalize the PDF incrementally, so I don't see how to reduce the memory usage.

Actual Behavior

Memory usage is 1.4GB, and I don't see an avenue to accomplish what I'm hoping to do.

Attachments

import (
    "bytes"
    "context"
    "image"
    "io"

    "github.com/unidoc/unipdf/v3/creator"
)

// pdfFromGoImages creates a PDF from a slice of images, one image per page.
func pdfFromGoImages(ctx context.Context, images ...image.Image) (io.ReadSeeker, error) {
    c := creator.New()

    margins := float64(10)

    for _, img := range images {
        pImg, err := c.NewImageFromGoImage(img)
        if err != nil {
            return nil, err
        }
        _ = c.NewPage()

        // Scale to the page width, leaving a margin on each side.
        pImg.ScaleToWidth(c.Width() - margins*2)
        pImg.SetPos(margins, margins)
        // If the image is still too tall for the page, scale to the height instead.
        if pImg.Height() >= c.Height() {
            pImg.ScaleToHeight(c.Height() - margins*2)
            pImg.SetPos(margins, margins)
        }
        b := creator.NewBlock(1, 1)
        if err := b.Draw(pImg); err != nil {
            return nil, err
        }
        if err := c.Draw(b); err != nil {
            return nil, err
        }
    }

    // Write directly into the buffer; a bufio.Writer here would need an
    // explicit Flush before the bytes are read back.
    var outBytes bytes.Buffer
    if err := c.Write(&outBytes); err != nil {
        return nil, err
    }

    return bytes.NewReader(outBytes.Bytes()), nil
}

github-actions[bot] commented 6 months ago

Welcome! Thanks for posting your first issue. The way things work here is that while customer issues are prioritized, other issues go into our backlog where they are assessed and fitted into the roadmap when suitable. If you need to get this done, consider buying a license which also enables you to use it in your commercial products. More information can be found on https://unidoc.io/

zenyui commented 6 months ago

FYI, I am a licensed enterprise customer

sampila commented 6 months ago

Hi @zenyui,

Could you share the images that you load into the Go image.Image objects, so that we can reproduce the issue on our end?

zenyui commented 6 months ago

Here is a Google Drive folder with a few pprof dumps and the source PDF.
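
For anyone reproducing this, a heap dump of the kind mentioned above can be captured with the standard runtime/pprof package. This is a minimal sketch, not the code used to produce the shared dumps; the helper name heapProfile is an assumption.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime"
	"runtime/pprof"
)

// heapProfile returns the current heap profile in pprof's
// gzip-compressed protobuf format.
func heapProfile() ([]byte, error) {
	// Run a GC first so the profile reflects up-to-date statistics.
	runtime.GC()
	var buf bytes.Buffer
	if err := pprof.WriteHeapProfile(&buf); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	data, err := heapProfile()
	if err != nil {
		panic(err)
	}
	fmt.Printf("heap profile: %d bytes\n", len(data))
}
```

Writing the returned bytes to a file right after the PDF is generated gives a dump that `go tool pprof` can open.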

The larger algorithm is:

  1. extract the images from the source PDF
  2. convert each one to a Go image.Image and compress it to 75% quality (an attempt to make it smaller)
  3. pass them into the above function to write the images to a new PDF

sampila commented 6 months ago

Thanks for the information, we will investigate this issue.

zenyui commented 4 months ago

Still waiting on a solution.