unidoc / unipdf

Golang PDF library for creating and processing PDF files (pure go)
https://unidoc.io

[BUG] Huge memory consumption when writing images to PDF #542

Open zenyui opened 10 months ago

zenyui commented 10 months ago

Description

I am trying to create a PDF from a slice of Go image.Image objects. The images total about 30MB, and when I write them to the PDF, the Docker container's memory usage spikes to 1.4GB.

In production, this is causing my container to OOM and exit.

See implementation below.

Expected Behavior

I would expect the memory usage to be close to the total size of the images (or perhaps 2-3x that), not 1.4GB! I also don't see a way to build and finalize the PDF incrementally, so I don't see how to decrease the memory usage.

Actual Behavior

Memory usage is 1.4GB, and I don't see an avenue to accomplish what I'm hoping to do.

Attachments

import (
    "bytes"
    "context"
    "image"
    "io"

    // Import path assumes unipdf v3.
    "github.com/unidoc/unipdf/v3/creator"
)

// pdfFromGoImages creates a PDF from a slice of images, each on a separate page.
// ctx is currently unused.
func pdfFromGoImages(ctx context.Context, images ...image.Image) (io.ReadSeeker, error) {
    c := creator.New()

    margins := float64(10)

    for _, img := range images {
        pImg, err := c.NewImageFromGoImage(img)
        if err != nil {
            return nil, err
        }
        _ = c.NewPage()

        // Scale to the page width; if the result is at least as tall as the
        // page, scale to the page height instead.
        pImg.ScaleToWidth(c.Width() - margins*2)
        pImg.SetPos(margins, margins)
        if pImg.Height() >= c.Height() {
            pImg.ScaleToHeight(c.Height() - margins*2)
            pImg.SetPos(margins, margins)
        }
        b := creator.NewBlock(1, 1)
        if err := b.Draw(pImg); err != nil {
            return nil, err
        }
        if err := c.Draw(b); err != nil {
            return nil, err
        }
    }

    // Write straight into the buffer. Wrapping it in a bufio.Writer without
    // calling Flush could leave the tail of the PDF unwritten.
    var outBytes bytes.Buffer
    if err := c.Write(&outBytes); err != nil {
        return nil, err
    }

    return bytes.NewReader(outBytes.Bytes()), nil
}

github-actions[bot] commented 10 months ago

Welcome! Thanks for posting your first issue. The way things work here is that while customer issues are prioritized, other issues go into our backlog where they are assessed and fitted into the roadmap when suitable. If you need to get this done, consider buying a license which also enables you to use it in your commercial products. More information can be found on https://unidoc.io/

zenyui commented 10 months ago

FYI, I am a licensed enterprise customer

sampila commented 10 months ago

Hi @zenyui,

Could you share the images that you load into the Go image.Image objects, so we can reproduce the issue on our end?

zenyui commented 10 months ago

Here is a Google Drive folder with a few pprof dumps and the source PDF.
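For anyone trying to reproduce this, a heap dump of the kind mentioned above can be captured with the standard runtime/pprof package. This is a minimal sketch, not necessarily how the dumps in the folder were produced; heapProfile is a hypothetical helper name.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime"
	"runtime/pprof"
)

// heapProfile forces a garbage collection so the profile reflects up-to-date
// allocation statistics, then returns the heap profile in pprof's binary format.
func heapProfile() ([]byte, error) {
	runtime.GC()
	var buf bytes.Buffer
	if err := pprof.WriteHeapProfile(&buf); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	data, err := heapProfile()
	if err != nil {
		fmt.Println("heap profile failed:", err)
		return
	}
	fmt.Printf("heap profile: %d bytes\n", len(data))
}
```

Saving the returned bytes to a file right after the PDF is written and opening that file with go tool pprof should show which allocations dominate.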

The larger algorithm is:

  1. extract the images from the source PDF
  2. convert each one to a Go image.Image and compress it to 75% quality (an attempt to make it smaller)
  3. pass them into the function above to write the images to a new PDF

sampila commented 10 months ago

Thanks for the information. We will investigate this issue.

zenyui commented 8 months ago

Still waiting on a solution.

ipod4g commented 2 months ago

@zenyui We have already partly improved PDF creation from images and introduced a lazy mode that reduces memory consumption. You can check it here: https://github.com/unidoc/unipdf-examples/blob/master/image/pdf_images_to_pdf_lazy.go

As for image extraction, we are actively working on it and will keep you updated on our progress.