julianhille / MuhammaraJS

Muhammara: a Node module with C/C++ bindings to modify PDFs with JS for Node or Electron (based on, and a replacement for, galkhana/hummusjs)

Memory heap size shoots up beyond 1.5 GB for 50MB pdf file #324

Open ladvishal1985 opened 10 months ago

ladvishal1985 commented 10 months ago

Thanks for such a great library. We are able to reliably write watermarks on PDFs, but we are having trouble with memory consumption. This prevents us from using the library on our Node servers, because the issue results in our pods being terminated. For example:

// We get the file via a signed URL and pass the response as below:
const recipe = new Recipe(fileBuffer);
const used = process.memoryUsage().heapUsed / 1024 / 1024;
this.logger.log(`The script uses approximately ${Math.round(used * 100) / 100} MB after recipe read`);
// Creating the new recipe pushes the heap size above 1.5 GB.
// After creating the stream:
const readerStream = new Muhammara.PDFRStreamForBuffer(fileBuffer);
// the memory size shoots up to almost 3 GB.
// Then we use the following:
const reader = Muhammara.createReader(readerStream); // We need this to get the total page count.
const pageCount = reader.getPagesCount();

Is there any solution to this problem? Currently we are targeting file sizes of up to 50 MB and may go up to 100 MB.

julianhille commented 10 months ago

Some information is missing.

About the 3 GB: that is what I would expect if (!!) Recipe really shoots up to 1.5 GB, since Recipe just uses Muhammara under the hood, and both recipe = ... and readerStream = ... create their own objects from the buffer. Node cannot free any memory between the two calls, because recipe is still in use and has not been dereferenced. So there is that. :>

Are you able to provide a sample file?
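
One thing that may help narrow this down: heapUsed only covers the V8 JavaScript heap. Node.js Buffers are counted under external and arrayBuffers, and memory allocated by native code may show up only in rss. Below is a minimal sketch for logging all of these figures between steps. The helper names toMB, memorySnapshotMB and logMemory are made up for illustration and are not part of Muhammara; the 50 MB buffer is only a stand-in for the downloaded PDF.

```typescript
// Hypothetical helpers (not part of Muhammara) for logging memory between steps.
type MemorySnapshot = {
  rss: number;
  heapTotal: number;
  heapUsed: number;
  external: number;
  arrayBuffers: number;
};

// Convert bytes to megabytes, rounded to two decimals.
function toMB(bytes: number): number {
  return Math.round((bytes / 1024 / 1024) * 100) / 100;
}

// Read process.memoryUsage() and convert every field to MB.
function memorySnapshotMB(): MemorySnapshot {
  const usage = process.memoryUsage();
  return {
    rss: toMB(usage.rss),
    heapTotal: toMB(usage.heapTotal),
    heapUsed: toMB(usage.heapUsed),
    external: toMB(usage.external),
    arrayBuffers: toMB(usage.arrayBuffers),
  };
}

// Log a labelled snapshot and return it so callers can compare steps.
function logMemory(label: string): MemorySnapshot {
  const snapshot = memorySnapshotMB();
  console.log(`[memory] ${label}`, snapshot);
  return snapshot;
}

// Usage: take a snapshot before and after each suspicious step.
const before = logMemory("before allocation");
const standIn = Buffer.alloc(50 * 1024 * 1024); // stand-in for the downloaded PDF
const after = logMemory(`after allocating ${standIn.length} bytes`);
```

Calling logMemory after new Recipe(...), after PDFRStreamForBuffer, and after createReader would show which of the figures actually grows at each step.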

ladvishal1985 commented 10 months ago

Check the snippet below:

async downloadAndAddwatermark(signedUrl: string, waterMark: string) {
    try {
      const file$ = this.downloadFileUsingSignedUrl(signedUrl);
      const fileBuffer = await firstValueFrom(file$.pipe(take(1))); //<-- Download the file from here as array buffer
      const modifiedBuffer = await this.addWatermark(fileBuffer, waterMark);
      return modifiedBuffer;
    } catch (error) {
      // catch error here
    }
  }

  private addWatermark(fileBuffer, waterMark: string) {
    try {
      const reciepe = new Recipe(fileBuffer); // <-- Memory consumption increases after this.
      const readerStream = new Muhammara.PDFRStreamForBuffer(fileBuffer);
      const reader = Muhammara.createReader(readerStream);
      const pageCount = reader.getPagesCount();

      const modifiedReciepe = this.addWatermarkPage(reciepe, {
        currentPage: 1,
        watermark: waterMark,
        pageCount
      });

      return modifiedReciepe.endPDF((outputBuffer) => outputBuffer);
    } catch (error) {
      //catch error here
    }
  }
  private addWatermarkPage(recipe: Recipe, { currentPage, watermark, pageCount }) {
    if (currentPage > pageCount) {
      return recipe;
    }
    const pgWidth = recipe.pageInfo(currentPage).width;
    const pgHeight = recipe.pageInfo(currentPage).height;
    const initialConfig: FileBufferEditConfig = {
      size: 20,
      text: watermark,
      width: pgWidth,
      x: 0
    };
    const textDetails = this.getTextDetails(initialConfig); // Gets the initial config object for the text
    const newRecipe = recipe
      .editPage(currentPage)
      .text(watermark, textDetails.x, pgHeight - 30, textDetails.textOptions)
      .text(watermark, textDetails.x, 30, textDetails.textOptions)
      .endPage();

    return this.addWatermarkPage(newRecipe, {
      currentPage: currentPage + 1,
      watermark: watermark,
      pageCount
    });
  }

  private getTextDetails(options: FileBufferEditConfig) {
    const writer = Muhammara.createWriter(new Muhammara.PDFWStreamForBuffer());
    const fontFile = path.join(this.fontPath, 'Helvetica.ttf');
    const fontObject = writer.getFontForFile(fontFile);
    let textWidth = fontObject.calculateTextDimensions(options.text, options.size).width;
    while (textWidth >= options.width - 20) {
      options.size = options.size - 1;
      textWidth = fontObject.calculateTextDimensions(options.text, options.size).width;
    }
    options.x = options.width / 2 - textWidth / 2;
    const textOptions = {
      font: 'Helvetica',
      size: options.size,
      colorspace: "rgb",
      color: '#F21A1A',
      opacity: 0.5,
    };
    return {
      textOptions: textOptions,
      x: options.x
    };
  }

ladvishal1985 commented 10 months ago

@julianhille: Provided the sample here.

julianhille commented 10 months ago

If files are that large, they are usually stored on disk at some point, even if only temporarily. Please check whether it is possible to use new muhammara.PDFRStreamForFile('./huge.pdf'); this could greatly reduce the memory usage.
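
If the file only exists as a downloaded buffer, one way to get it onto disk is a small temp-file wrapper. The sketch below uses only Node's standard library. The helper name withTempPdf and the file name input.pdf are assumptions for illustration; the Muhammara call is shown only as a comment and is not executed here.

```typescript
import { promises as fs } from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical helper: writes `data` to a temporary file, hands the path to
// `work`, and removes the temporary directory afterwards, even if `work` throws.
async function withTempPdf<T>(
  data: Uint8Array,
  work: (filePath: string) => Promise<T> | T
): Promise<T> {
  const dir = await fs.mkdtemp(path.join(os.tmpdir(), "pdf-"));
  const filePath = path.join(dir, "input.pdf");
  try {
    await fs.writeFile(filePath, data);
    return await work(filePath);
  } finally {
    await fs.rm(dir, { recursive: true, force: true });
  }
}

// Usage sketch: inside the callback, the path could be passed to the
// file-based stream suggested above, e.g.
//   const stream = new muhammara.PDFRStreamForFile(filePath);
// Here the callback only reports the file size, to stay library-independent.
async function demo(): Promise<number> {
  return withTempPdf(Buffer.from("%PDF-1.4 test"), async (filePath) => {
    const stats = await fs.stat(filePath);
    return stats.size;
  });
}
```

Once the data is on disk, the reference to the original buffer can be dropped before the PDF is opened, so the whole file does not have to stay in memory while it is processed.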

julianhille commented 10 months ago

You may also want to have a look at CopyingContext, which might also help reduce the memory usage.

julianhille commented 9 months ago

Did you solve it? Did you have a chance to look at the copying context?

ladvishal1985 commented 9 months ago

No, we did not get a chance to use the copying context. An example might help us. For now, we solved the issue by writing the file to disk and modifying it there. This works reasonably well for us. This is how we do it:

  const pageCount = reader.getPagesCount();
  const fontObject = writer.getFontForFile(this.fontFile);
  const xobjectForm = writer.createFormXObjectsFromPDF(source, Muhammara.ePDFPageBoxMediaBox);

.....

  pageContent
    .doXObject(page.getResourcesDictionary().addFormXObjectMapping(xobjectForm[i] as any))
    .writeText(watermark, config.x, yTop, textOptions)
    .writeText(watermark, config.x, yBottom, textOptions)
    .Q();
  writer.writePage(page);

ladvishal1985 commented 9 months ago

You can close this issue.