yakovmeister / pdf2image

A utility for converting pdf to image and base64 format.
MIT License
400 stars 139 forks

Output buffer is size 0 on AWS lambda #207

Open Gr33nLight opened 1 month ago

Gr33nLight commented 1 month ago

Hello, I'm currently developing a function that extracts an image of the first page of a PDF. I get the source PDF from S3 and upload the result to S3 as well. I added Ghostscript and GraphicsMagick layers to the Lambda. This is a snippet of my SST config:

    nodejs: {
      esbuild: {
        external: ['graphicsmagick', 'ghostscript4js', 'ghostscript-node'],
      },
    },
    layers: [
      new aws_lambda.LayerVersion(stack, 'GraphicsMagickLayer', {
        code: aws_lambda.Code.fromAsset('./packages/functions/layers/gm-layer-1.3.43.zip'),
      }),
      new aws_lambda.LayerVersion(stack, 'GhostScript', {
        code: aws_lambda.Code.fromAsset('./packages/functions/layers/ghostscript.zip'),
      }),
    ],

And this is the Lambda code:

import { GetObjectCommand, PutObjectCommand, S3Client } from '@aws-sdk/client-s3';
import { APIGatewayProxyEventV2 } from 'aws-lambda';
import { fromBuffer } from 'pdf2pic';
import { Config } from 'sst/node/config';

import { IngestEventProps } from './consumer.js';

const client = new S3Client({ region: 'eu-central-1' });

const processPdf = async (userId: string, fileName: string, pdfBuff: Buffer): Promise<number> => {
  console.log('****** got input buff len: ', Buffer.byteLength(pdfBuff));

  const convert = fromBuffer(pdfBuff, {
    density: 100,
    preserveAspectRatio: true,
    format: 'png',
    width: 595,
    height: 842,
  });

  const pageToConvertAsImage = 1;

  const outBuffer = await convert(pageToConvertAsImage, { responseType: 'buffer' });
  if (outBuffer.buffer) {
    console.log('****** outBuffer **** ', outBuffer);
    console.log('****** out buff **** ', Buffer.byteLength(outBuffer.buffer));
  } else {
    console.error('No buffer generated by pdf2pic');
    return 0;
  }
  console.log(Config.BUCKET_URL);
  console.log(Config.BUCKET_BASE_PATH + `${userId}/${fileName}.png`);

  const command = new PutObjectCommand({
    Bucket: Config.BUCKET_URL,
    Key: Config.BUCKET_BASE_PATH + `${userId}/${fileName}.png`,
    Body: outBuffer.buffer,
    ContentType: 'image/png',
  });
  await client.send(command);

  console.log('after put command');
  if (outBuffer.buffer) {
    return Buffer.byteLength(outBuffer.buffer);
  } else {
    return 0;
  }
};

export async function handler(event: APIGatewayProxyEventV2) {
  console.log('got event');
  console.log(event);

  if (!event.body) {
    console.error('[pdfthumb] No body');
    return;
  }

  const parsedEvent = JSON.parse(event.body) as IngestEventProps;

  const command = new GetObjectCommand({
    Bucket: Config.BUCKET_URL,
    Key: Config.BUCKET_BASE_PATH + `${parsedEvent.userId}/` + parsedEvent.fileName,
  });

  let outBuff = 0;
  try {
    const response = await client.send(command);

    if (!response.Body) {
      console.error('[pdfthumb] getobj s3 - No body');
      return;
    }

    const byteArr = await response.Body.transformToByteArray();
    const buffer = Buffer.from(byteArr);
    outBuff = await processPdf(parsedEvent.userId, parsedEvent.fileName, buffer);
  } catch (error) {
    console.error('[pdfthumb] Unhandled error', error);
  }

  return { buff: outBuff };
}

As you can see from my code, I logged both the input and output buffers. The input buffer has the correct length (not 0), but the output buffer length is always 0. What other checks can I do? Thanks!
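One check that may be worth adding (a sketch, not part of the original code): in JavaScript an empty `Buffer` is still truthy, so the `if (outBuffer.buffer)` guard in the code above passes even when the length is 0, and the empty image gets uploaded anyway. The helper names and the `ConversionResult` shape below are hypothetical; only the `buffer` field seen in the post is assumed.

```typescript
// Shape assumed for illustration: only the `buffer` field used in the post.
interface ConversionResult {
  buffer?: Buffer;
}

// Hypothetical helper: byte length of the result, treating a missing buffer as 0.
function convertedByteLength(result: ConversionResult): number {
  return result.buffer ? result.buffer.byteLength : 0;
}

// Hypothetical helper: throws instead of letting an empty image reach S3.
function assertNonEmpty(result: ConversionResult): Buffer {
  if (!result.buffer || result.buffer.byteLength === 0) {
    throw new Error('Conversion returned an empty buffer; check that gm and gs are available');
  }
  return result.buffer;
}

// An empty Buffer is an object, so a plain truthiness check does not catch it.
console.log(Boolean(Buffer.alloc(0)));
console.log(convertedByteLength({ buffer: Buffer.alloc(0) }));
```

Calling `assertNonEmpty(outBuffer)` before building the `PutObjectCommand` would turn the silent empty upload into a visible error in the logs.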

mskec commented 1 month ago

Hi @Gr33nLight, this usually happens when dependencies are not installed correctly. In your case you are adding them as Lambda layers, so please check that they are actually available at runtime. You can try using the community layers:

arn:aws:lambda:us-east-1:175033217214:layer:graphicsmagick:2
arn:aws:lambda:us-east-1:764866452798:layer:ghostscript:15

gm-installation docs
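A rough way to check availability from inside the function, assuming the layers put `gm` and `gs` executables on the `PATH` (layer contents are extracted under `/opt`, and `/opt/bin` is on the default Lambda `PATH`). The `probeBinary` helper is hypothetical and only uses Node's built-in `child_process`:

```typescript
import { spawnSync } from 'node:child_process';

interface ProbeResult {
  command: string;
  available: boolean;
  detail: string;
}

// Hypothetical helper: runs `command ...args` and reports whether it started and exited cleanly.
function probeBinary(command: string, args: string[]): ProbeResult {
  const result = spawnSync(command, args, { encoding: 'utf8' });
  if (result.error) {
    // For example ENOENT when the binary is not found on PATH.
    return { command, available: false, detail: result.error.message };
  }
  const output = (result.stdout || result.stderr || '').trim();
  // Keep only the first line of output (usually the version banner).
  return { command, available: result.status === 0, detail: output.split('\n')[0] ?? '' };
}

// Log the search path and the result of asking each tool for its version.
console.log('PATH:', process.env.PATH);
console.log(probeBinary('gm', ['version']));
console.log(probeBinary('gs', ['--version']));
```

If either probe reports `available: false`, the layer is probably not providing the binary where the runtime looks for it, or it was built for a different architecture or OS image than the function uses.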

Gr33nLight commented 1 month ago

Thanks for the response, @mskec. I tried using the public layers, but unfortunately they don't seem to be accessible from the AWS region I'm using, eu-south-1. The error is as follows (searching for it online seems to confirm this):

UPDATE_FAILED Resource handler returned message: "User: arn:aws:sts::181148086663:assumed-role/cdk-hnb659fds-cfn-exec-role-181148086663-eu-south-1/AWSCloudFormation is not authorized to perform: lambda:GetLayerVersion on resource: arn:aws:lambda:us-east-1:175033217214:layer:graphicsmagick:2 because no resource-based policy allows the lambda:GetLayerVersion action (Service: Lambda, Status Code: 403, Request ID: 696464e9-5a4b-4c5f-ba22-9679ea2861f2)" (RequestToken: b50d852f-bb19-33e8-5f2d-ffea9954bff7, HandlerErrorCode: AccessDenied)

That's why I think the only way is to use the version I built locally, but as I mentioned earlier, I get no output, not even an error, which is weird; just a size 0 buffer (I also tried base64). To build the layers I simply ran the build scripts and copied the generated files into my infra code, and I can see the Lambda layers in the AWS console. Could it be an issue with the build scripts? Do you happen to have the generated zips, so I can check whether mine differ in some way?
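For reference, the region mismatch can be spotted before deploying. As far as I understand, a function can only use layers published in its own region, and the region is the fourth colon-separated field of the ARN. A minimal sketch with hypothetical helper names, using the two community layer ARNs from above:

```typescript
// Extracts the region from an ARN of the form arn:partition:service:region:account:resource.
function arnRegion(arn: string): string {
  const parts = arn.split(':');
  const region = parts[3];
  if (parts.length < 6 || parts[0] !== 'arn' || region === undefined) {
    throw new Error(`Not a valid ARN: ${arn}`);
  }
  return region;
}

// Hypothetical helper: returns the layer ARNs whose region differs from the function's region.
function layersOutsideRegion(functionRegion: string, layerArns: string[]): string[] {
  return layerArns.filter((arn) => arnRegion(arn) !== functionRegion);
}

const communityLayers = [
  'arn:aws:lambda:us-east-1:175033217214:layer:graphicsmagick:2',
  'arn:aws:lambda:us-east-1:764866452798:layer:ghostscript:15',
];

// Both ARNs point at us-east-1, so both are listed for a function in eu-south-1.
console.log(layersOutsideRegion('eu-south-1', communityLayers));
```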

Gr33nLight commented 1 month ago

I will try building an older GraphicsMagick version to see if it fixes the issue.

mskec commented 1 month ago

I tried building it myself some months ago and published everything in this repo: https://github.com/mskec/gm-lambda-layer

Check it out and compare with how you are building them.