rcastoro / PDFImagine

NodeJS project to convert PDFs to images via AWS Lambda using an S3 bucket.

Gs not working #1

Closed jorge-acosta-abstracta closed 2 years ago

jorge-acosta-abstracta commented 3 years ago

We cloned the repository, specified our bucket, and created the folders following the video.

Nevertheless, the Lambda function times out when invoking gs. The layer was properly configured, and gs -h returns without issues.

We are using Node 12 in our Lambda function.

Could you point us in the right direction or confirm whether this is a bug?

rcastoro commented 3 years ago

This sounds like a permission issue. Did you configure your own Linux layer with ghostscript installed, or did you use the one provided in the readme section?

Also, did you update the code to point to your S3 bucket path? The code is configured with mine, "s3convertpdf".

What logs are you getting in AWS > CloudWatch > Log groups for your Lambda function?

jorge-acosta-abstracta commented 3 years ago

Dear Roco,

Thanks for your quick response. These are the detailed steps I followed while trying to use your project, which suits exactly the use case we need to implement.

  1. Go to https://us-east-2.console.aws.amazon.com/lambda -> Create Function:
  1. IAM Role:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": [
          "lambda.amazonaws.com",
          "apigateway.amazonaws.com"
        ]
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

  1. Layers:

According to vladgolubev, the Ghostscript v9.52 layer for us-east-2 is arn:aws:lambda:us-east-2:764866452798:layer:ghostscript:8

  1. In your code I've made the following changes:
const Bucket = 'myproject.development';
console.log('bucketObjs :::::: ' + JSON.stringify(bucketObjs)); //list of filenames within the folder /pdfs

async function getFileBase64(object) {
    let params = {
        Bucket: 'myproject.development/pdfs',
        Key: object
    };

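    // Start from an empty string; an uninitialised variable would prefix the result with "undefined".
    let base64Str = '';

    const file = s3.getObject(params).createReadStream()
        .pipe(ss('base64'));

    file.on('data', data => base64Str += data);
    return new Promise(function(resolve, reject) {
        file.on('error', reject); // reject if the base64 transform stream errors
        file.on('end', () => resolve(base64Str));
    });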
}

const operate = async(body, fileName) => { 
....
   console.log('ls ::::' + JSON.stringify(await exec('ls -lrt /tmp/*')));
   console.log('BEFORE ghostScriptPDF :::: ' + fileName);
   await ghostScriptPDF(fileName);
   console.log("done with ghostscript");
....
}

const putfile = async(buffer, fileName) => {
    let params = {
        Bucket: 'myproject.development',
        Key: 'images/' + fileName,
        Body: buffer
    };
    return await s3.putObject(params).promise();
}
  1. gs -h result:

GPL Ghostscript 9.52 (2020-03-19)
Copyright (C) 2020 Artifex Software, Inc.  All rights reserved.
Usage: gs [switches] [file1.ps file2.ps ...]
Most frequently used switches: (you can use # in place of =)
 -dNOPAUSE           no pause after page   | -q       `quiet', fewer messages
 -g<width>x<height>  page size in pixels   | -r<res>  pixels/inch resolution
 -sDEVICE=<devname>  select device         | -dBATCH  exit after last file
 -sOutputFile=<file> select output file: - for stdout, |command for pipe,
                                         embed %d or %ld for page #
Input formats: PostScript PostScriptLevel1 PostScriptLevel2 PostScriptLevel3 PDF
Default output device: bbox
Available devices:
   alc1900 alc2000 alc4000 alc4100 alc8500 alc8600 alc9100 ap3250 appledmp
   atx23 atx24 atx38 bbox bit bitcmyk bitrgb bitrgbtags bj10e bj10v bj10vh
   bj200 bjc600 bjc800 bjc880j bjccmyk bjccolor bjcgray bjcmono bmp16 bmp16m
   bmp256 bmp32b bmpgray bmpmono bmpsep1 bmpsep8 ccr cdeskjet cdj1600 cdj500
   cdj550 cdj670 cdj850 cdj880 cdj890 cdj970 cdjcolor cdjmono cdnj500 cfax
   chp2200 cif cljet5 cljet5c cljet5pr coslw2p coslwxl declj250 deskjet
   devicen dfaxhigh dfaxlow display dj505j djet500 djet500c dl2100 dnj650c
   epl2050 epl2050p epl2120 epl2500 epl2750 epl5800 epl5900 epl6100 epl6200
   eplcolor eplmono eps2write eps9high eps9mid epson epsonc escp escpage
   faxg3 faxg32d faxg4 fmlbp fmpr fpng fs600 gdi hl1240 hl1250 hl7x0
   hpdj1120c hpdj310 hpdj320 hpdj340 hpdj400 hpdj500 hpdj500c hpdj510
   hpdj520 hpdj540 hpdj550c hpdj560c hpdj600 hpdj660c hpdj670c hpdj680c
   hpdj690c hpdj850c hpdj855c hpdj870c hpdj890c hpdjplus hpdjportable ibmpro
   ijs imagen inferno ink_cov inkcov itk24i itk38 iwhi iwlo iwlq jetp3852
   jj100 jpeg jpegcmyk jpeggray la50 la70 la75 la75plus laserjet lbp310
   lbp320 lbp8 lex2050 lex3200 lex5700 lex7000 lips2p lips3 lips4 lips4v
   lj250 lj3100sw lj4dith lj4dithp lj5gray lj5mono ljet2p ljet3 ljet3d ljet4
   ljet4d ljet4pjl ljetplus ln03 lp1800 lp1900 lp2000 lp2200 lp2400 lp2500
   lp2563 lp3000c lp7500 lp7700 lp7900 lp8000 lp8000c lp8100 lp8200c lp8300c
   lp8300f lp8400f lp8500c lp8600 lp8600f lp8700 lp8800c lp8900 lp9000b
   lp9000c lp9100 lp9200b lp9200c lp9300 lp9400 lp9500c lp9600 lp9600s
   lp9800c lps4500 lps6500 lq850 lxm3200 lxm5700m m8510 md1xMono md2k
   md50Eco md50Mono md5k mgr4 mgr8 mgrgray2 mgrgray4 mgrgray8 mgrmono miff24
   mj500c mj6000c mj700v2c mj8000c ml600 necp6 npdl nullpage oce9050 oki182
   oki4w okiibm oprp opvp paintjet pam pamcmyk32 pamcmyk4 pbm pbmraw pcl3
   pclm pcx16 pcx24b pcx256 pcxcmyk pcxgray pcxmono pdfimage24 pdfimage32
   pdfimage8 pdfwrite pgm pgmraw pgnm pgnmraw photoex picty180 pj pjetxl
   pjxl pjxl300 pkm pkmraw pksm pksmraw plan plan9bm planc plang plank planm
   plib plibc plibg plibk plibm png16 png16m png256 png48 pngalpha pnggray
   pngmono pngmonod pnm pnmraw ppm ppmraw pr1000 pr1000_4 pr150 pr201
   ps2write psdcmyk psdcmyk16 psdcmykog psdrgb psdrgb16 pxlcolor pxlmono
   r4081 rinkj rpdl samsunggdi sj48 spotcmyk st800 stcolor t4693d2 t4693d4
   t4693d8 tek4696 tiff12nc tiff24nc tiff32nc tiff48nc tiff64nc tiffcrle
   tiffg3 tiffg32d tiffg4 tiffgray tifflzw tiffpack tiffscaled tiffscaled24
   tiffscaled32 tiffscaled4 tiffscaled8 tiffsep tiffsep1 txtwrite uniprint
   xcf xes xpswrite
Search path:
   %rom%Resource/Init/ : %rom%lib/ :
   /usr/local/share/ghostscript/9.52/Resource/Init :
   /usr/local/share/ghostscript/9.52/lib :
   /usr/local/share/ghostscript/9.52/Resource/Font :
   /usr/local/share/ghostscript/fonts :
   /usr/local/share/fonts/default/ghostscript :
   /usr/local/share/fonts/default/Type1 :
   /usr/local/share/fonts/default/TrueType : /usr/lib/DPS/outline/base :
   /usr/openwin/lib/X11/fonts/Type1 : /usr/openwin/lib/X11/fonts/TrueType
Initialization files are compiled into the executable.
For more information, see /usr/local/share/doc/ghostscript/9.52/Use.htm.
Please report bugs to bugs.ghostscript.com.

  1. Lambda memory: 256 MB. Lambda timeout: 30 s.

  2. Lambda -> Function Code -> Actions -> Upload a zip file

Compress index.js, package.json, package-lock.json and the node_modules folder into a .zip file.

The deployment package of your Lambda function "AI-Textract-PDF2PNG-lambda" is too large to enable inline code editing. However, you can still invoke your function.

  1. What logs are you getting in AWS > CloudWatch > Log groups for your Lambda function?

START RequestId: 567e454f-612b-4da7-9ec6-a496c0a3e93c Version: $LATEST
2021-01-18T04:27:11.555Z 567e454f-612b-4da7-9ec6-a496c0a3e93c INFO bucketObjs :::::: ["NordVPNInvoice.pdf"]
2021-01-18T04:27:11.555Z 567e454f-612b-4da7-9ec6-a496c0a3e93c INFO obj :::::: NordVPNInvoice.pdf
2021-01-18T04:27:11.754Z 567e454f-612b-4da7-9ec6-a496c0a3e93c ERROR (node:8) [DEP0005] DeprecationWarning: Buffer() is deprecated due to security and usability issues. Please use the Buffer.alloc(), Buffer.allocUnsafe(), or Buffer.from() methods instead.
2021-01-18T04:27:11.795Z 567e454f-612b-4da7-9ec6-a496c0a3e93c INFO ls ::::{"stdout":"-rw-rw-r-- 1 sbx_user1051 990 22606 Jan 18 04:27 /tmp/inputFile.pdf\n","stderr":""}
2021-01-18T04:27:11.795Z 567e454f-612b-4da7-9ec6-a496c0a3e93c INFO BEFORE ghostScriptPDF :::: NordVPNInvoice.pdf
2021-01-18T04:27:14.173Z 567e454f-612b-4da7-9ec6-a496c0a3e93c INFO done with ghostscript
2021-01-18T04:27:14.334Z 567e454f-612b-4da7-9ec6-a496c0a3e93c INFO Finished!
END RequestId: 567e454f-612b-4da7-9ec6-a496c0a3e93c
REPORT RequestId: 567e454f-612b-4da7-9ec6-a496c0a3e93c Duration: 3156.29 ms Billed Duration: 3157 ms Memory Size: 256 MB Max Memory Used: 129 MB Init Duration: 492.95 ms

Kindly help us solve this issue. If you think sharing our project would be useful, or if you would like to send us an offer for a couple of hours of your help, we are open to either.

Have a great day, and thanks for sharing knowledge with the community.

Jorge
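A side note on the getFileBase64 helper quoted in the comment above: encoding a stream to base64 chunk by chunk and concatenating the pieces can leave padding characters in the middle of the result unless the encoder is chunk-aware. A minimal sketch that collects the raw chunks first and encodes once, assuming the input is any Node readable stream; `streamToBase64` is a hypothetical name, not something from the repository:

```javascript
const { Readable } = require('stream');

// Collect every chunk of a readable stream and encode the whole payload once.
// Encoding once avoids base64 padding appearing in the middle of the output.
function streamToBase64(readable) {
  return new Promise((resolve, reject) => {
    const chunks = [];
    readable.on('data', (chunk) => chunks.push(Buffer.from(chunk)));
    readable.on('error', reject); // reject instead of waiting forever on a failed stream
    readable.on('end', () => resolve(Buffer.concat(chunks).toString('base64')));
  });
}

// Hypothetical usage with the AWS SDK v2 client from the snippet above:
// const base64 = await streamToBase64(s3.getObject(params).createReadStream());

// Local demonstration with an in-memory stream standing in for the S3 object.
streamToBase64(Readable.from([Buffer.from('%PDF-'), Buffer.from('1.4')]))
  .then((b64) => console.log(b64));
```

Attaching the error handler to the source stream itself matters here, because errors do not propagate across `.pipe()`.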

jorge-acosta-abstracta commented 3 years ago

Updated log

2021-01-18T04:34:25.402Z f5d9dd41-c8e2-4fb8-9326-aeac41709374 INFO BEFORE ghostScriptPDF :::: segip UMBERTO.pdf
2021-01-18T04:34:25.784Z f5d9dd41-c8e2-4fb8-9326-aeac41709374 INFO Error ::::::: Error: Command failed: gs -dSAFER -dBATCH -dNOPAUSE -sDEVICE=pnggray -r600 -dDownScaleFactor=3 -sOutputFile=/tmp/segip UMBERTO.pdf-%03d.png /tmp/inputFile.pdf
GPL Ghostscript 9.52: Unrecoverable error, exit code 1

2021-01-18T04:34:25.784Z f5d9dd41-c8e2-4fb8-9326-aeac41709374 INFO Finished!
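One possible reading of the updated log: the earlier run (NordVPNInvoice.pdf) completed, while the failing run used a file name containing a space (segip UMBERTO.pdf), and the logged command passes -sOutputFile=/tmp/segip UMBERTO.pdf-%03d.png unquoted. If ghostScriptPDF builds that command as a single shell string (an assumption; its body is not shown in this thread), the shell would split the path at the space. A minimal sketch that passes the arguments as an array via child_process.execFile, which does not go through a shell. `buildGsArgs` and the default input path are hypothetical names used for illustration:

```javascript
const { execFile } = require('child_process');
const { promisify } = require('util');
const path = require('path');

const execFileAsync = promisify(execFile);

// Build the argument list for gs. Each array element reaches gs as exactly one
// argument, so a space in the file name cannot split the output path.
function buildGsArgs(fileName, inputPath = '/tmp/inputFile.pdf') {
  const outputPattern = path.posix.join('/tmp', `${fileName}-%03d.png`);
  return [
    '-dSAFER', '-dBATCH', '-dNOPAUSE',
    '-sDEVICE=pnggray', '-r600', '-dDownScaleFactor=3',
    `-sOutputFile=${outputPattern}`,
    inputPath,
  ];
}

// Hypothetical replacement for a shell-based call; requires gs on the PATH.
async function ghostScriptPDF(fileName) {
  return execFileAsync('gs', buildGsArgs(fileName));
}

// Inspect the arguments without running gs.
console.log(buildGsArgs('segip UMBERTO.pdf'));
```

The flags are copied from the logged command. Whether the space is the actual cause would need to be confirmed by re-running with a renamed file.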

rcastoro commented 3 years ago

It looks like the error occurs in the operate function, specifically in the return new Promise((resolve, reject) => block, because the OUTPUT FILE log line is missing. That suggests the Linux container spun up for your Lambda function either lacks permission to write to '/tmp/' or cannot store the temporary files needed to process the request there. Your second response points the same way. Please add debugging around this area and report the cause so that I can fix the code, or submit a fix yourself.

Due to time constraints I have limited availability to look into this, but that seems to be the area to focus on. The most likely causes are changes over time in the Linux distributions AWS uses, or a permissions issue.
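For the debugging requested above, a minimal sketch of a write probe that could be logged from inside the handler before gs is invoked. `probeWritable` is a hypothetical helper, not part of the repository. It uses os.tmpdir() so it also runs locally, and inside Lambda the directory of interest would be '/tmp':

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Try to create, read back and delete a small file in `dir`.
// Returns { ok: true } on success, or { ok: false, code, message } so the
// failure reason (for example EACCES or ENOSPC) shows up in the logs.
function probeWritable(dir) {
  const probe = path.join(dir, `probe-${process.pid}-${Date.now()}.txt`);
  try {
    fs.writeFileSync(probe, 'ok');
    const roundTrip = fs.readFileSync(probe, 'utf8') === 'ok';
    fs.unlinkSync(probe);
    return { ok: roundTrip };
  } catch (err) {
    return { ok: false, code: err.code, message: err.message };
  }
}

console.log('tmp probe ::::', JSON.stringify(probeWritable(os.tmpdir())));
```

If the probe succeeds, a permissions problem on the directory becomes less likely and the gs command line itself is the next thing to inspect.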

rcastoro commented 2 years ago

Closed, cannot recreate.