yakovmeister / pdf2image

A utility for converting PDF files to images and base64 strings.
MIT License

Attempt to convert but get an error "Couldn't initialise file." #208

Open apiwatCMD opened 3 months ago

apiwatCMD commented 3 months ago

I'm working with PDF files that come from a scanner. When I try to convert one, the following error occurs:

/Users/cmd/Documents/Learn/JS/node-typescript/node_modules/gm/lib/command.js:318
          err = new Error('Command failed: ' + stderr);
                ^
Error: Command failed:    **** Error: Couldn't initialise file.
               Output may be incorrect.

Requested FirstPage is greater than the number of pages in the file: 0
   No pages will be processed (FirstPage > LastPage).
gm convert: Postscript delegate failed (/var/folders/8z/d_ldprrj3116777hgkwkyp_w0000gn/T/gmLOn7Iw).

    at ChildProcess.onExit (/Users/cmd/Documents/Learn/JS/node-typescript/node_modules/gm/lib/command.js:318:17)
    at ChildProcess.emit (node:events:519:28)
    at ChildProcess.emit (node:domain:488:12)
    at maybeClose (node:internal/child_process:1105:16)
    at Socket.<anonymous> (node:internal/child_process:457:11)
    at Socket.emit (node:events:519:28)
    at Socket.emit (node:domain:488:12)
    at Pipe.<anonymous> (node:net:338:12) {
  code: 1,
  signal: null
}

Do you know how to fix this? Conversion works fine for typical PDFs, just not for the ones that come from the scanner. Or do I need to configure my app further?

Here's my code. Thanks!

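import fs from "fs";
import { fromBuffer } from "pdf2pic";

// Convert one page of an in-memory PDF to a JPEG file in the current directory.
const convertPDFToJpeg = async (file: Buffer, page: number) => {
  const convert = fromBuffer(file, {
    quality: 100,
    format: "jpeg",
    saveFilename: "untitled",
    savePath: ".",
    density: 300,
    preserveAspectRatio: true,
  });
  // Return the conversion result so callers can inspect it.
  return convert(page, { responseType: "image" });
};

const handleTestReproduce = async () => {
  const file = fs.readFileSync("src/test-broken-file.pdf");
  await convertPDFToJpeg(file, 1);
};

// Log the failure instead of leaving the promise rejection unhandled.
handleTestReproduce().catch((err) => console.error(err));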
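If the app has to keep running when a user uploads a PDF like this, one option is to catch the rejection and recognise this specific failure so the user can be asked for a different file. The sketch below is hypothetical: `UnreadablePdfError`, `classifyConversionError` and `convertSafely` are illustrative names, not part of pdf2pic or gm. It assumes the rejected error's message contains the Ghostscript stderr text shown in the log above, which is brittle and may change between Ghostscript versions.

```typescript
// Substrings taken from the Ghostscript output in the log above.
// Assumption: they stay stable across Ghostscript/gm versions, which is not guaranteed.
const UNREADABLE_PDF_MARKERS = [
  "Couldn't initialise file",
  "number of pages in the file: 0",
];

// Hypothetical error type for PDFs that Ghostscript could not open or found no pages in.
class UnreadablePdfError extends Error {
  constructor(message: string, public readonly original: unknown) {
    super(message);
    this.name = "UnreadablePdfError";
    // Keep `instanceof` working when compiling to ES5 targets.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Map a known "unreadable PDF" failure to UnreadablePdfError; return anything else unchanged.
function classifyConversionError(err: unknown): unknown {
  const text = err instanceof Error ? err.message : String(err);
  return UNREADABLE_PDF_MARKERS.some((marker) => text.includes(marker))
    ? new UnreadablePdfError("Ghostscript could not read any pages from this PDF.", err)
    : err;
}

// Run a conversion and rethrow its failure in classified form.
async function convertSafely<T>(run: () => Promise<T>): Promise<T> {
  try {
    return await run();
  } catch (err) {
    throw classifyConversionError(err);
  }
}
```

For example, `await convertSafely(() => convertPDFToJpeg(file, 1))` would let an upload handler catch `UnreadablePdfError` and respond with a friendly message, while any other error propagates unchanged. This does not fix the file; it only turns a crash into a handled case.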

Also, I apologize that I can't share an example PDF file, since I can't reproduce the problem myself. Most of these cases come from files uploaded by users, and their content is sensitive. All I can give you is an excerpt of the raw PDF data (I'm not an expert on PDF internals).

Here's the beginning of "test-broken-file.pdf":

%PDF-1.7
1 0 obj
<</Type /XObject /Subtype /Image /Name /Im1 /Width 1654 /Height 2338 /Length 356574/ColorSpace /DeviceRGB /BitsPerComponent 8 /Filter [ /DCTDecode ] >> stream
<binary JPEG (JFIF) image data>
...
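Since the file itself can't be shared, the non-sensitive dictionary at the start of each object may still help with diagnosis: here the first object is an image XObject (`/Subtype /Image`) whose stream is JPEG data (`/Filter [ /DCTDecode ]`). A rough sketch for extracting those `/Key value` pairs without exposing the image bytes is below. `readFirstDictionary` is a hypothetical helper and a regex heuristic, not a PDF parser: it assumes the first `<< ... >>` dictionary is flat, with no nested dictionaries, strings, or comments.

```typescript
// Read "/Key value" pairs from the first "<< ... >>" dictionary in a PDF buffer.
// Heuristic only: assumes no nested dictionaries, strings, or comments inside it.
function readFirstDictionary(pdf: Buffer): Record<string, string> {
  // latin1 maps every byte to one character, so binary stream data cannot break decoding.
  const text = pdf.toString("latin1");
  const start = text.indexOf("<<");
  if (start === -1) return {};
  const end = text.indexOf(">>", start + 2);
  if (end === -1) return {};

  // Tokens: whole arrays "[ ... ]", names "/Foo", or bare atoms such as numbers and "R".
  const tokens =
    text.slice(start + 2, end).match(/\[[^\]]*\]|\/[^\s\/\[\]<>()]+|[^\s\/\[\]<>()]+/g) ?? [];

  const entries: Record<string, string> = {};
  let i = 0;
  while (i + 1 < tokens.length) {
    const key = tokens[i];
    if (!key.startsWith("/")) {
      i += 1; // not a key; skip it
      continue;
    }
    // Indirect references such as "12 0 R" span three tokens.
    if (/^\d+$/.test(tokens[i + 1]) && tokens[i + 3] === "R") {
      entries[key.slice(1)] = tokens.slice(i + 1, i + 4).join(" ");
      i += 4;
    } else {
      entries[key.slice(1)] = tokens[i + 1];
      i += 2;
    }
  }
  return entries;
}
```

Run on the excerpt above, this would report `Width` 1654, `Height` 2338, `ColorSpace` `/DeviceRGB` and `Filter` `[ /DCTDecode ]`, which is the kind of metadata that could be posted in an issue without revealing document content.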

If you need more information, feel free to ask; I'll provide whatever I can.

yakovmeister commented 2 months ago

From the looks of it, the PDF seems to be corrupted. Have you checked whether the scanner has an option to change its PDF output?
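One way to test the corruption theory on uploads before calling pdf2pic is a quick structural sanity check. The sketch below only looks for markers a complete PDF normally contains: the `%PDF-` header at the very start, and `startxref` plus `%%EOF` near the end. `checkPdfStructure` and `looksTruncated` are hypothetical helpers, and the 1024-byte tail window is an arbitrary assumption. Passing the check does not guarantee Ghostscript can render the file; it only catches obviously truncated or non-PDF data.

```typescript
interface PdfStructureReport {
  hasHeader: boolean; // file begins with "%PDF-"
  hasStartxref: boolean; // "startxref" appears in the tail window
  hasEof: boolean; // "%%EOF" appears in the tail window
  pageObjects: number; // uncompressed "/Type /Page" dictionaries found (not "/Pages")
}

function checkPdfStructure(pdf: Buffer, tailWindow = 1024): PdfStructureReport {
  const head = pdf.subarray(0, 5).toString("latin1");
  const tail = pdf.subarray(Math.max(0, pdf.length - tailWindow)).toString("latin1");
  // A count of 0 does NOT prove the file is broken: pages may sit in compressed object streams.
  const pageObjects = (pdf.toString("latin1").match(/\/Type\s*\/Page(?![A-Za-z])/g) ?? []).length;
  return {
    hasHeader: head === "%PDF-",
    hasStartxref: tail.includes("startxref"),
    hasEof: tail.includes("%%EOF"),
    pageObjects,
  };
}

// Treat a missing header or missing trailer markers as a likely truncated/corrupted upload.
function looksTruncated(report: PdfStructureReport): boolean {
  return !report.hasHeader || !report.hasStartxref || !report.hasEof;
}
```

A file that fails `looksTruncated` could be rejected with a clear message at upload time instead of reaching Ghostscript and failing with "Couldn't initialise file."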