k-yle / pdf-to-img

📃📸 Converts PDFs to images in nodejs
https://npm.im/pdf-to-img
MIT License
84 stars 22 forks source link

Notes on supporting node-canvas^3.0.0 #228

Open abrenoch opened 1 year ago

abrenoch commented 1 year ago

Hey there.

As noted in #225, there is an upcoming major release of node-canvas that appears to actually compile correctly with electron. I have been testing this with pdf-to-img, and it more or less works... But there is an issue I encountered I wanted to bring up because I think it will need to be addressed in some capacity or another. I'm not sure if this is a pdfjs or pdf-to-img problem (actually I'm about certain it is a pdfjs problem, but the canvas dependency resides in this repo).

First things first, my package.json looks something like this:

{
  ...
  "dependencies": {
    "pdf-to-img": "^2.1.2"
  },
  "overrides": {
    "pdf-to-img": {
      "canvas": "https://github.com/Automattic/node-canvas.git#master"
    }
  }
}

This appears to work as expected and installs the 3.0 version of canvas.

My test file looks like this:

const { fork } = require('node:child_process');
const fs = require('node:fs');
const { pdf } = require('pdf-to-img');

const file = `<some-file>`;

async function getPages () {
    const doc = await pdf(file);
    for await (const page of doc) {
        fs.writeFileSync("test.png", page);
    }
}

getPages();

When running the file, I'm presented with this error:

...\pdfjs-test\node_modules\pdfjs-dist\legacy\build\pdf.js:8743
    ctx.save = ctx.__originalSave;
             ^

TypeError: Cannot assign to read only property 'save' of object '#<CanvasRenderingContext2D>'
    at CanvasRenderingContext2D.ctx._removeMirroring (...\pdfjs-test\node_modules\←[4mpdfjs-dist←[24m\legacy\build\pdf.js:8743:14)
    at CanvasGraphics.endSMaskMode (...\pdfjs-test\node_modules\←[4mpdfjs-dist←[24m\legacy\build\pdf.js:9793:14)
    at CanvasGraphics.checkSMaskState (...\pdfjs-test\node_modules\←[4mpdfjs-dist←[24m\legacy\build\pdf.js:9770:12)
    at CanvasGraphics.restore (...\pdfjs-test\node_modules\←[4mpdfjs-dist←[24m\legacy\build\pdf.js:9841:12)
    at CanvasGraphics._restoreInitialState2 (...\pdfjs-test\node_modules\←[4mpdfjs-dist←[24m\legacy\build\pdf.js:10945:10)
    at CanvasGraphics.endDrawing (...\pdfjs-test\node_modules\←[4mpdfjs-dist←[24m\legacy\build\pdf.js:9572:79)
    at InternalRenderTask.cancel (...\pdfjs-test\node_modules\←[4mpdfjs-dist←[24m\legacy\build\pdf.js:6253:82)

Looking under the hood of pdfjs, the problem seems to arise from the mirrorContextOperations function, which is overwriting many of the built-in canvas context methods such as save, restore, scale, etc.

This doesn't seem to have been an issue for the previous version of canvas, but looking over some of the update it received to support NAPI, it seems like those methods are now defined as read-only properties.

Now I'm not sure if this is really an issue for this repo or for the pdfjs one. The problem certainly comes from their usage of the canvas context, but considering they don't even have a canvas dependency, I thought maybe this was the place to start. Maybe a sort of proxy object or something that would allow the methods overridden in mirrorContextOperations to be done so safely.

Just wanted to open a discussion!

k-yle commented 1 year ago

Hey, overriding ctx.save works in a browser so I think it's completely reasonable for pdfjs to do this.

It's a bit annoying that node-canvas now makes this property readonly. You're right that we could probably use a Proxy to bypass this limitation.

I personally can't compile the v3 pre-release of `node-canvas` but I'll revisit this once it's released or more stable ```c npm ERR! ../src/Image.cc: In member function ‘cairo_status_t Image::loadSVGFromBuffer(uint8_t*, unsigned int)’: npm ERR! ../src/Image.cc:1141:3: error: ‘rsvg_handle_get_intrinsic_size_in_pixels’ was not declared in this scope; did you mean ‘rsvg_handle_get_intrinsic_dimensions’? npm ERR! 1141 | rsvg_handle_get_intrinsic_size_in_pixels(_rsvg, &d_width, &d_height); npm ERR! | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ npm ERR! | rsvg_handle_get_intrinsic_dimensions ```
abrenoch commented 1 year ago

Thanks for the reply! Since this is such a new change, I'm going to bring it up over in the the node-canvas PR... Maybe the read only behavior is just an oversight.

GitMurf commented 1 year ago

Thanks for the reply! Since this is such a new change, I'm going to bring it up over in the the node-canvas PR... Maybe the read only behavior is just an oversight.

To follow-up here, it is confirmed by node-canvas team (https://github.com/Automattic/node-canvas/pull/2235#issuecomment-1759632954) that it was indeed just an oversight and they will fix when they have some time but also requested if anyone can for a PR they will approve. As @abrenoch mentioned in that thread, I as well do not have C++ experience so if someone else wants to take a look that would be wonderful!