Open dms-ts opened 1 year ago
I'm having the same issue as @dms-ts. Any ideas, please?
Same issue, however only for some types of PDFs. Regular PDF files uploaded from the user's device can be converted fine as they are, however for some reason this library fails to convert PDFs created with React PDF.
I'm having the same issue - any advice? I've installed Microsoft Fonts and have checked that Arial is installed on my EC2 Ubuntu system running node but still no luck.
I'm looking for a package that doesn't save to the file system and can import a PDF from a URL and export an array of images. I'm very happy with this package, with the exception of some missing text (obviously a big problem), but I'm happy to switch to an alternative if anyone has any advice.
I changed the verbosity of the PDF.js command to 1 so that I could get the following error messages. The ones relating to Helvetica match the text that is missing. These are my error messages:
Warning: fetchStandardFontData: failed to fetch file "FoxitSans.pfb" with "UnknownErrorException: The standard font "baseUrl" parameter must be specified, ensure that the "standardFontDataUrl" API parameter is provided.".
Warning: fetchStandardFontData: failed to fetch file "FoxitSansBold.pfb" with "UnknownErrorException: The standard font "baseUrl" parameter
Warning: getPathGenerator - ignoring character: "Error: Requesting object that isn't resolved yet Helvetica_path_T.".
Warning: getPathGenerator - ignoring character: "Error: Requesting object that isn't resolved yet Helvetica_path_h.".
I think my system is saying that it would substitute the Helvetica with Arial:
fc-match Helvetica
Arial.ttf: "Arial" "Regular"
So I'm not sure what's going on... I'll keep trying to find a solution and post back if I find something.
Think I found a fix that is legit:
I changed line 100 in the file pdf-img-convert.js:
var loadingTask = pdfjs.getDocument({data: pdfData, disableFontFace: false, verbosity: 0});
It looks like this should be okay from the 2018 answer here.
So that didn't work. As mentioned in the earlier part of that 2018 thread, that change will break other documents' fonts.
I'm able to resolve this issue using this instruction https://github.com/mozilla/pdf.js/issues/4244#issuecomment-1232548915
final version:
diff --git a/pdf-img-convert.js b/pdf-img-convert.js
index 01e8c64c9ffa13ea226a689fa08e78d97213dabe..97939693584b700a985fe3ef3a2fe054a26ddf41 100644
--- a/pdf-img-convert.js
+++ b/pdf-img-convert.js
@@ -29,6 +29,7 @@ const Canvas = require("canvas");
const assert = require("assert").strict;
const fs = require("fs");
const util = require('util');
+const path = require('path');
const readFile = util.promisify(fs.readFile);
@@ -95,9 +96,9 @@ module.exports.convert = async function (pdf, conversion_config = {}) {
// At this point, we want to convert the pdf data into a 2D array representing
// the images (indexed like array[page][pixel])
-
+ let packagePath = path.dirname(require.resolve("pdfjs-dist/package.json"));
var outputPages = [];
- var loadingTask = pdfjs.getDocument({data: pdfData, disableFontFace: true, verbosity: 0});
+ var loadingTask = pdfjs.getDocument({data: pdfData, disableFontFace: true, verbosity: 0, standardFontDataUrl: packagePath + '/standard_fonts/'});
var pdfDocument = await loadingTask.promise
@ol-th would you accept a PR for this?
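For anyone reading the diff above, here is a minimal sketch of what the change boils down to: pass `standardFontDataUrl` to `pdfjs.getDocument()` so PDF.js can load its bundled standard fonts (the `FoxitSans*.pfb` files from the warnings). The helper name `buildLoadingOptions` is hypothetical and not part of pdf-img-convert; it only builds the options object, so it runs without pdfjs-dist installed. It assumes, as the patch does, that the fonts live in a `standard_fonts` folder inside the pdfjs-dist package.

```javascript
const path = require("path");

// Hypothetical helper (not part of pdf-img-convert): builds the options
// object for pdfjs.getDocument(), given the pdfjs-dist install directory.
function buildLoadingOptions(pdfData, pdfjsDir) {
  // PDF.js expects standardFontDataUrl to end with a trailing slash.
  const standardFontDataUrl = path.join(pdfjsDir, "standard_fonts") + "/";
  return {
    data: pdfData,
    disableFontFace: true, // unchanged from the original library code
    verbosity: 0,
    standardFontDataUrl,
  };
}

module.exports = { buildLoadingOptions };
```

Inside the library this would be used roughly as `pdfjs.getDocument(buildLoadingOptions(pdfData, path.dirname(require.resolve("pdfjs-dist/package.json"))))`, which is equivalent to the one-line change in the patch.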
I would also like to bump this issue. I will have to look for another library if this doesn't get solved. Has anyone looked at @deathemperor's response? Could it work?
I love the simplicity of using this library and just hope this issue can get resolved. All the best.
Hope you find it useful. That patch successfully converts our 300+ PDFs daily.
How can I implement your change, @deathemperor? Has it been patched into the latest version, or do you mean you made the change yourself in the lib files?
I can't edit the file directly, because I have a pipeline that runs npm install.
If I do have to implement that change myself, I'll have to add a script to my pipeline to edit the file after the fact.
I'd prefer not to do that, so if you have an alternative suggestion, that would be great.
Thanks for your response though, @deathemperor. I appreciate your time.
@deathemperor if you could send a PR for this fix that would be great. I'll test it out and add it to a new release if all good.
I use https://www.npmjs.com/package/patch-package to maintain patches like these until the repo officially supports the fix.
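In case it helps with the pipeline question, a rough sketch of the usual patch-package workflow: edit the file under `node_modules` once locally, run `npx patch-package pdf-img-convert` (assuming `pdf-img-convert` is the name the library is installed under), and commit the generated `patches/` folder. Then add a `postinstall` script so the patch is reapplied after every `npm install`, including in CI. The fragment below is only the relevant part of `package.json`; patch-package itself also needs to be listed as a dependency so it is available when the script runs.

```json
{
  "scripts": {
    "postinstall": "patch-package"
  }
}
```

With this in place there should be no need for a separate pipeline step that edits the library file after installation.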
Sure, here's the PR: https://github.com/ol-th/pdf-img-convert.js/pull/50
Hi guys, has this been merged into the latest version? I'd love to start using this, thanks.
Hi @deathemperor, thank you so much for leading me to https://www.npmjs.com/package/patch-package
I managed to implement it successfully and can continue using the library seamlessly.
Much appreciated.
I'm glad it helped!
I'm trying to convert some shipping labels to PNG. It converts the barcodes and images, but no text or fonts. I already installed the font fix, but it doesn't work.