rkusa / pdfjs

A Portable Document Format (PDF) generation library targeting both the server- and client-side.
MIT License

Cannot catch error in asBuffer() #312

Open 7freaks-otte opened 1 year ago

7freaks-otte commented 1 year ago

I'm currently merging around 1500 PDFs and tried to find the defect one, but I cannot catch errors produced by this.end() in asBuffer().

While errors get caught here:

try {
    doc.pipe(fs.createWriteStream(fullPdfPath));
    await doc.end();
} catch (err) { console.error(err); }

The node process quits with an unhandled error here:

try {
    const buf = await doc.asBuffer();
    fs.writeFileSync(fullPdfPath, buf, { encoding: 'binary' });
} catch (err) { console.error(err); }

I'm pretty sure the reason for the uncaught error is this line: https://github.com/rkusa/pdfjs/blob/3374d1ff1142d16e47a10dac2ba93a3f0f161a35/lib/document.js#L636

It should probably be:

if (shouldEnd) {
    this.end().catch(reject)
}
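
To illustrate why the try/catch around asBuffer() never fires, here is a minimal sketch. FakeDoc is a hypothetical stand-in written for this note, not pdfjs code; it only assumes that asBuffer() wraps the call to end() in a new Promise, as the linked line suggests.

```javascript
// Hypothetical stand-in for a document whose end() rejects,
// as it would when a defective PDF has been added.
class FakeDoc {
  async end() {
    throw new Error('Invalid xref object');
  }

  // Suspected bug pattern: the promise returned by end() is dropped, so its
  // rejection is unhandled and the promise given to the caller never settles.
  // Not called here, because doing so would crash a recent Node process.
  asBufferBroken() {
    return new Promise((resolve, reject) => {
      this.end();
    });
  }

  // Proposed fix: forward the rejection to the promise given to the caller.
  asBufferFixed() {
    return new Promise((resolve, reject) => {
      this.end().catch(reject);
    });
  }
}

// With the fix, an ordinary try/catch around the await sees the error.
async function demo() {
  try {
    await new FakeDoc().asBufferFixed();
  } catch (err) {
    console.error('caught:', err.message);
  }
}
```

Calling demo() logs the caught error, whereas awaiting asBufferBroken() would never reach the catch block.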

Interesting side fact: PDFs throwing errors like Invalid xref object at 54524 or Name must start with a leading slash, found: 0 are single-page PDFs previously extracted by pdfjs from other multi-page PDFs. Extracting worked, but merging again failed.

I could get rid of the Invalid xref object error by extracting with asBuffer() writeFileSync and encoding binary instead of pipe and stream but the one PDF with Name must start with a leading slash, found: 0 drives me crazy.

rkusa commented 1 year ago

I'm currently merging around 1500 PDFs

This is something pdfjs can handle, besides the described issue with a defective one in between?

I'm pretty sure the reason for the uncaught error is this line:

https://github.com/rkusa/pdfjs/blob/3374d1ff1142d16e47a10dac2ba93a3f0f161a35/lib/document.js#L636

It should probably be:

if (shouldEnd) {
    this.end().catch(reject)
}

I think you are right.

Interesting side fact: PDFs throwing errors like Invalid xref object at 54524 or Name must start with a leading slash, found: 0 are single-page PDFs previously extracted by pdfjs from other multi-page PDFs. Extracting worked, but merging again failed.

I could get rid of the Invalid xref object error by extracting with asBuffer() writeFileSync and encoding binary instead of pipe and stream but the one PDF with Name must start with a leading slash, found: 0 drives me crazy.

Are you able to provide a small example to repro either or both errors?

7freaks-otte commented 1 year ago

This is something pdfjs can handle, besides the described issue with a defective one in between?

Yes, and it's blazingly fast ;-)

First I had to extract the 1500 single pages from around 25 different multi page PDFs and also needed them as JPG files. This took a little time, mostly because of the image extraction:

const jpgScale = 5;
for (const filename of fs.readdirSync(multiPagesPdfDirectory)) {
    if (!filename.endsWith('.pdf')) continue;
    const prefix = path.basename(filename, '.pdf');
    const filepath = path.join(multiPagesPdfDirectory, filename);
    const src = new pdfjs.ExternalDocument(fs.readFileSync(filepath));
    for (let num = 1; num <= src.pageCount; num ++) {
        const pdfFilepath = path.join(singlePagesPdfDirectory, `${prefix}-${String(num).padStart(4, '0')}.pdf`);
        const jpgFilepath = pdfFilepath + '.jpg';
        const doc = new pdfjs.Document();
        doc.addPageOf(num, src);
        //This created some invalid PDFs (Error: Invalid xref object at 54524 - only noticed when merging again in the next code block)
        // doc.pipe(fs.createWriteStream(pdfFilepath));
        // await doc.end();
        //This created mostly valid PDFs (except: Name must start with a leading slash, found: 0 - only noticed when merging again in the next code block)
        await doc.asBuffer().then(data => fs.writeFileSync(pdfFilepath, data, { encoding: 'binary' }));
        const image = (await convert(pdfFilepath, { scale: jpgScale }))[0]; //pdf-img-convert
        const jpg = sharp(image, { failOn: 'none' })
            .flatten({ background: '#ffffff' })
            .toColourspace('srgb')
            .jpeg({ quality: 85, progressive: true });
        await jpg.toFile(jpgFilepath);
    }
}

Then I had to merge all of them into a single PDF file. This took under 1 second, only issue could have been memory (especially when automatically retrying and not destroying a failed writeStream):

const doc = new pdfjs.Document();
for (const filename of fs.readdirSync(singlePagesPdfDirectory)) {
    if (!filename.endsWith('.pdf')) continue;
    const filepath = path.join(singlePagesPdfDirectory, filename);
    try {
        const src = fs.readFileSync(filepath);
        const ext = new pdfjs.ExternalDocument(src);
        doc.addPagesOf(ext);
    } catch(err) {
        const { data, info } = await sharp(filepath + '.jpg', { failOn: 'none' }).toBuffer({ resolveWithObject: true });
        const width = info.width / jpgScale;
        const height = info.height / jpgScale;
        const pdf = pdfmake.createPdfKitDocument({
            pageSize: { width, height },
            pageOrientation: 'portrait',
            pageMargins: [0, 0, 0, 0],
            content: [{
                image: data,
                left: 0,
                top: 0,
                width: width,
                height: height,
            }],
        });
        const buf = await new Promise((resolve, reject) => {
            const chunks = [];
            pdf.on('data', chunk => chunks.push(chunk));
            pdf.on('end', () => resolve(Buffer.concat(chunks)));
            pdf.on('error', reject);
            pdf.end();
        });
        const ext = new pdfjs.ExternalDocument(buf);
        doc.addPagesOf(ext);
    }
}
doc.pipe(fs.createWriteStream(fullPdfPath));
await doc.end();

As I was in a rush, I added a quick fix falling back to the extracted image. But this only helped with "Invalid xref object at 54524" errors as they occurred while reading the single page PDFs. The "Name must start with a leading slash, found: 0" error occurred while writing the fully merged PDF, this is where I could not catch the error to find out which page.
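
One way to find out which page is to blame is to write each input on its own and record which ones fail. The sketch below is library-agnostic: findFailingInputs and its writeOne callback are hypothetical names introduced for this note, and the caller supplies whatever write step it wants to test.

```javascript
// Runs `writeOne` for every input in turn and collects the inputs that fail.
// `writeOne` is a hypothetical async callback supplied by the caller; it
// should throw or reject when the given input cannot be written.
async function findFailingInputs(inputs, writeOne) {
  const failures = [];
  for (const input of inputs) {
    try {
      await writeOne(input);
    } catch (err) {
      // Keep going so that every defective input is reported, not just the first.
      failures.push({ input, message: err.message });
    }
  }
  return failures;
}
```

With pdfjs, writeOne could create a fresh Document, call addPagesOf() with an ExternalDocument for that single file, and await the output. This only helps if the rejection actually reaches the caller, which is the case for pipe() plus end() as shown at the top of this issue.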

Also trying to repair the affected PDFs (after narrowing down which single page was actually to blame) did not help.

Lots of unnecessary code but I thought you might be interested in how I used your library.

Are you able to provide a small example to repro either or both errors?

Unfortunately the repo is private, but I'll send you example PDFs via email, that you can use along my code above, as soon as I have time to find the relevant files.

wildhart commented 1 year ago

I'm having the exact same error with a few different PDF files which I can send you privately. I'm using latest pdfjs 2.5.0

My real code downloads two file buffers from external pdf, combines them, and then throws the unhandled error while converting the combined document to a buffer.

Here's my minimal repro code:

import * as pdfjs from 'pdfjs';
import * as https from 'https';

(async () => {
    try {
        console.log('downloading...');
        const pdfBuffer = await downloadExternalReport('contact me for URLs');
        const doc = new pdfjs.ExternalDocument(pdfBuffer);
        const outputDoc = new pdfjs.Document();
        outputDoc.addPagesOf(doc);

        console.log('converting to buffer...');
        const outBuffer = await outputDoc.asBuffer(); // <- error thrown here but not caught

        console.log('done!');
    } catch (e) {
        console.error('error combining PDFs', e);
    }
})();

function downloadExternalReport(url: string) {
    const data: Buffer[] = [];
    return new Promise<Buffer>((resolve, reject) => {
        const request = https.get(url, (response) => {
            if (response.statusCode !== 200) {
                reject('Error downloading external report');
            } else {
                response.on('data', (d: Buffer) => data.push(d));
                response.on('end', () => resolve(Buffer.concat(data)));
            }

        });
        request.on('error', reject)
    })
}

One file gives me this error Invalid value:

2023-07-20 11:00:29 info: downloading...
2023-07-20 11:00:30 info: converting to buffer...
2023-07-20 11:00:30 error: (node:5764) UnhandledPromiseRejectionWarning: Error: Invalid value
    at Lexer._error (C:\Users\cmori\projects\Sourcecode\ASCP Web App\backend\node_modules\pdfjs\lib\parser\lexer.js:152:11)
    at Object.exports.parse (C:\Users\cmori\projects\Sourcecode\ASCP Web App\backend\node_modules\pdfjs\lib\object\value.js:26:9)
    at Function.parseInner (C:\Users\cmori\projects\Sourcecode\ASCP Web App\backend\node_modules\pdfjs\lib\object\object.js:80:28)
    at Function.parse (C:\Users\cmori\projects\Sourcecode\ASCP Web App\backend\node_modules\pdfjs\lib\object\object.js:68:27)
    at parseObject (C:\Users\cmori\projects\Sourcecode\ASCP Web App\backend\node_modules\pdfjs\lib\object\reference.js:128:22)
    at PDFReference.get [as object] (C:\Users\cmori\projects\Sourcecode\ASCP Web App\backend\node_modules\pdfjs\lib\object\reference.js:15:17)
    at Function.addObjectsRecursive (C:\Users\cmori\projects\Sourcecode\ASCP Web App\backend\node_modules\pdfjs\lib\parser\parser.js:68:35)
    at Function.addObjectsRecursive (C:\Users\cmori\projects\Sourcecode\ASCP Web App\backend\node_modules\pdfjs\lib\parser\parser.js:84:18)
    at Function.addObjectsRecursive (C:\Users\cmori\projects\Sourcecode\ASCP Web App\backend\node_modules\pdfjs\lib\parser\parser.js:75:16)
    at ExternalDocument.write (C:\Users\cmori\projects\Sourcecode\ASCP Web App\backend\node_modules\pdfjs\lib\external.js:63:14)
    at processTicksAndRejections (internal/process/task_queues.js:95:5)
    at Document.end (C:\Users\cmori\projects\Sourcecode\ASCP Web App\backend\node_modules\pdfjs\lib\document.js:544:5)
(Use `node --trace-warnings ...` to show where the warning was created)
2023-07-20 11:00:30 error: (node:5764) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag `--unhandled-rejections=strict` (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 1)
2023-07-20 11:00:30 error: (node:5764) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.

Another gives me this error Name must start with a leading slash, found: 0:

2023-07-20 11:03:58 info: converting to buffer...
2023-07-20 11:03:58 error: (node:34740) UnhandledPromiseRejectionWarning: Error: Name must start with a leading slash, found: 0
    at Function.parse (C:\Users\cmori\projects\Sourcecode\ASCP Web App\backend\node_modules\pdfjs\lib\object\name.js:67:13)
    at Function.parse (C:\Users\cmori\projects\Sourcecode\ASCP Web App\backend\node_modules\pdfjs\lib\object\dictionary.js:71:27)
    at Object.exports.parse (C:\Users\cmori\projects\Sourcecode\ASCP Web App\backend\node_modules\pdfjs\lib\object\value.js:20:30)
    at Function.parseInner (C:\Users\cmori\projects\Sourcecode\ASCP Web App\backend\node_modules\pdfjs\lib\object\object.js:80:28)
    at Function.parse (C:\Users\cmori\projects\Sourcecode\ASCP Web App\backend\node_modules\pdfjs\lib\object\object.js:68:27)
    at parseObject (C:\Users\cmori\projects\Sourcecode\ASCP Web App\backend\node_modules\pdfjs\lib\object\reference.js:128:22)
    at PDFReference.get [as object] (C:\Users\cmori\projects\Sourcecode\ASCP Web App\backend\node_modules\pdfjs\lib\object\reference.js:15:17)
    at Function.addObjectsRecursive (C:\Users\cmori\projects\Sourcecode\ASCP Web App\backend\node_modules\pdfjs\lib\parser\parser.js:68:35)
    at C:\Users\cmori\projects\Sourcecode\ASCP Web App\backend\node_modules\pdfjs\lib\parser\parser.js:89:18
    at Array.forEach (<anonymous>)
    at Function.addObjectsRecursive (C:\Users\cmori\projects\Sourcecode\ASCP Web App\backend\node_modules\pdfjs\lib\parser\parser.js:88:15)
    at Function.addObjectsRecursive (C:\Users\cmori\projects\Sourcecode\ASCP Web App\backend\node_modules\pdfjs\lib\parser\parser.js:76:16)
    at Function.addObjectsRecursive (C:\Users\cmori\projects\Sourcecode\ASCP Web App\backend\node_modules\pdfjs\lib\parser\parser.js:72:16)
    at Function.addObjectsRecursive (C:\Users\cmori\projects\Sourcecode\ASCP Web App\backend\node_modules\pdfjs\lib\parser\parser.js:84:18)
    at Function.addObjectsRecursive (C:\Users\cmori\projects\Sourcecode\ASCP Web App\backend\node_modules\pdfjs\lib\parser\parser.js:75:16)
    at ExternalDocument.write (C:\Users\cmori\projects\Sourcecode\ASCP Web App\backend\node_modules\pdfjs\lib\external.js:63:14)
    at processTicksAndRejections (internal/process/task_queues.js:95:5)
    at Document.end (C:\Users\cmori\projects\Sourcecode\ASCP Web App\backend\node_modules\pdfjs\lib\document.js:544:5)
2023-07-20 11:03:58 error: (node:34740) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag `--unhandled-rejections=strict` (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 1)
2023-07-20 11:03:58 error: (node:34740) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.

And another gives me this error Name must start with a leading slash, found: (:

2023-07-20 11:06:21 info: converting to buffer...
2023-07-20 11:06:21 error: (node:16452) UnhandledPromiseRejectionWarning: Error: Name must start with a leading slash, found: (
    at Function.parse (C:\Users\cmori\projects\Sourcecode\ASCP Web App\backend\node_modules\pdfjs\lib\object\name.js:67:13)
    at Function.parse (C:\Users\cmori\projects\Sourcecode\ASCP Web App\backend\node_modules\pdfjs\lib\object\dictionary.js:71:27)
    at Object.exports.parse (C:\Users\cmori\projects\Sourcecode\ASCP Web App\backend\node_modules\pdfjs\lib\object\value.js:20:30)
    at Function.parse (C:\Users\cmori\projects\Sourcecode\ASCP Web App\backend\node_modules\pdfjs\lib\object\dictionary.js:74:30)
    at Object.exports.parse (C:\Users\cmori\projects\Sourcecode\ASCP Web App\backend\node_modules\pdfjs\lib\object\value.js:20:30)
    at Function.parseInner (C:\Users\cmori\projects\Sourcecode\ASCP Web App\backend\node_modules\pdfjs\lib\object\object.js:80:28)
    at Function.parse (C:\Users\cmori\projects\Sourcecode\ASCP Web App\backend\node_modules\pdfjs\lib\object\object.js:68:27)
    at parseObject (C:\Users\cmori\projects\Sourcecode\ASCP Web App\backend\node_modules\pdfjs\lib\object\reference.js:128:22)
    at PDFReference.get [as object] (C:\Users\cmori\projects\Sourcecode\ASCP Web App\backend\node_modules\pdfjs\lib\object\reference.js:15:17)
    at Function.addObjectsRecursive (C:\Users\cmori\projects\Sourcecode\ASCP Web App\backend\node_modules\pdfjs\lib\parser\parser.js:68:35)
    at C:\Users\cmori\projects\Sourcecode\ASCP Web App\backend\node_modules\pdfjs\lib\parser\parser.js:89:18
    at Array.forEach (<anonymous>)
    at Function.addObjectsRecursive (C:\Users\cmori\projects\Sourcecode\ASCP Web App\backend\node_modules\pdfjs\lib\parser\parser.js:88:15)
    at Function.addObjectsRecursive (C:\Users\cmori\projects\Sourcecode\ASCP Web App\backend\node_modules\pdfjs\lib\parser\parser.js:84:18)
    at Function.addObjectsRecursive (C:\Users\cmori\projects\Sourcecode\ASCP Web App\backend\node_modules\pdfjs\lib\parser\parser.js:75:16)
    at ExternalDocument.write (C:\Users\cmori\projects\Sourcecode\ASCP Web App\backend\node_modules\pdfjs\lib\external.js:63:14)
    at processTicksAndRejections (internal/process/task_queues.js:95:5)
    at Document.end (C:\Users\cmori\projects\Sourcecode\ASCP Web App\backend\node_modules\pdfjs\lib\document.js:544:5)
2023-07-20 11:06:21 error: (node:16452) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag `--unhandled-rejections=strict` (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 1)
2023-07-20 11:06:21 error: (node:16452) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.

wildhart commented 1 year ago

Interesting side fact: PDFs throwing errors like Invalid xref object at 54524 or Name must start with a leading slash, found: 0 are single-page PDFs previously extracted by pdfjs from other multi-page PDFs. Extracting worked, but merging again failed.

In my case, all three of my files were previously generated by pdfjs as well. My first process is to generate Report A from html using wkhtmltopdf, then I append a few 3rd party PDFs (invoices, etc), to that report using pdfjs and save that, then upload to S3 - all good. My next step is to generate Report B and upload that to S3. Then, later on I download both files from S3 and append Report B to the end of Report A so I can upload the combined reports to a 3rd party API.

This error gets triggered when exporting the combined report to a buffer, even though both files were previously exported to buffer by pdfjs. So maybe something is caused by a file being exported twice. However, I guess that is a separate issue from this one of the error not being handled correctly - maybe #217 #166
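
As long as the rejection cannot be caught at the call site, a process-level handler at least makes the failure visible. This is a diagnostic stopgap under the assumption that the code runs in Node.js, not a fix for the library. The unhandled array is a name introduced here for illustration.

```javascript
// Collects rejections that no catch() or try/catch picked up.
const unhandled = [];

// Node.js emits 'unhandledRejection' on the process object. Registering a
// listener replaces the default behaviour (a warning on older versions,
// process termination on newer ones) with the logic below.
process.on('unhandledRejection', (reason) => {
  unhandled.push(reason);
  console.error(
    'unhandled rejection:',
    reason instanceof Error ? reason.stack : reason
  );
});
```

The handler cannot tell which asBuffer() call failed, so it is only useful for logging until the library propagates the error itself.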

rkusa commented 1 year ago

The unhandled promise rejection error should be fixed on main. I've also added a test for adding a PDF generated by pdfjs, and that worked fine. So it generally seems to work (as in pdfjs generates PDFs it itself deems valid), except when it does not. I don't know exactly yet what causes it to not work in both of your cases. Kinda doesn't make sense that adding a PDF works the first time, but doesn't when adding that generated PDF again later on.

I am afraid though that this issue isn't very high on my list, since it does not affect my own use-case. So that you can plan, you should know that I don't expect to work on that in the foreseeable future.

wildhart commented 1 year ago

Thanks @rkusa. Do you have an ETA on when you can publish the unhandled promise rejection fix to npm?

rkusa commented 1 year ago

@wildhart just released as 2.5.1

wildhart commented 1 year ago

I've tried 2.5.1 and I'm afraid I still get the same unhandled errors as before.

If I edit your code directly in my node_modules folder, and make the change to document.js as suggested by @7freaks-otte, (combined with your new return statement):

if (shouldEnd) {
    return this.end().catch(reject)
}

Then the error is properly caught and handled by my own error handler:

    ...
    console.log('converting to buffer...');
    const outBuffer = await outputDoc.asBuffer();
    console.log('done!');
} catch (e) {
    console.error('error combining PDFs', e);
}

2023-07-24 11:15:17 info: converting to buffer...
2023-07-24 11:15:17 error: error combining PDFs Error: Name must start with a leading slash, found: 0
    at Function.parse (C:\Users\cmori\projects\Sourcecode\ASCP Web App\backend\node_modules\pdfjs\lib\object\name.js:70:13)
    at Function.parse (C:\Users\cmori\projects\Sourcecode\ASCP Web App\backend\node_modules\pdfjs\lib\object\dictionary.js:72:27)

rkusa commented 1 year ago

Well, ... 🤦‍♂️ – apparently returning a promise to chain it was only a thing inside of a then()/catch(). I gave it some more time and added a unit test to be sure this time. Thanks for the feedback. Released as 2.5.2.
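
The difference described here can be sketched in a few lines. The value returned from the executor passed to new Promise() is ignored, while a promise returned from a then() callback is adopted by the chain. The failing helper is hypothetical and stands in for an end() call that rejects.

```javascript
// Stand-in for an operation that fails asynchronously.
const failing = () => Promise.reject(new Error('boom'));

// Inside then(), returning a promise chains it: `chained` rejects with 'boom'.
const chained = Promise.resolve().then(() => failing());

// Inside a Promise executor the return value is discarded, so writing
// `return failing()` there would leave the outer promise pending forever
// and the rejection unhandled. The rejection has to be forwarded explicitly.
const forwarded = new Promise((resolve, reject) => {
  failing().catch(reject);
});

chained.catch((err) => console.log('chained rejected with:', err.message));
forwarded.catch((err) => console.log('forwarded rejected with:', err.message));
```

Both promises reject with the same error, so a caller awaiting either one inside try/catch would see it.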

7freaks-otte commented 11 months ago

@rkusa thanks for the asBuffer() error fix.

As for the Name must start with a leading slash error, I understand your priorities.

I identified the single page PDF causing this error and I'll send it to you via mail. You can just use it with your newly written test as addallpages.pdf to reproduce the error. Maybe you can easily identify the problem once you find a little time.

wildhart commented 11 months ago

@rkusa if it helps, do you accept Github sponsorship?

My client uses this in a commercial environment and this bug is costing them time (when we come across this error the only solution is to "reprint" the offending PDF, then we are able to append it to our own PDFs). I was going to see if I could investigate it myself if I could find any time.

So if you could fix this, you'd save us time and therefore $$, so would be happy to send something your way...

7freaks-otte commented 11 months ago

@wildhart If it helps, I can send you our failing single PDF (60-100KB) as well.

I was able to repair the PDF via iLovePDF and was then able to use addPagesOf without error.

I just found out, that they have a NodeJS library (https://github.com/ilovepdf/ilovepdf-nodejs) to access their API. Feels like a really ugly workaround but I was thinking of implementing this on failing PDFs.

rkusa commented 11 months ago

@wildhart I appreciate the offer, but pdfjs is too low on my list of priorities to accept $ with a good conscience.

Anyway, the documents @7freaks-otte sent over made it very easy for me to spot the issue. Thanks a lot for narrowing it down to a single page @7freaks-otte!

I've just pushed a fix. However, it does not help when adding pages of pdfjs-generated PDFs that are already broken. Only PDFs newly generated with pdfjs should now work when being added again.

Mind checking main and confirming that it is fixed before I publish a new version?

wildhart commented 11 months ago

I've tried installing your latest pdfjs direct from github, but I continue to get "Name must start with a leading slash, found: (" with some files.

Also, with another file I get your new error "Tried to write reference with null object id". What does this mean, and how can we avoid it?

I've sent you two files by email...

7freaks-otte commented 11 months ago

@rkusa Thank you very much, I'll try to test your fix the next days and give you feedback.

rkusa commented 11 months ago

@wildhart Thanks for testing. To be sure, my fix prevents pdfjs from generating invalid PDFs (at least one instance). If you already have an invalid PDF, and try to add it to a new document, you'll still see the error. So maybe you are trying to add PDFs previously generated with pdfjs that are already broken?

The error Tried to write reference with null object id is a new addition to prevent generating such invalid PDFs in the first place. You might have encountered another instance where pdfjs would generate a broken PDF. Thanks for sending it over, I'll look at it.

wildhart commented 11 months ago

So maybe you are trying to add PDFs previously generated with pdfjs that are already broken?

In the example I sent you "Tax-Invoice-M590936.pdf" that file was not generated by pdfjs (at least not by me) - that file was uploaded by one of our clients and triggers the "Name must start with a leading slash, found: (" error when appended to a pdf using pdfjs, then that pdf is appended to another pdf.

7freaks-otte commented 9 months ago

@rkusa sorry for the delay, I was quite busy the last weeks.

Your commit https://github.com/rkusa/pdfjs/commit/b6cdd70c64611d0e1369ad928028b2cf51009379 seems to fix the Name must start with a leading slash, found: 0 error but same as @wildhart I now encounter the new TypeError: Tried to write reference with 'null' object id on the same page.

Maybe it's worth noting that I just want to add a single page (3) from the previously generated PDF.

What worked for me (though not practical) is:

wildhart commented 8 months ago

Just FYI, I've moved away from using pdfjs for merging PDFs, due to this issue with certain PDFs causing errors, and also excessive file sizes (#314).

Instead I'm using pdf-lib which is really easy to use to copy pages from one PDF to another, and it doesn't have any problems with the files we've provided here which throw errors in pdfjs, and the output file size is never bigger than the original files. It also seems a bit faster.

I'm still using pdfjs to generate PDF from html, but then I use pdf-lib to combine that with other PDF files.

rkusa commented 7 months ago

In the example I sent you "Tax-Invoice-M590936.pdf" that file was not generated by pdfjs (at least not by me) - that file was uploaded by one of our clients and triggers the "Name must start with a leading slash, found: (" error when appended to a pdf using pdfjs, then that pdf is appended to another pdf.

File works for me with the previous fix – not sure if it is a specific constellation on how it is added to the file.

Your commit b6cdd70 seems to fix the Name must start with a leading slash, found: 0 error but same as @wildhart I now encounter the new TypeError: Tried to write reference with 'null' object id on the same page.

This error was added as part of the fix to prevent pdfjs from generating invalid PDFs in similar situations – and you seem to have found another one. However, I don't think that I'll find the time to look into that – sorry.

Just FYI, I've moved away from using pdfjs for merging PDFs, due to this issue with certain PDFs causing errors, and also excessive file sizes (#314).

Instead I'm using pdf-lib which is really easy to use to copy pages from one PDF to another, and it doesn't have any problems with the files we've provided here which throw errors in pdfjs, and the output file size is never bigger than the original files. It also seems a bit faster.

Sounds like a good decision to me. I've also added a note about the current maintenance status to the README. I myself moved most of my uses of pdfjs to a simple HTML to PDF via headless Chrome (I don't have the use-case of adding other PDFs anymore).

7freaks-otte commented 7 months ago

For the moment I'm OK with my workaround above using 2 pdfjs versions at a time, as the PDFs are only generated once every several months. I understand your priorities. Thanks for your help anyway @rkusa