rkusa / pdfjs

A Portable Document Format (PDF) generation library targeting both the server- and client-side.
MIT License

TypeError: Cannot read property '12' of undefined #305

Open bflemi3 opened 1 year ago

bflemi3 commented 1 year ago

I have a string that contains line breaks \n. After adding that string as text to a cell then calling asBuffer, I'm getting the following error.

TypeError: Cannot read property '12' of undefined
    at LineBreaker.nextBreak (webpack-internal:///./node_modules/@rkusa/linebreak/src/linebreaker.js:93:39)
    at Text._render (webpack-internal:///./node_modules/pdfjs/lib/text.js:88:68)

The @rkusa/linebreak repo is no longer around (or at least the GitHub link referenced from npm is invalid), but the original repository it was based on, foliojs/linebreak, appears to have recent updates that aren't reflected in your version.


Here's my snippet of code that I'm trying to execute.

const table = pdfDoc.table({ widths: [140, null] })

for (const termGroup of sortedTermGroups) {
  const row = table.row({ paddingBottom: 20 })
  const titleCell = row.cell({ padding: 5 })
  titleCell.text(termGroup.termType.title, { font: HelveticaBold })

  const contentCell = row.cell({ padding: 5 })
  const contentTable = contentCell.table({ widths: [10, null] })
  for (const keyTerm of termGroup.keyTerms) {
    const contentRow = contentTable.row({ paddingBottom: 5 })
    const contentBullet = contentRow.cell()
    contentBullet.text('-')

    const contentContent = contentRow.cell()
    contentContent.text(keyTerm.content.replace(/(\r\n|\n|\r)/gm, ''))
  }
}

return pdfDoc.asBuffer()

And, if it helps, here's the string from keyTerm.content that's causing issues.

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Nulla aliquet enim tortor at auctor urna nunc id. Condimentum mattis pellentesque id nibh tortor id aliquet lectus. Sagittis purus sit amet volutpat consequat mauris nunc. Posuere sollicitudin aliquam ultrices sagittis orci. Odio aenean sed adipiscing diam. Enim praesent elementum facilisis leo vel fringilla. Integer enim neque volutpat ac tincidunt vitae semper quis. Fames ac turpis egestas integer eget. Duis tristique sollicitudin nibh sit amet commodo nulla. Mattis aliquam faucibus purus in massa tempor.\n\nPulvinar pellentesque habitant morbi tristique. Turpis egestas sed tempus urna et. Feugiat in fermentum posuere urna nec tincidunt praesent semper. Ornare arcu dui vivamus arcu felis bibendum ut. Pellentesque habitant morbi tristique senectus et netus et malesuada fames. Netus et malesuada fames ac turpis egestas integer eget aliquet. Libero id faucibus nisl tincidunt eget. Tincidunt eget nullam non nisi est sit amet facilisis. Vulputate dignissim suspendisse in est. Tempor orci dapibus ultrices in iaculis nunc sed. Egestas sed sed risus pretium quam vulputate dignissim suspendisse in. Venenatis lectus magna fringilla urna porttitor rhoncus dolor purus non. Facilisis magna etiam tempor orci eu lobortis elementum nibh tellus. Risus commodo viverra maecenas accumsan lacus vel facilisis volutpat est. Semper eget duis at tellus at urna condimentum mattis. Volutpat blandit aliquam etiam erat velit.\n\nElementum curabitur vitae nunc sed velit dignissim sodales ut. Libero enim sed faucibus turpis in eu. Condimentum mattis pellentesque id nibh tortor id aliquet lectus proin. Nibh mauris cursus mattis molestie a iaculis at. Sapien pellentesque habitant morbi tristique. Semper feugiat nibh sed pulvinar proin gravida. Varius sit amet mattis vulputate. Aliquet enim tortor at auctor urna nunc id. Non quam lacus suspendisse faucibus interdum posuere. 
Feugiat pretium nibh ipsum consequat nisl vel. Velit egestas dui id ornare arcu odio ut sem. Eu scelerisque felis imperdiet proin. Vestibulum morbi blandit cursus risus at ultrices mi tempus imperdiet. Nibh nisl condimentum id venenatis a. In hendrerit gravida rutrum quisque non. Turpis cursus in hac habitasse platea dictumst quisque.\n\nTempor orci dapibus ultrices in. Et malesuada fames ac turpis egestas sed tempus urna et. Ut venenatis tellus in metus vulputate eu scelerisque felis imperdiet. At quis risus sed vulputate odio ut enim. Sit amet est placerat in egestas. Porta non pulvinar neque laoreet suspendisse interdum consectetur libero id. Duis convallis convallis tellus id interdum. Varius duis at consectetur lorem donec massa. Pharetra vel turpis nunc eget. Varius morbi enim nunc faucibus a pellentesque sit. Ipsum nunc aliquet bibendum enim facilisis gravida neque. Sit amet cursus sit amet dictum. Amet commodo nulla facilisi nullam vehicula ipsum. Cras sed felis eget velit aliquet sagittis id. In aliquam sem fringilla ut morbi. Condimentum lacinia quis vel eros donec ac odio tempor. Consectetur adipiscing elit pellentesque habitant morbi tristique. Eu tincidunt tortor aliquam nulla facilisi cras fermentum. Cursus metus aliquam eleifend mi in nulla. Tellus mauris a diam maecenas sed enim.\n\nMauris sit amet massa vitae tortor condimentum lacinia. Tempus urna et pharetra pharetra massa massa ultricies mi. Interdum consectetur libero id faucibus nisl tincidunt. Enim sit amet venenatis urna cursus eget. Facilisi nullam vehicula ipsum a arcu cursus vitae congue. Lacinia at quis risus sed vulputate. Id neque aliquam vestibulum morbi blandit cursus risus at ultrices. Purus gravida quis blandit turpis cursus in hac habitasse platea. Sodales ut etiam sit amet nisl purus. Sed vulputate mi sit amet mauris.
rkusa commented 1 year ago

Thanks for the report. The repo is here: https://github.com/rkusa/linebreak. Unfortunately, I don't have the time to look into it right now, but I will try to find some time eventually.

bflemi3 commented 1 year ago

After further investigation, it's failing on more than just line breaks. I commented out most of the code, leaving only the part that creates the doc, adds a cell, and then calls doc.asBuffer.

const doc = new pdf.Document({
  padding: 40,
  font: require('pdfjs/font/Helvetica'),
})

doc.cell(result.title) // result.title = '2001-2003 - New Hampshire-New Hampshire Troopers Association-CBA'

doc.asBuffer()

Results in this error...

TypeError: Cannot read properties of undefined (reading '38')
    at LineBreaker.nextBreak (webpack-internal:///./node_modules/@rkusa/linebreak/src/linebreaker.js:99:39)
    at Text._render (webpack-internal:///./node_modules/pdfjs/lib/text.js:88:68)
    at eval (webpack-internal:///./node_modules/pdfjs/lib/text.js:400:35)
    at Document._next (webpack-internal:///./node_modules/pdfjs/lib/document.js:169:36)
    at eval (webpack-internal:///./node_modules/pdfjs/lib/document.js:172:19)

Hoping you'll have time to address this, as our users are unable to download any of these generated PDFs, which is an important feature of our platform.

rkusa commented 1 year ago

Does result.title in this case also include \n new lines? Because I cannot reproduce it with the following code:

test.js:

const fs = require("fs");
const pdf = require("pdfjs");

async function main() {
  try {
    const doc = new pdf.Document({
      padding: 40,
      font: require("pdfjs/font/Helvetica"),
    });
    doc.pipe(fs.createWriteStream("output.pdf"));

    doc.cell(
      "2001-2003 - New Hampshire-New Hampshire Troopers Association-CBA"
    );

    await doc.end();
  } catch (err) {
    console.log("Caught error:");
    console.error(err);
  }
}

main();

The snippet above also works for the string mentioned in the initial post.

bflemi3 commented 1 year ago

No, it does not. In fact, if I replace result.title with the literal '2001-2003 - New Hampshire-New Hampshire Troopers Association-CBA', it still fails.

Did some testing with lorem ipsum. See below code comments.

doc.cell('Lorem i') // Doesn't work - Throws "Cannot read properties of undefined (reading '39')"
doc.cell('Lorem ') // Works
return doc.asBuffer()

I'm using v2.4.7 and I'm on an M1 Mac if that makes any difference. Our api runs on arm64 architecture as well. Could that be the issue?
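When the same call behaves differently on two machines with nominally the same versions, it can help to confirm what is actually installed and where Node resolves it from. The following is a minimal sketch, not part of pdfjs: the helper name describePackage is hypothetical, and it assumes both packages can be resolved from the project root (which depends on how the package manager laid out node_modules) and that their package.json files are resolvable.

```javascript
// Report the installed version and on-disk location of a package,
// or the error code if it cannot be resolved from this file's location.
function describePackage(name) {
  try {
    const pkgPath = require.resolve(`${name}/package.json`)
    const { version } = require(pkgPath)
    return { name, version, location: pkgPath }
  } catch (err) {
    return { name, error: err.code || err.message }
  }
}

// Runtime details that are worth including in a bug report.
console.log({
  node: process.version,
  platform: process.platform,
  arch: process.arch,
})

for (const name of ['pdfjs', '@rkusa/linebreak']) {
  console.log(describePackage(name))
}
```

Running this with plain Node, outside the bundler, shows the files Node itself would load; a bundler may resolve or transform them differently.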

rkusa commented 1 year ago

doc.cell('Lorem i')

Works for me too on 2.4.7 and an M1. I also run pdfjs 2.4.7 in production and have never seen this error. That is quite an odd one.

Can you post the line in linebreaker.js that causes the error? Looks like the file got transformed by webpack so I cannot directly map :99:39 to the original file.

at LineBreaker.nextBreak (webpack-internal:///./node_modules/@rkusa/linebreak/src/linebreaker.js:99:39)

bflemi3 commented 1 year ago

That's correct, the switch on line 99...

// if not handled already, use the pair table
let shouldBreak = false
switch (pairTable[this.curClass][this.nextClass]) {

Package is @rkusa/linebreak version 1.0.0
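The error message is consistent with the first index of that lookup being missing: if pairTable[this.curClass] evaluates to undefined, reading [this.nextClass] from it throws a TypeError that names the second index. Here is a minimal sketch of that mechanism using a toy table. The table contents and the function names lookup and safeLookup are hypothetical and are not the library's code.

```javascript
// Toy stand-in for a pair table: one row per "current" class,
// one column per "next" class. The values are placeholders.
const pairTable = [
  [0, 1],
  [1, 0],
]

// Unguarded lookup, same shape as the line quoted above.
function lookup(curClass, nextClass) {
  return pairTable[curClass][nextClass]
}

// Guarded variant: returns undefined instead of throwing when the row is missing.
function safeLookup(curClass, nextClass) {
  const row = pairTable[curClass]
  return row === undefined ? undefined : row[nextClass]
}

let message = null
try {
  // pairTable[undefined] is undefined, so reading index 39 from it throws.
  lookup(undefined, 39)
} catch (err) {
  message = err.message
}

console.log(message)
console.log(safeLookup(undefined, 39))
```

A guard like safeLookup would only hide the symptom; it does not explain why the current class is missing or out of range in the first place.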

rkusa commented 1 year ago

It doesn't look altered by any transformation. I'm afraid I am out of ideas for now. I could make the pairTable[this.curClass][this.nextClass] line failsafe against that error, but I think this is just a symptom and not the cause. I'd much rather find the cause, but for that I'd have to be able to reproduce it. Do you get the same error when running this exact snippet via Node?

const fs = require("fs");
const pdf = require("pdfjs");

async function main() {
  try {
    const doc = new pdf.Document({
      padding: 40,
      font: require("pdfjs/font/Helvetica"),
    });
    doc.pipe(fs.createWriteStream("output.pdf"));

    doc.cell("Lorem i");

    await doc.end();
  } catch (err) {
    console.log("Caught error:");
    console.error(err);
  }
}

main();

bflemi3 commented 1 year ago

Yep, I'm getting the same error running just that snippet, again referencing line 99 in linebreaker.js.

TypeError: Cannot read properties of undefined (reading '39')

When I can find the time I'll try to put a codesandbox together to reproduce. Been busy as of late.

rkusa commented 1 year ago

Here is a Stackblitz as a starting point if that helps: https://stackblitz.com/edit/js-vvyqad?file=index.js

bflemi3 commented 1 year ago

Works in stackblitz: https://js-vppx4v.stackblitz.io

getPdfBuffer fails on my machine and it's doing the same thing.

Could this have anything to do with it...

I'm also having another issue with module resolution (probably caused by webpack) that produces this error:

TypeError: opts.font must be set to a valid default font
    at new Document (webpack-internal:///./node_modules/pdfjs/lib/document.js:59:13)
    at Function.getTermSheetPDF (webpack-internal:///./src/DocUtils.ts:208:24)
    at Function.getDocumentTermSheetPDF (webpack-internal:///./src/apis/DocumentAPI.ts:354:51)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
    at async Function.getDocumentTermSheetPDFFromID (webpack-internal:///./src/apis/DocumentAPI.ts:323:16)
    at async APIResponse.processHandlerFunction (webpack-internal:///./node_modules/apilove/lib/APIResponse.ts:27:31)

The error goes away if I change...

const pdfDoc = new pdf.Document({ font: require('pdfjs/font/Helvetica') })

to...

const pdfDoc = new pdf.Document({ font: require('pdfjs/font/Helvetica.js') })

I'm currently using webpack@4.46.0

Could this be causing the issue I'm unable to repro in stackblitz?

rkusa commented 1 year ago

I think the bundling/transformation is most probably the cause, as it would explain why I cannot reproduce it (and why it hasn't been reported before).