unjs / unpdf

📄 Utilities to work with PDFs in Node.js, browser and workers
MIT License

renderPageAsImage missing fonts #15

Open mwohlan opened 4 months ago

mwohlan commented 4 months ago

Describe the feature

Depending on the fonts a PDF uses, some of its text may not render at all when a page is converted to an image with renderPageAsImage.

To fix this, we can set the disableFontFace option to true and provide a standardFontDataUrl pointing to the standard_fonts folder of pdfjs-dist, both passed through the options object of getDocument or getDocumentProxy.
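The option set described in the paragraph above can be collected into a small helper. This is a minimal sketch, not part of unpdf: buildFontFallbackOptions and FontFallbackOptions are hypothetical names, and the helper assumes you already know the directory of the installed pdfjs-dist package. The pdf.js API docs ask for standardFontDataUrl to include a trailing slash, so the helper normalizes the path before appending standard_fonts/.

```typescript
// Hypothetical helper (not part of unpdf): gathers the pdf.js options that
// work around missing fonts so they can be passed to getDocumentProxy.
interface FontFallbackOptions {
  isEvalSupported: boolean
  useSystemFonts: boolean
  disableFontFace: boolean
  standardFontDataUrl: string
}

function buildFontFallbackOptions(pdfjsPackageDir: string): FontFallbackOptions {
  // Strip any trailing slashes/backslashes so we don't produce "//standard_fonts/"
  const base = pdfjsPackageDir.replace(/[\\/]+$/, '')
  return {
    // Don't let pdf.js evaluate strings as JavaScript
    isEvalSupported: false,
    // Don't substitute system fonts for fonts that aren't embedded in the PDF
    useSystemFonts: false,
    // Build glyphs from path commands instead of loading them as FontFace objects
    disableFontFace: true,
    // pdf.js expects this URL/path to end with a trailing slash
    standardFontDataUrl: `${base}/standard_fonts/`,
  }
}
```

For example, buildFontFallbackOptions('/app/node_modules/pdfjs-dist') and buildFontFallbackOptions('/app/node_modules/pdfjs-dist/') both yield a standardFontDataUrl of '/app/node_modules/pdfjs-dist/standard_fonts/'.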

Here is a working fix, in case others run into the same issue:

import { Buffer } from 'node:buffer'
import { dirname, resolve } from 'pathe'
import { configureUnPDF, getDocumentProxy, renderPageAsImage } from 'unpdf'
import type { TypedArray } from 'pdfjs-dist/types/src/display/api'

export async function convertPdfToImg(buffer: ArrayBuffer | TypedArray, width: number) {
  try {
    await configureUnPDF({
      // Use the official PDF.js build
      pdfjs: () => import('pdfjs-dist'),
    })

    // Resolved relative to process.cwd(), so this assumes the process runs from the project root
    const packagePath = dirname(resolve('node_modules/pdfjs-dist/package.json'))

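    const pdf = await getDocumentProxy(buffer, {
      isEvalSupported: false,
      // Don't substitute system fonts for fonts that aren't embedded in the PDF
      useSystemFonts: false,
      // Build glyphs from path commands instead of loading them as FontFace objects
      disableFontFace: true,
      // Folder with the standard font files shipped by pdfjs-dist (trailing slash required)
      standardFontDataUrl: `${packagePath}/standard_fonts/`,
    })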

    const pageNumber = 1

    const result = await renderPageAsImage(pdf, pageNumber, {
      canvas: () => import('canvas'),
      width,
    })

    return result
  }
  catch (error) {
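    console.error('Error converting PDF to images:', error)
    // `error` is typed as unknown in strict TypeScript, so narrow it before reading .message
    const message = error instanceof Error ? error.message : String(error)
    throw new Error(`Failed to convert PDF to images: ${message}`)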
  }
}

I'm not sure whether pdfjs-dist has to be installed separately for this to work. Maybe @johannschopplich could consider integrating this fix into unpdf.
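Whatever unpdf itself ships, the snippet above calls import('pdfjs-dist') and reads files from the pdfjs-dist package folder, so that package at least has to be resolvable from the project. The sketch below checks this at runtime. findPackageDir is a hypothetical helper, not an unpdf or pdf.js API. It uses Node's createRequire, which does not depend on the current working directory the way resolve('node_modules/...') does. It returns null if the package is missing, and also if the package's "exports" map does not expose ./package.json.

```typescript
import { createRequire } from 'node:module'
import { dirname, join } from 'node:path'

// Hypothetical helper: returns the directory of an installed package,
// or null if Node cannot resolve it starting from `fromDir`.
function findPackageDir(packageName: string, fromDir: string = process.cwd()): string | null {
  // createRequire needs an absolute file path; the file itself does not have to exist
  const req = createRequire(join(fromDir, 'noop.js'))
  try {
    return dirname(req.resolve(`${packageName}/package.json`))
  }
  catch {
    // Not installed, or the package's "exports" map hides package.json
    return null
  }
}

const pdfjsDir = findPackageDir('pdfjs-dist')
if (pdfjsDir === null)
  console.warn('pdfjs-dist is not resolvable; install it to use its standard_fonts folder')
```

If pdfjsDir is not null, `${pdfjsDir}/standard_fonts/` can be passed as standardFontDataUrl.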

Additional information

johannschopplich commented 4 months ago

Great idea! PR welcome. 🙌