opengovsg / pdf2md

A PDF to Markdown converter
https://www.npmjs.com/package/@opendocsg/pdf2md
MIT License

Pass PDF file as a Buffer #69

Closed panilya closed 10 months ago

panilya commented 1 year ago

Hello, thank you for your library, it's exactly what I need.

I'm not sure whether this is a bug or a mistake on my side.

Describe the bug: I'm trying to parse a PDF file from a Buffer, but I get the following error:

- error TypeError [ERR_INVALID_ARG_TYPE]: The "path" argument must be of type string. Received type number (7421)
    at new NodeError (node:internal/errors:399:5)
    at validateString (node:internal/validators:163:11)
    at Object.resolve (node:path:1098:7)
    at parse (/Users/panilya/Documents/botscrew/efforia/survey-tool/.next/server/chunks/765.js:2589:31)
    at module.exports (/Users/panilya/Documents/botscrew/efforia/survey-tool/.next/server/chunks/765.js:2159:26)
    at POST (/Users/panilya/Documents/botscrew/efforia/survey-tool/.next/server/app/api/upload/route.js:135:40)
    at async /Users/panilya/Documents/botscrew/efforia/survey-tool/.next/server/chunks/550.js:2823:37 {
  code: 'ERR_INVALID_ARG_TYPE'

This is my code:

import pdf2md from '@opendocsg/pdf2md'
import { NextRequest, NextResponse } from "next/server";

export async function POST(request: NextRequest) {
  const formData = await request.formData();

  const file = formData.get("file") as Blob | null;
  if (!file) {
    return NextResponse.json(
      { error: "File blob is required." },
      { status: 400 }
    );
  }

  const buffer = Buffer.from(await file.arrayBuffer());
  const data = await pdf2md(buffer, {})

  return NextResponse.json({ data: `${data}` }, { status: 200 });
}

I know it's possible to create a temp file, but in my case it would be better not to create any additional files.

Thank you!

P.S. Passing a path as a string (using a temp file) doesn't work either.

wilsonhou commented 1 year ago

Getting the same error. +1!

cdelacombaz commented 1 year ago

I have the same issue. I am working on a Next.js project and am trying to use this library in a route handler.

It works perfectly fine in development mode. As soon as I build the project, I get the same error.

I was able to find where the error is triggered, but I don't understand why.

This is the line that triggers it:

const fontDataPath = path.join( path.resolve(require.resolve('pdfjs-dist'), '../../standard_fonts'), '/')

https://github.com/opengovsg/pdf2md/blob/master/lib/util/pdf.js#L19
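
A possible explanation, offered as an assumption rather than a confirmed diagnosis: when webpack bundles a module, it may replace `require.resolve('pdfjs-dist')` with an internal module id, which can be a number in production builds, and `path.resolve` rejects non-string arguments. The sketch below reproduces only that last step in plain Node.js. The helper `resolveFontDir`, the value `bundledModuleId`, and the path in `examplePath` are hypothetical stand-ins and are not part of pdf2md.

```javascript
const path = require('node:path')

// Hypothetical stand-in for what a bundler might substitute for
// require.resolve('pdfjs-dist'): a numeric module id instead of a file path.
// 7421 is the value that appears in the error message above.
const bundledModuleId = 7421

// Hypothetical example of what require.resolve would return when unbundled.
const examplePath = '/app/node_modules/pdfjs-dist/build/pdf.js'

// Same shape as the line quoted above, with the resolved value passed in.
function resolveFontDir(resolved) {
  return path.join(path.resolve(resolved, '../../standard_fonts'), '/')
}

// With a real file path, this yields a standard_fonts directory path.
console.log(resolveFontDir(examplePath))

// With a number, path.resolve throws before any path is computed.
let errorCode = null
try {
  resolveFontDir(bundledModuleId)
} catch (err) {
  errorCode = err.code
}
console.log(errorCode) // ERR_INVALID_ARG_TYPE
```

If that assumption holds, it would also be consistent with the code working in development mode and failing only after a build.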

moshest commented 10 months ago

I managed to solve this by updating the next.config.mjs file to add:

/** @type {import('next').NextConfig} */
const config = {
  experimental: {
    serverComponentsExternalPackages: ['@opendocsg/pdf2md', 'pdfjs-dist'],
  }
  // ...
};

export default config;
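
For readers on a newer Next.js release: as far as I know, Next.js 15 moved this option out of `experimental` and renamed it to `serverExternalPackages`. The following is a sketch under that assumption and has not been verified against this project; check the Next.js documentation for the version in use.

```javascript
/** @type {import('next').NextConfig} */
const config = {
  // Assumed Next.js 15+ equivalent of experimental.serverComponentsExternalPackages:
  // keep these packages out of the server bundle so they are loaded by Node.js directly.
  serverExternalPackages: ['@opendocsg/pdf2md', 'pdfjs-dist'],
  // ...
};

export default config;
```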

LoneRifle commented 10 months ago

This isn't due to support for ArrayBuffers (it's there, and as https://github.com/opengovsg/pdf2md/issues/69#issuecomment-1787541630 demonstrates, working as intended), but due to Webpack mangling symbols when bundling.

This hence looks like a duplicate of #76, which better describes the issue.

Duplicate of #76