opengovsg / pdf2md

A PDF to Markdown converter
https://www.npmjs.com/package/@opendocsg/pdf2md
MIT License
211 stars 40 forks

fix `parse()` arguments to match the documented types #77

Closed. moshest closed this 5 months ago

moshest commented 10 months ago

Problem

What problem are you trying to solve? What issue does this close?

The pdf2md docs say the first argument accepts the following types:

@param {string|TypedArray|DocumentInitParameters|PDFDataRangeTransport} pdfBuffer
 * Passed to `pdfjs.getDocument()` to read a PDF document for conversion

However, it currently works only when given a BinaryData value.

Solution

How did you solve the problem?

I added a type check that converts the first argument into a docOptions object, depending on the type given:

if (typeof docOptions === "string" || docOptions instanceof URL) {
  docOptions = { url: docOptions };
} else if (docOptions instanceof ArrayBuffer || ArrayBuffer.isView(docOptions)) {
  docOptions = { data: docOptions };
}

This should match the pdfjs docs: https://mozilla.github.io/pdf.js/api/draft/module-pdfjsLib.html
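
For reference, here is a minimal standalone sketch of the same normalization, so the three input shapes can be checked in isolation. The helper name `normalizeDocOptions` is hypothetical and not part of pdf2md. The branch logic is copied from the snippet above, and any other value is assumed to already be a parameters object or transport and is returned unchanged.

```javascript
// Hypothetical helper: `normalizeDocOptions` is an illustrative name,
// not an actual pdf2md or pdf.js function.
function normalizeDocOptions(docOptions) {
  // Strings and URL objects are treated as the location of the PDF.
  if (typeof docOptions === "string" || docOptions instanceof URL) {
    return { url: docOptions };
  }
  // ArrayBuffer, typed arrays, DataView and Node Buffers are raw PDF bytes.
  if (docOptions instanceof ArrayBuffer || ArrayBuffer.isView(docOptions)) {
    return { data: docOptions };
  }
  // Assumption: anything else is already a parameters object
  // (or a range transport) and is passed through untouched.
  return docOptions;
}

const bytes = new Uint8Array([0x25, 0x50, 0x44, 0x46]); // "%PDF"
console.log(normalizeDocOptions("file.pdf"));            // wrapped as { url: ... }
console.log(normalizeDocOptions(bytes));                 // wrapped as { data: ... }
console.log(normalizeDocOptions({ url: "file.pdf" }));   // returned as-is
```

Returning a new object instead of reassigning the parameter is only a stylistic choice for the sketch. The behavior is meant to be the same as the reassignment in the snippet above.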

moshest commented 10 months ago

I don't think we need to be holier than the pope: https://github.com/mozilla/pdf.js/blob/12875359c387d7e2d312c50748833b3c52d986aa/src/display/api.js#L234


function getDocument(src) {
  // ...

  if (typeof src === "string" || src instanceof URL) {
    src = { url: src };
  } else if (isArrayBuffer(src)) {
    src = { data: src };
  }

  // ...
}
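
The excerpt above does not show how `isArrayBuffer` is defined. As a rough sketch, assuming it is a duck-typed check for a `byteLength` property (an assumption, not something confirmed in this thread), it would accept the same common inputs as the `instanceof ArrayBuffer || ArrayBuffer.isView(...)` test in this PR. The name `looksLikeArrayBuffer` is hypothetical.

```javascript
// Hypothetical stand-in for pdf.js's `isArrayBuffer`; the real helper may differ.
function looksLikeArrayBuffer(value) {
  // Duck typing: any non-null object exposing `byteLength` counts as binary data.
  return typeof value === "object" && value !== null && value.byteLength !== undefined;
}

// The explicit check used in this PR, wrapped for comparison.
function isBinaryData(value) {
  return value instanceof ArrayBuffer || ArrayBuffer.isView(value);
}

const samples = [new ArrayBuffer(4), new Uint8Array(2), new DataView(new ArrayBuffer(1)), "file.pdf", null, {}];
for (const sample of samples) {
  // Both checks agree on each of these inputs.
  console.log(looksLikeArrayBuffer(sample) === isBinaryData(sample));
}
```

The two checks can disagree on unusual inputs, such as a plain object that happens to carry a `byteLength` field, so the explicit check in this PR is the stricter of the two.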