modesty / pdf2json

converts binary PDF to JSON and text, for server-side PDF processing and command-line use.
https://github.com/modesty/pdf2json

Unable to use this library in TypeScript #273

Closed syedshahabali93 closed 6 months ago

syedshahabali93 commented 2 years ago

When I try to import this library in TypeScript, the following error is thrown: "Could not find a declaration file for module 'pdf2json'. '/home/shahab/InboxHealth/pdfparser/node_modules/pdf2json/pdfparser.js' implicitly has an 'any' type. Try npm i --save-dev @types/pdf2json if it exists or add a new declaration (.d.ts) file containing declare module 'pdf2json';ts(7016)"

import * as pdf2json from 'pdf2json'

I tried installing the @types package for pdf2json, but got an error saying that types for pdf2json are not present in the registry. Does this module officially support TypeScript like other modules do?

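
The compiler message above already names one stopgap: a declaration file containing `declare module 'pdf2json';`. A minimal sketch follows, assuming a file such as `types/pdf2json.d.ts` (the path is hypothetical) that is covered by the project's `tsconfig.json`. It only silences ts(7016) by typing every import from the module as `any`, so it provides no real type checking.

```typescript
// types/pdf2json.d.ts (hypothetical path; any .d.ts file the compiler includes will do)
// Shorthand ambient module declaration: everything imported from 'pdf2json' is typed as `any`.
declare module 'pdf2json';
```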

modesty commented 2 years ago

Try the es6class branch; it has ESM support.

jonsamwell commented 2 years ago

@modesty using the es6class branch I am still running into the same issue

package.json

  "dependencies": {
    "@cucumber/cucumber": "^8.5.2",
    "@cucumber/html-formatter": "^20.0.0",
    "@cucumber/pretty-formatter": "^1.0.0",
    "date-fns": "^2.29.2",
    "expect": "^29.0.2",
    "form-data": "^4.0.0",
    "got": "^11.8.2",
    "pdf2json": "github:modesty/pdf2json#es6class",
    "playwright": "^1.25.1",
    "rxjs": "^7.5.6",
    "uuid": "^9.0.0"
  },

TypeScript

import { Then } from '@cucumber/cucumber';
import * as pdf2json from 'pdf2json';

Then(
  'I expect the opened pdf to contain the text {string}',
  async function (this: TestWorld, expectedText: string) {
    debugger;
    // eslint-disable-next-line @typescript-eslint/no-explicit-any
    const pdfParser = new (pdf2json as any).PDFParser(this, 1);
  },
);

Build error:

 Could not find a declaration file for module 'pdf2json'. '/node_modules/pdf2json/pdfparser.js' implicitly has an 'any' type.
  Try `npm i --save-dev @types/pdf2json` if it exists or add a new declaration (.d.ts) file containing `declare module 'pdf2json';`

3 import * as pdf2json from 'pdf2json';
                            ~~~~~~~~~~

    at createTSError (\node_modules\ts-node\src\index.ts:859:12)
    at reportTSError (\node_modules\ts-node\src\index.ts:863:19)
    at getOutput (\node_modules\ts-node\src\index.ts:1077:36)
    at Object.compile (\node_modules\ts-node\src\index.ts:1433:41)
    at Module._compile (\node_modules\ts-node\src\index.ts:1617:30)
    at node:internal/modules/cjs/loader:1180:10
    at Object..ts (\node_modules\ts-node\src\index.ts:1621:12)
    at Module.load (node:internal/modules/cjs/loader:1004:32)
    at Function._load (node:internal/modules/cjs/loader:839:12)
    at Module.require (node:internal/modules/cjs/loader:1028:19)

vjau commented 2 years ago

pdf2json has no TypeScript types. I wrote some, but for some reason my PR https://github.com/modesty/pdf2json/pull/278 is being ignored by the maintainer.

rkumarkundu commented 1 year ago

Is it fixed now? I am also unable to use this lib with TypeScript.

tuffstuff9 commented 1 year ago

Something I came across when trying to do pdfParser.getRawTextContent():

I was getting a blank string.

In order to fix this, you have two options:

1) Fix the TypeScript definition by adding two parameters to the constructor, like so:

declare class Pdfparser extends EventEmitter {
    constructor(context: any, value: number);
    parseBuffer(buffer: Buffer): void;
    loadPDF(pdfFilePath: string, verbosity?: number): Promise<void>;
    createParserStream(): ParserStream;
    on<K extends keyof EventMap>(eventName: K, listener: EventMap[K]): this;
}

I found the above type definition by right-clicking on PDFParser and choosing 'Go to Type Definition' in VS Code. Since these types were not written by the maintainer but came from another developer's PR, they cannot be expected to be fully correct.

2) Bypass type checking when declaring PDFParser, like so:

const pdfParser = new (PDFParser as any)(null, 1);

Then, I got getRawTextContent() to work like so:

pdfParser.on('pdfParser_dataReady', (pdfData: any) => {
  console.log((pdfParser as any).getRawTextContent());
});
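
The event-based workaround above can also be wrapped in a promise with a narrower type than `any`. This is only a sketch: `RawTextParser` and `waitForRawText` are hypothetical names introduced for illustration, the interface covers just the calls used in this thread and is not pdf2json's published typing, and the `pdfParser_dataError` event is assumed to be the error counterpart of `pdfParser_dataReady`.

```typescript
// Minimal structural type for the parser calls used in this thread.
// Assumption for illustration only; not the library's own declaration.
interface RawTextParser {
  on(eventName: string, listener: (payload: unknown) => void): unknown;
  getRawTextContent(): string;
}

// Resolves with the raw text once the parser reports that its data is ready,
// and rejects if the parser reports an error first.
function waitForRawText(parser: RawTextParser): Promise<string> {
  return new Promise<string>((resolve, reject) => {
    parser.on('pdfParser_dataReady', () => resolve(parser.getRawTextContent()));
    parser.on('pdfParser_dataError', (err) => reject(err));
  });
}
```

With this in place, the cast is confined to one line, for example `const pdfParser = new (PDFParser as any)(null, 1) as RawTextParser;`, and the listeners should be registered by calling `waitForRawText(pdfParser)` before parsing starts.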

zach-betz-hln commented 9 months ago

Ran into a similar issue as @tuffstuff9. Here's another workaround: https://github.com/modesty/pdf2json/issues/327

zach-betz-hln commented 7 months ago

See this comment for an improved workaround: https://github.com/modesty/pdf2json/issues/327#issuecomment-2007832054

modesty commented 6 months ago

try v3.1.2 please, it has type defs