modesty / pdf2json

Converts binary PDF to JSON and text, for server-side PDF processing and command-line use.
https://github.com/modesty/pdf2json

Property 'getRawTextContent' does not exist on type 'Pdfparser'.ts(2339) #327

Closed: zach-betz-hln closed this issue 6 months ago.

zach-betz-hln commented 9 months ago

Thanks for this sweet library.

When using it in a TypeScript project and calling pdfParser.getRawTextContent(), I got this error:

Property 'getRawTextContent' does not exist on type 'Pdfparser'.ts(2339)

In the meantime, I was able to work around it by extending the type with an interface and casting the parser instance to it:

```ts
import PDFParser from 'pdf2json';

// Add the missing method to the type only; the method already exists at runtime.
interface PatchedPDFParser extends PDFParser {
  getRawTextContent: () => string;
}

// Second constructor argument (needRawText = 1) asks the parser to collect raw text.
const pdfParser = new PDFParser(undefined, 1) as PatchedPDFParser;

pdfParser.on('pdfParser_dataError', (errorMessage) => {
  console.error(errorMessage);
});

pdfParser.on('pdfParser_dataReady', (_output) => {
  const text = pdfParser.getRawTextContent();
  console.log(text);
});

pdfParser.loadPDF('/path/to/file.pdf');
```
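The event-based flow above (register `pdfParser_dataError` and `pdfParser_dataReady`, then call `loadPDF`) can be wrapped in a Promise so callers can simply `await` the raw text. This is a minimal sketch, not part of pdf2json: `readRawText` and the `RawTextSource` interface are hypothetical names, and it assumes the error event payload has the `{ parserError: Error }` shape used in the typings quoted later in this thread. It listens with `once` (inherited from Node's `EventEmitter`) so the listeners do not linger after the first result.

```ts
// Hypothetical structural type (not part of pdf2json): only the members the helper
// below actually calls. A parser typed like PatchedPDFParser above should fit it.
interface RawTextSource {
  once(eventName: string, listener: (...args: any[]) => void): unknown;
  loadPDF(pdfFilePath: string): unknown;
  getRawTextContent(): string;
}

// Wraps the event-based flow in a Promise:
// - resolves with the raw text once 'pdfParser_dataReady' fires,
// - rejects with the parser error once 'pdfParser_dataError' fires.
function readRawText(parser: RawTextSource, pdfFilePath: string): Promise<string> {
  return new Promise<string>((resolve, reject) => {
    parser.once('pdfParser_dataError', (errData: { parserError: Error }) => {
      reject(errData.parserError);
    });
    parser.once('pdfParser_dataReady', () => {
      resolve(parser.getRawTextContent());
    });
    // The typings declare loadPDF as returning a Promise; forward a rejection
    // instead of leaving it unhandled (also works if it returns nothing).
    Promise.resolve(parser.loadPDF(pdfFilePath)).catch(reject);
  });
}
```

With the patched parser from the issue, usage would look like `const text = await readRawText(pdfParser, '/path/to/file.pdf');`, keeping the `needRawText` constructor argument set to `1`.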
modesty commented 9 months ago

Please test with commit 2b47fcd.

zach-betz-hln commented 9 months ago

Hi @modesty - how do I install the package at this particular commit?

haahmad commented 9 months ago

It seems a new version with the fix needs to be published to npm.

zach-betz-hln commented 8 months ago

I've worked around this by creating the file `src/@types/pdf2json/index.d.ts` with the contents below:

```ts
// Copied and adapted from https://github.com/modesty/pdf2json/blob/master/pdfparser.d.ts
declare module 'pdf2json' {
  import { EventEmitter } from 'events';

  // No `declare` modifiers inside: the module block is already an ambient context.
  class PDFParser extends EventEmitter {
    constructor(context?: unknown, needRawText?: number, password?: string);
    parseBuffer(buffer: Buffer, verbosity?: number): void;
    loadPDF(pdfFilePath: string, verbosity?: number): Promise<void>;
    createParserStream(): ParserStream;
    getRawTextContent(): string;
    on<K extends keyof EventMap>(eventName: K, listener: EventMap[K]): this;
  }

  type EventMap = {
    pdfParser_dataError: (errMsg: Record<'parserError', Error>) => void;
    pdfParser_dataReady: (pdfData: Output) => void;
    readable: (meta: Output['Meta']) => void;
    data: (data: Output['Pages'][number] | null) => void;
  };

  class ParserStream {
    //TODO
  }

  interface Output {
    Transcoder: string;
    Meta: Record<string, unknown>;
    Pages: Page[];
  }

  interface Page {
    Width: number;
    Height: number;
    HLines: Line[];
    VLines: Line[];
    Fills: Fill[];
    Texts: Text[];
    Fields: Field[];
    Boxsets: Boxset[];
  }

  interface Fill {
    x: number;
    y: number;
    w: number;
    h: number;
    oc?: string;
    clr?: number;
  }

  interface Line {
    x: number;
    y: number;
    w: number;
    l: number;
    oc?: string;
    clr?: number;
  }

  interface Text {
    x: number;
    y: number;
    w: number;
    sw: number;
    A: 'left' | 'center' | 'right';
    R: TextRun[];
    oc?: string;
    clr?: number;
  }

  interface TextRun {
    T: string;
    S: number;
    TS: [number, number, 0 | 1, 0 | 1];
    RA?: number;
  }

  interface Boxset {
    boxes: Box[];
    id: { Id: string; EN?: number };
  }

  interface Field {
    id: { Id: string; EN?: number };
    style: number;
    TI: number;
    AM: number;
    TU: string;
    x: number;
    y: number;
    w: number;
    h: number;
    T: { Name: 'alpha' | 'link'; TypeInfo: object };
  }

  interface Box {
    x: number;
    y: number;
    w: number;
    h: number;
    oc?: string;
    clr?: number;
  }

  // Declaration merging: this second `Box` declaration is merged with the one above.
  interface Box {
    id: { Id: string; EN?: number };
    T: { Name: string; TypeInfo?: object };
    x: number;
    y: number;
    w: number;
    h: number;
    TI: number;
    AM: number;
    checked?: boolean;
    style: number;
  }

  export default PDFParser;
}
```
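With these declarations, the `pdfParser_dataReady` payload is typed as `Output`, so page text can be pulled out of `Pages[].Texts[].R[].T` without casts. The sketch below uses its own trimmed copies of those shapes so it runs on its own; the `Pdf*` interface names, `pageTexts` and `safeDecode` are hypothetical. It assumes run text may be URI-encoded (e.g. `%20` for a space) and falls back to the raw string when decoding fails.

```ts
// Trimmed, hypothetical copies of the declared Output/Page/Text/TextRun shapes
// (only the fields used here), renamed to avoid clashing with the DOM `Text` type.
interface PdfTextRun {
  T: string;
}
interface PdfText {
  R: PdfTextRun[];
}
interface PdfPage {
  Texts: PdfText[];
}
interface PdfOutput {
  Pages: PdfPage[];
}

// Assumption: run text may be URI-encoded; return the raw string if it is not valid encoding.
function safeDecode(s: string): string {
  try {
    return decodeURIComponent(s);
  } catch {
    return s;
  }
}

// One string per page: runs within a text item are concatenated, and text items are
// joined with a space in the order they appear in `Texts` (no layout-based sorting).
function pageTexts(output: PdfOutput): string[] {
  return output.Pages.map((page) =>
    page.Texts.map((text) => text.R.map((run) => safeDecode(run.T)).join('')).join(' '),
  );
}
```

Inside the `pdfParser_dataReady` handler this could be called as `pageTexts(pdfData)`; the declared `Output` carries more fields than the trimmed copies, which TypeScript accepts structurally.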
StanislavKharchenko commented 7 months ago

Hello. When will the fix be published?

modesty commented 6 months ago

Published in 3.1.2.