modesty / pdf2json

converts binary PDF to JSON and text, for server-side PDF processing and command-line use.
https://github.com/modesty/pdf2json

bounding boxes #59

Open shaunc opened 8 years ago

shaunc commented 8 years ago

Could the documentation explain how to calculate bounding boxes for text items?

Text items have x, y, and w but no h. I presume the font size could give me h, but it seems to be in different units. How should I convert?

BTW, what is the "TS" element? Can this help me?

modesty commented 8 years ago

TS is 'Text Style'. See line 332 in pdffont.js: let TS = [this.faceIdx, this.fontSize, this.bold?1:0, this.italic?1:0]. In order, the entries are the font face index, the font size, a bold flag, and an italic flag.
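
To make the TS layout easier to work with, here is a minimal sketch that unpacks the four positional entries into named fields. The helper name decodeTextStyle and the sample values are hypothetical, not part of pdf2json. It assumes only the array order quoted from pdffont.js above.

```javascript
// Hypothetical helper (not part of pdf2json): gives names to the four
// positional entries of a TS array, in the order quoted from pdffont.js:
// [faceIdx, fontSize, bold (1 or 0), italic (1 or 0)].
function decodeTextStyle(ts) {
  const [faceIdx, fontSize, bold, italic] = ts;
  return {
    faceIdx,              // index of the font face
    fontSize,             // font size as stored by the parser
    bold: bold === 1,     // 1 means bold
    italic: italic === 1, // 1 means italic
  };
}

// Made-up sample values, for illustration only.
const style = decodeTextStyle([0, 15, 1, 0]);
console.log(style.fontSize, style.bold, style.italic);
```

This only names the fields. It does not convert the font size into the same units as x, y, and w.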

Since embedded fonts are not supported in the parser, the bounding box is determined by the rendering platform. For example, when rendering in a browser, it comes from Element.getBoundingClientRect().
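
As a rough sketch of the browser case, the helper below reads the box the browser computed for an element that has already been rendered. The helper name measureRenderedBox and the selector in the usage comment are hypothetical. How the text items get rendered into elements is outside this sketch.

```javascript
// Hypothetical helper: returns the rendered box of a DOM element.
// getBoundingClientRect() reports viewport-relative CSS pixels, so the
// result depends on the fonts and layout of the rendering platform.
function measureRenderedBox(element) {
  const rect = element.getBoundingClientRect();
  return {
    x: rect.left,
    y: rect.top,
    width: rect.width,
    height: rect.height,
  };
}

// Browser usage (the selector is made up for illustration):
// const box = measureRenderedBox(document.querySelector(".pdf-text-item"));
```

The values are in CSS pixels relative to the viewport, not in the parser's x, y, and w units.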

SPlatten commented 7 years ago

Is it possible to get the bounding rectangle without rendering the document? I am using the module to scan a PDF and extract text at specific locations, which I would like to derive from their bounding rectangles.