modesty / pdf2json

converts binary PDF to JSON and text, for server-side PDF processing and command-line use.
https://github.com/modesty/pdf2json

Units of the dimension/some are missing? #213

Open arthur798 opened 4 years ago

arthur798 commented 4 years ago

So a typical text line has:

{
  x: 4.453,
  y: 23.642,
  w: 1.06,
  sw: 0.32553125,
  clr: 0,
  A: 'left',
  R: [ { T: '%5B%20%5D%20', S: -1, TS: [Array] } ]
}

In what units are those values? I need them in pixels. Also, which values give the left offset (how far the item is from the left-hand side), the width, and the height? I would also need a top value, i.e. how far the item is from the top of the page, because I need to insert a signature. How can I do that?

utsavsingh899 commented 3 years ago

The units used are page units. Page units are relative units that depend on the size, resolution, and DPI of the system and the PDF. Please refer to this link for more info: https://stackoverflow.com/questions/42494394/pdf2json-page-unit-what-is-it
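If the exact size of a page unit is unclear, one possible workaround is to scale proportionally: divide the item's x and y by the page's width and height in page units, then multiply by the pixel size you render the page at. The sketch below is only an illustration under assumptions, not the library's documented method. `toPixels` is a hypothetical helper, the page size in page units is assumed to come from the parser output, the numbers in the usage line are made up, and y is assumed to be measured from the top of the page. The `w` value is deliberately left out because it may not share the same scale as x and y.

```javascript
// Hypothetical helper (not part of pdf2json): proportional scaling of x/y.
// item:       a text item with x and y, as in the object shown in the question
// pageUnits:  { width, height } of the page in page units (assumed to be
//             available from the parser output)
// pagePixels: { width, height } of the page as you render it, in pixels
function toPixels(item, pageUnits, pagePixels) {
  const scaleX = pagePixels.width / pageUnits.width;   // pixels per unit, horizontal
  const scaleY = pagePixels.height / pageUnits.height; // pixels per unit, vertical
  return {
    left: item.x * scaleX, // distance from the left edge, in pixels
    top: item.y * scaleY,  // distance from the top edge, assuming y grows downward
  };
}

// Illustrative values only: a 40 x 50 unit page rendered at 800 x 1000 pixels.
const position = toPixels(
  { x: 4.453, y: 23.642 },
  { width: 40, height: 50 },
  { width: 800, height: 1000 }
);
console.log(position);
```

Because this only uses ratios, it does not depend on knowing how many pixels one page unit represents. It does depend on x, y, and the page dimensions all being expressed in the same unit, which is worth checking against your own output.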

austenstrine commented 10 months ago

Page units are not the same kind of "relative" as the x/y values.

If I have a block of text at node1.x and another block of text at node2.x, and there is some space between node1 and node2, node1.x + node1.w is reliably greater than node2.x.

The documentation says that x and y are "relative units", and that w / h / l are "page units". For reasons unknown, 1 relative unit !== 1 page unit, or else node1.x + node1.w would be <= node2.x.

If anyone can provide some information on reliably converting a "page unit" into a "relative unit", or vice versa, that would be extremely helpful.

See https://github.com/modesty/pdf2json/issues/136#issuecomment-1129033826 for help understanding and resolving this problem.