chrisgrieser closed 2 years ago
Hmm, so it looks like this PDF is malformed somehow. I see a ton of parsing errors in the output, and somehow the Y axis is getting flipped for the annotations. In most PDFs, y = 0 is the bottom of the page, but for the highlights in this PDF, 0 seems to be at the top. Not sure if there's much I can do here.
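To illustrate the flip described above: converting a rectangle between a top-origin and a bottom-origin Y axis is just a mirror against the page height. This is only a rough sketch, not the actual fix in the tool; `flip_rect_y` and the `(x0, y0, x1, y1)` tuple layout are made up for the example.

```python
def flip_rect_y(rect, page_height):
    """Mirror a rectangle's y coordinates against the page height.

    `rect` is a hypothetical (x0, y0, x1, y1) tuple with y0 <= y1.
    The same function converts top-origin to bottom-origin and back.
    """
    x0, y0, x1, y1 = rect
    # The old upper edge becomes the new lower edge, so y0 and y1 swap roles.
    return (x0, page_height - y1, x1, page_height - y0)


# A highlight 20-40 pt from the top of a 792 pt tall (US Letter) page
top_origin = (10, 20, 110, 40)
bottom_origin = flip_rect_y(top_origin, 792)
```

Applying the function twice gives back the original rectangle, which is a quick way to sanity-check which convention a given annotation uses.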
Never mind, I managed to track down the bug.
😂 thanks!
@chrisgrieser The new version is on homebrew, try running brew upgrade and see if it fixes the issues for you
yep, works now! Thanks a lot 🥳
Not sure whether this is a bug or a feature request, but could there be an option to force two-column extraction for a PDF? AFAIR, one of the nifty things about pdfannots2json was that it recognizes this automatically, but I have a two-column PDF which is treated as a one-column PDF, meaning the order of citations is all jumbled up.
Here is a two-page PDF sample, with the (beautified) JSON output I get. The annotations are ordered by their y-position, rather than going through one column and then the next. sample.zip
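To make the expected order concrete, here is a rough sketch of what forcing two-column ordering could look like. It is not how pdfannots2json works internally: the field names `page`, `x`, `y` and `text` are hypothetical (not the tool's real JSON keys), `y` is assumed to be measured from the top of the page, and the column split is simply the horizontal midpoint of the page.

```python
def order_two_column(annotations, page_width):
    """Sort annotations page by page, left column first, then top to bottom.

    Each annotation is a dict with hypothetical keys "page", "x" and "y",
    where "y" grows downward from the top of the page.
    """
    midpoint = page_width / 2

    def sort_key(annotation):
        # Column 0 is left of the midpoint, column 1 is right of it.
        column = 0 if annotation["x"] < midpoint else 1
        return (annotation["page"], column, annotation["y"])

    return sorted(annotations, key=sort_key)


annotations = [
    {"page": 1, "x": 350, "y": 100, "text": "c"},  # right column, near the top
    {"page": 1, "x": 50, "y": 300, "text": "b"},   # left column, lower down
    {"page": 1, "x": 50, "y": 120, "text": "a"},   # left column, near the top
    {"page": 2, "x": 50, "y": 50, "text": "d"},    # next page
]
ordered = [a["text"] for a in order_two_column(annotations, page_width=612)]
```

Sorting by y-position alone would put "c" first here, which is the jumbled order I am seeing. A fixed midpoint would of course break on full-width elements such as titles or figures, so this only shows the ordering I would expect from a forced two-column mode.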