mgmeyers / pdfannots2json

GNU Affero General Public License v3.0

Warnings for image extraction / no json created #1

Closed chrisgrieser closed 2 years ago

chrisgrieser commented 2 years ago

When trying to extract an image, I get the output below. The JSON file also does not get created, although the one test image does get extracted.

./pdf-annots2json -o . -n image Stamm_2021_Groups\ Matter.pdf > bla.json
warning: undefined link destination
warning: ... repeated 196 times...
panic: interface conversion: core.PdfObject is nil, not *core.PdfObjectArray

goroutine 1 [running]:
main.pdfObjToHex({0x0?, 0x0?})
    /Users/matt/Documents/Personal/pdf-annots2json/color.go:21 +0x3f0
main.getColor(0x14000171b70?)
    /Users/matt/Documents/Personal/pdf-annots2json/color.go:39 +0x30c
main.handleImageAnnot(0xa, 0x140006a9180, {0x105045708?, 0x14000b02340?}, 0x1400084ec40)
    /Users/matt/Documents/Personal/pdf-annots2json/image.go:61 +0x38c
main.processAnnotations(0xa, 0x0?, {0x105045708, 0x14000b02340}, {0x140001f4600, 0x13, 0x5abbce8ee713b?}, 0x0)
    /Users/matt/Documents/Personal/pdf-annots2json/main.go:139 +0x4a8
main.main()
    /Users/matt/Documents/Personal/pdf-annots2json/main.go:100 +0x28c
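
The panic in the trace above comes from a type assertion on a nil interface value. As a minimal sketch of how such a conversion can be guarded, the example below uses Go's comma-ok assertion form. The types `PdfObject` and `PdfObjectArray` and the function `colorToHex` are hypothetical stand-ins written for this illustration; they are not the actual library types or the project's `pdfObjToHex`, and the color components are assumed to be RGB values in the range 0 to 1.

```go
package main

import (
	"errors"
	"fmt"
)

// PdfObject and PdfObjectArray are hypothetical stand-ins for the types named
// in the panic message; they are not the real library definitions.
type PdfObject interface{}

type PdfObjectArray struct {
	Components []float64
}

// colorToHex converts an RGB array (components assumed to be in [0, 1]) to a
// "#rrggbb" string. The comma-ok assertion makes a nil or unexpected object
// produce an error instead of a panic.
func colorToHex(obj PdfObject) (string, error) {
	arr, ok := obj.(*PdfObjectArray)
	if !ok || arr == nil {
		return "", errors.New("annotation has no color array")
	}
	if len(arr.Components) != 3 {
		return "", fmt.Errorf("expected 3 color components, got %d", len(arr.Components))
	}
	// Scale each component to 0..255, rounding to the nearest integer.
	r := int(arr.Components[0]*255 + 0.5)
	g := int(arr.Components[1]*255 + 0.5)
	b := int(arr.Components[2]*255 + 0.5)
	return fmt.Sprintf("#%02x%02x%02x", r, g, b), nil
}

func main() {
	// A nil object, as in the panic above, now yields an error.
	if _, err := colorToHex(nil); err != nil {
		fmt.Println("skipping color:", err)
	}

	hex, err := colorToHex(&PdfObjectArray{Components: []float64{1, 0, 0}})
	if err != nil {
		fmt.Println("unexpected error:", err)
		return
	}
	fmt.Println(hex)
}
```

With a guard like this, an annotation that carries no color entry could be reported or skipped rather than aborting the whole run before the JSON is written.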

mgmeyers commented 2 years ago

@chrisgrieser Do you still get these errors using the latest version? I've made a few updates that may have fixed this.

chrisgrieser commented 2 years ago

I don't get the error, but with the latest release, rectangle annotations (and therefore images) are simply not extracted at all.

This is what I ran on the command line, using test.pdf, followed by the output:

pdf-annots2json test.pdf -o .

[{"annotatedText":"., DeTienne, D. R., \u0026 Cardon, M. S. (2010). Reconceptualizing entrepreneurial exit: Divergent exit routes and their drivers. Journal of Business Venturing, 25(4), 361–375.","color":"#73fdff","date":"2022-04-08T14:07:57+02:00","type":"highlight","page":1}]
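
To check quickly which annotation types ended up in output like the above, the JSON can be decoded and tallied. This is only a sketch: the `Annotation` struct mirrors just the fields visible in the sample output, `countByType` is a hypothetical helper, and the type name `"image"` for extracted rectangle annotations is an assumption, not something confirmed in this thread.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Annotation mirrors only the fields visible in the sample output above;
// the tool may emit additional fields that are ignored here.
type Annotation struct {
	AnnotatedText string `json:"annotatedText"`
	Color         string `json:"color"`
	Date          string `json:"date"`
	Type          string `json:"type"`
	Page          int    `json:"page"`
}

// countByType decodes a JSON array of annotations and tallies them by type.
func countByType(data []byte) (map[string]int, error) {
	var annots []Annotation
	if err := json.Unmarshal(data, &annots); err != nil {
		return nil, err
	}
	counts := make(map[string]int)
	for _, a := range annots {
		counts[a.Type]++
	}
	return counts, nil
}

func main() {
	// Shortened sample in the same shape as the output above.
	sample := []byte(`[{"annotatedText":"example","color":"#73fdff","date":"2022-04-08T14:07:57+02:00","type":"highlight","page":1}]`)

	counts, err := countByType(sample)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	// "image" is an assumed type name for rectangle annotations.
	fmt.Println("highlights:", counts["highlight"], "images:", counts["image"])
}
```

A count of zero for the image type on a PDF known to contain rectangle annotations would match the behavior described in this comment.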

test.pdf

chrisgrieser commented 2 years ago

ah yes, saw the note in the latest release. works now, thanks 😅