Closed jarvist closed 4 years ago
Trying to extract text from a simple Google Docs PDF,
julia> pdPageExtractText(stdout, pdDocGetPage(pdDocOpen("Downloads/GoogleDocs.pdf"), 1))
fails with:
ERROR: MethodError: no method matching setindex!(::Rectangle.RBTree{Rectangle.IntervalKey{UInt16},Int64}, ::Float32, ::Rectangle.Interval{UInt16})
Closest candidates are:
  setindex!(::Rectangle.RBTree{Rectangle.IntervalKey{K},V}, ::V, ::Rectangle.Interval{K}) where {K, V} at /home/jarvist/.julia/packages/Rectangle/SnGUM/src/interval.jl:117
Stacktrace:
 [1] get_cid_font_widths(::PDFIO.Cos.CosDocImpl, ::PDFIO.Cos.CosIndirectObject{CosDict}) at /home/jarvist/.julia/packages/PDFIO/Miu63/src/PDFontMetrics.jl:204
 [2] get_font_widths(::PDFIO.Cos.CosDocImpl, ::PDFIO.Cos.CosIndirectObject{CosDict}) at /home/jarvist/.julia/packages/PDFIO/Miu63/src/PDFontMetrics.jl:164
 [3] PDFont at /home/jarvist/.julia/packages/PDFIO/Miu63/src/PDFonts.jl:391 [inlined]
 [4] get_pd_font!(::PDFIO.PD.PDDocImpl, ::PDFIO.Cos.CosIndirectObject{CosDict}) at /home/jarvist/.julia/packages/PDFIO/Miu63/src/PDDocImpl.jl:112
 [5] get_font(::PDFIO.PD.PDPageImpl, ::CosName) at /home/jarvist/.julia/packages/PDFIO/Miu63/src/PDPage.jl:320
 [6] evalContent!(::PDPageElement{:Tf}, ::PDFIO.PD.GState{:PDFIO}) at /home/jarvist/.julia/packages/PDFIO/Miu63/src/PDPageElement.jl:735
 [7] evalContent! at /home/jarvist/.julia/packages/PDFIO/Miu63/src/PDPageElement.jl:637 [inlined]
 [8] evalContent!(::PDPageTextObject, ::PDFIO.PD.GState{:PDFIO}) at /home/jarvist/.julia/packages/PDFIO/Miu63/src/PDPageElement.jl:680
 [9] evalContent! at /home/jarvist/.julia/packages/PDFIO/Miu63/src/PDPageElement.jl:637 [inlined]
 [10] pdPageEvalContent(::PDFIO.PD.PDPageImpl, ::PDFIO.PD.GState{:PDFIO}) at /home/jarvist/.julia/packages/PDFIO/Miu63/src/PDPage.jl:146
 [11] pdPageEvalContent at /home/jarvist/.julia/packages/PDFIO/Miu63/src/PDPage.jl:145 [inlined]
 [12] pdPageExtractText(::Base.TTY, ::PDFIO.PD.PDPageImpl) at /home/jarvist/.julia/packages/PDFIO/Miu63/src/PDPage.jl:179
 [13] top-level scope at REPL[30]:1
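For readers unfamiliar with this kind of failure: the message says a Float32 value was stored into a tree whose value type is Int64, and the only setindex! method requires the value to be exactly of type V. A minimal sketch of the same pattern follows. StrictMap is a hypothetical stand-in invented for illustration, not the Rectangle.jl RBTree, and the rounding shown at the end is only one possible workaround, not necessarily what the fix commit does.

```julia
# Hypothetical container (not part of Rectangle.jl or PDFIO) whose setindex!
# accepts only values of exactly type V, mirroring the "closest candidate"
# signature in the error above.
struct StrictMap{K,V}
    data::Dict{K,V}
end
StrictMap{K,V}() where {K,V} = StrictMap{K,V}(Dict{K,V}())

# No conversion happens here: v must already be a V and k must already be a K.
Base.setindex!(m::StrictMap{K,V}, v::V, k::K) where {K,V} = (m.data[k] = v; m)

m = StrictMap{UInt16,Int64}()

# Matching types dispatch fine.
m[UInt16(1)] = Int64(500)

# A Float32 value has no matching method, so Julia throws a MethodError.
failed = try
    m[UInt16(2)] = 500.0f0
    false
catch err
    err isa MethodError
end

# One possible workaround (an assumption, not the actual fix): convert first.
m[UInt16(2)] = round(Int64, 500.0f0)
```

Here `failed` records whether the Float32 assignment raised a MethodError, which is the same dispatch failure reported at PDFontMetrics.jl:204.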
Fixed with commit f98d449c0633242f261a7241cae714d837452b05.
Test files added as per https://github.com/sambitdash/PDFTest/releases/tag/v0.0.7.