sambitdash / PDFIO.jl

PDF Reader Library for Native Julia.

Bug: error reading attached PDF, works with other PDFs. #89

Closed jakewilliami closed 4 years ago

jakewilliami commented 4 years ago

Stacktrace:

ERROR: LoadError: Found ' (32)' Expected '<' here
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] skipv at /Users/jakeireland/.julia/packages/PDFIO/Miu63/src/BufferParser.jl:25 [inlined]
 [3] read_trailer(::IOStream, ::Int64) at /Users/jakeireland/.julia/packages/PDFIO/Miu63/src/CosDoc.jl:382
 [4] read_xref_tables(::IOStream, ::PDFIO.Cos.CosDocImpl) at /Users/jakeireland/.julia/packages/PDFIO/Miu63/src/CosDoc.jl:458
 [5] doc_trailer_update(::IOStream, ::PDFIO.Cos.CosDocImpl) at /Users/jakeireland/.julia/packages/PDFIO/Miu63/src/CosDoc.jl:409
 [6] cosDocOpen(::String; access::Function) at /Users/jakeireland/.julia/packages/PDFIO/Miu63/src/CosDoc.jl:139
 [7] PDFIO.PD.PDDocImpl(::String; access::Function) at /Users/jakeireland/.julia/packages/PDFIO/Miu63/src/PDDocImpl.jl:16
 [8] pdDocOpen(::String; access::Function) at /Users/jakeireland/.julia/packages/PDFIO/Miu63/src/PDDoc.jl:77
 [9] pdDocOpen at /Users/jakeireland/.julia/packages/PDFIO/Miu63/src/PDDoc.jl:77 [inlined]
 [10] getPDFText at /Users/jakeireland/scripts/pdfsearches/pdfsearch.jl:18 [inlined]
 [11] scanFiles(::String, ::String) at /Users/jakeireland/scripts/pdfsearches/pdfsearch.jl:68
 [12] top-level scope at /Users/jakeireland/scripts/pdfsearches/pdfsearch.jl:92
 [13] include(::Module, ::String) at ./Base.jl:377
 [14] exec_options(::Base.JLOptions) at ./client.jl:288
 [15] _start() at ./client.jl:484
in expression starting at /Users/jakeireland/scripts/pdfsearches/pdfsearch.jl:92

Using this script.

file.pdf

sambitdash commented 4 years ago

I do not see any issues with the specific file. Please send the specific file where you are seeing the issues. file.txt

jakewilliami commented 4 years ago

That is the specific file I am using. I am using OS X 10.14.

jakewilliami commented 4 years ago

julia> function getPDFText(src, out)
           # Handle that can be used for subsequent operations on the document.
           doc = pdDocOpen(src)

           # Metadata extracted from the PDF document.
           # This value is retained and returned as the return from the function.
           docinfo = pdDocGetInfo(doc)
           open(out, "w") do io

               # Returns number of pages in the document
               npage = pdDocGetPageCount(doc)

               for i=1:npage

                   # handle to the specific page given the number index.
                   page = pdDocGetPage(doc, i)

                   # Extract text from the page and write it to the output file.
                   pdPageExtractText(io, page)

               end
           end
           # Close the document handle.
           # The doc handle should not be used after this call
           pdDocClose(doc)
           return docinfo
       end
getPDFText (generic function with 1 method)

julia> getPDFText("/Users/jakeireland/Downloads/file.pdf", "/tmp/out.txt")
ERROR: Found ' (32)' Expected '<' here
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] skipv at /Users/jakeireland/.julia/packages/PDFIO/Miu63/src/BufferParser.jl:25 [inlined]
 [3] read_trailer(::IOStream, ::Int64) at /Users/jakeireland/.julia/packages/PDFIO/Miu63/src/CosDoc.jl:382
 [4] read_xref_tables(::IOStream, ::PDFIO.Cos.CosDocImpl) at /Users/jakeireland/.julia/packages/PDFIO/Miu63/src/CosDoc.jl:458
 [5] doc_trailer_update(::IOStream, ::PDFIO.Cos.CosDocImpl) at /Users/jakeireland/.julia/packages/PDFIO/Miu63/src/CosDoc.jl:409
 [6] cosDocOpen(::String; access::Function) at /Users/jakeireland/.julia/packages/PDFIO/Miu63/src/CosDoc.jl:139
 [7] PDFIO.PD.PDDocImpl(::String; access::Function) at /Users/jakeireland/.julia/packages/PDFIO/Miu63/src/PDDocImpl.jl:16
 [8] pdDocOpen(::String; access::Function) at /Users/jakeireland/.julia/packages/PDFIO/Miu63/src/PDDoc.jl:77
 [9] pdDocOpen at /Users/jakeireland/.julia/packages/PDFIO/Miu63/src/PDDoc.jl:77 [inlined]
 [10] getPDFText(::String, ::String) at ./REPL[29]:3
 [11] top-level scope at REPL[31]:1

sambitdash commented 4 years ago

Which versions of Julia and PDFIO are you using? You may be on older versions.

I think you are seeing the behavior reported in #72, which was fixed in v0.1.8.
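
To check which side of that fix an installation falls on, here is a minimal sketch. It assumes Julia 1.4 or later (for Pkg.dependencies()). The helper names installed_version and has_fix are hypothetical and are not part of PDFIO or Pkg.

```julia
using Pkg

# Hypothetical helper: look up the version of a package in the active environment.
# Returns `nothing` if no package with that name is present.
function installed_version(name::AbstractString)
    for (_, info) in Pkg.dependencies()
        info.name == name && return info.version
    end
    return nothing
end

# Hypothetical helper: true if `installed` is at or past the release carrying the fix.
has_fix(installed::VersionNumber; fixed_in::VersionNumber = v"0.1.8") = installed >= fixed_in

v = installed_version("PDFIO")
if v === nothing
    println("PDFIO is not installed in the active environment")
else
    println("PDFIO ", v, has_fix(v) ? " includes" : " predates", " the v0.1.8 fix")
end
```

The comparison relies on Julia's built-in VersionNumber ordering, so v"0.1.7" sorts before v"0.1.8" without any string parsing.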

jakewilliami commented 4 years ago

I am using Julia version 1.4.1, and PDFIO version 0.1.7, so you are right (sorry, I didn't actually see #72 before posting this issue). However, I used Pkg.add("PDFIO") a day before posting this issue. Why did Pkg install version 0.1.7 if this is an old version?

jakewilliami commented 4 years ago

It's because I hadn't actually updated the registry. I only thought it was up to date (because I assumed that happens every time I run using Pkg).

I needed to update:

]update

Then check if I can install the latest version with

]add PDFIO@v0.1.9
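
The same steps can also be run non-interactively, for example from a script. This is only a sketch using the functional Pkg API. The wrapper name upgrade_pdfio is hypothetical, and its default version simply mirrors the one above.

```julia
using Pkg

# Hypothetical wrapper around the two Pkg REPL steps above.
function upgrade_pdfio(; version::AbstractString = "0.1.9")
    # Refresh the local copy of the registries so newer releases become visible.
    Pkg.Registry.update()
    # Request a specific release, equivalent to `add PDFIO@...` in the Pkg REPL.
    Pkg.add(Pkg.PackageSpec(name = "PDFIO", version = version))
    # Print the resolved entry to confirm which version ended up installed.
    Pkg.status("PDFIO")
end
```

Calling upgrade_pdfio() needs network access. Note that ]update also upgrades installed packages, whereas Pkg.Registry.update() only refreshes the registry. That is enough for a newer release to be found by the add step.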

Thank you for your help and patience!