sambitdash / PDFIO.jl

PDF Reader Library for Native Julia.

ERROR: Found 'j(106)' Expected 'x' here #94

Closed DilumAluthge closed 3 years ago

DilumAluthge commented 3 years ago

I have two almost identical files, foo.pdf and bar.pdf. I can open them both in a PDF viewer.

I cannot open foo.pdf in PDFIO.jl, but I can open bar.pdf in PDFIO.jl. See the log below.

I can't share the PDF files publicly, but I'd be happy to email them to you if you like.

julia> using PDFIO

julia> bar = PDFIO.pdDocOpen("bar.pdf")

PDDoc ==>

CosDoc ==>
    filepath:       /Users/dilum/Downloads/bar.pdf
    size:           11675219
    hasNativeXRefStm:    false
    Trailer dictionaries:
    <<
    /Root   1 0 R
    /Size   3142
    /Info   2 0 R
>>

Catalog:
1 0 obj
<<
    /Pages  3 0 R
    /Type   /Catalog
>>
endobj

isTagged: none

julia> foo = PDFIO.pdDocOpen("foo.pdf")
ERROR: Found 'j(106)' Expected 'x' here
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] skipv at /Users/dilum/.julia/dev/PDFIO/src/BufferParser.jl:25 [inlined]
 [3] skipv at /Users/dilum/.julia/dev/PDFIO/src/BufferParser.jl:30 [inlined]
 [4] read_xref_table(::IOStream, ::PDFIO.Cos.CosDocImpl) at /Users/dilum/.julia/dev/PDFIO/src/CosDoc.jl:491
 [5] read_xref_tables(::IOStream, ::PDFIO.Cos.CosDocImpl) at /Users/dilum/.julia/dev/PDFIO/src/CosDoc.jl:460
 [6] doc_trailer_update(::IOStream, ::PDFIO.Cos.CosDocImpl) at /Users/dilum/.julia/dev/PDFIO/src/CosDoc.jl:412
 [7] cosDocOpen(::String; access::Function) at /Users/dilum/.julia/dev/PDFIO/src/CosDoc.jl:141
 [8] PDFIO.PD.PDDocImpl(::String; access::Function) at /Users/dilum/.julia/dev/PDFIO/src/PDDocImpl.jl:16
 [9] pdDocOpen(::String; access::Function) at /Users/dilum/.julia/dev/PDFIO/src/PDDoc.jl:77
 [10] pdDocOpen(::String) at /Users/dilum/.julia/dev/PDFIO/src/PDDoc.jl:77
 [11] top-level scope at REPL[8]:1

julia> versioninfo()
Julia Version 1.5.3
Commit 788b2c77c1* (2020-11-09 13:37 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin19.6.0)
  CPU: Intel(R) Core(TM) i5-4278U CPU @ 2.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, haswell)

I am on the master branch of PDFIO.jl, i.e. I installed it with ] add PDFIO#master.

And the PDFIO.jl tests pass for me.

sambitdash commented 3 years ago

@DilumAluthge the PDF files you call almost identical presumably differ by the scheme you described in your recent post on Discourse. Apologies if I am wrong in presuming you are the same person.

https://discourse.julialang.org/t/renumbering-pdf-files-can-this-be-implemented-with-pdfio-jl-instead/53144

I would not be surprised if PDFs generated by your code have wrong cross-references and are corrupt from a PDF specification standpoint. PDFIO has been made lenient in places to accommodate PDFs generated by some well-known creators, but generally I am not in favor of supporting any file that has been tampered with and is inconsistent with the PDF specification.

DilumAluthge commented 3 years ago

Actually, I still get this error even without running the code in my Discourse comment.

If bar.pdf has this:

1 0 obj
<</Type /Catalog /Pages 3 0 R
>>
endobj

And foo.pdf has this:

1 0 obj
<</Type /Catalog
  /Pages 3 0 R
>>
endobj

And there are no other differences between the files, then PDFIO can load bar.pdf but not foo.pdf.

This should be legal, right? You are allowed to add whitespace inside a dictionary? At least, that's what it says here: https://www.oreilly.com/library/view/developing-with-pdf/9781449327903/ch01.html#example_1-6

DilumAluthge commented 3 years ago

Also, this works:

1 0 obj
<</Type /Catalog /Pages 3 0 R
>>
endobj

But this errors:

1 0 obj
<</Type /Catalog /Pages 3 0 R>>
endobj

And again, the example here seems to imply that you are allowed to strip out the whitespace inside a dictionary.
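
As a quick check on the three spellings of object 1 quoted in this thread, here is a minimal Julia sketch (not part of the original discussion) that compares their sizes in bytes. It assumes every line ends with a single LF; with CRLF line endings the numbers would differ. The variable names are made up for illustration.

```julia
# Three spellings of the same catalog object, as quoted in this thread.
# Assumption: lines end with "\n" (LF) only.
original = "1 0 obj\n<</Type /Catalog /Pages 3 0 R\n>>\nendobj\n"
reflowed = "1 0 obj\n<</Type /Catalog\n  /Pages 3 0 R\n>>\nendobj\n"
compact  = "1 0 obj\n<</Type /Catalog /Pages 3 0 R>>\nendobj\n"

for (name, text) in (("original", original), ("reflowed", reflowed), ("compact", compact))
    # sizeof on a String is its length in bytes, which is what file offsets count.
    println(rpad(name, 9), sizeof(text), " bytes, delta ", sizeof(text) - sizeof(original))
end
```

The dictionaries are equivalent as PDF syntax, but the objects are not the same length, so every byte after object 1 sits at a different position in each variant.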

sambitdash commented 3 years ago

Make sure you have fixed the cross-reference tables and/or dictionaries after you add or remove any whitespace. The same chapter tells you how to update the cross-reference tables. PDF objects are located by the byte offsets recorded in the cross-reference tables, so an edit that changes the length of an object invalidates the offsets of everything after it.
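
To make the point about offsets concrete, here is a rough Julia sketch that checks whether the number after `startxref` really points at the `xref` keyword. It does not use PDFIO; `toy_pdf` and `check_startxref` are hypothetical helpers written for this illustration, and the toy file is deliberately incomplete (a single free entry, pure ASCII, LF line endings), so treat it as a demonstration of the idea rather than a validator.

```julia
# Hypothetical helper: a tiny PDF-like string with a classic xref section.
# Not a complete PDF; just enough structure to show the offset problem.
function toy_pdf(catalog::AbstractString, xref_offset::Integer)
    return "%PDF-1.4\n1 0 obj\n" * catalog * "\nendobj\n" *
           "xref\n0 1\n0000000000 65535 f \n" *
           "trailer\n<</Size 1 /Root 1 0 R>>\n" *
           "startxref\n" * string(xref_offset) * "\n%%EOF\n"
end

# Hypothetical helper: read the offset after the last `startxref` and report
# which bytes are actually found there.
function check_startxref(data::AbstractString)
    r = findlast("startxref", data)
    r === nothing && error("no startxref keyword found")
    m = match(r"\d+", data, last(r) + 1)
    m === nothing && error("no offset after startxref")
    offset = parse(Int, m.match)
    bytes = Vector{UInt8}(codeunits(data))
    # PDF offsets are zero-based byte offsets; Julia indices are one-based.
    stop = min(offset + 4, length(bytes))
    found = String(bytes[offset+1:stop])
    return (offset = offset, found = found, ok = found == "xref")
end

catalog_a = "<</Type /Catalog /Pages 3 0 R\n>>"
catalog_b = "<</Type /Catalog\n  /Pages 3 0 R\n>>"   # same dictionary, 2 bytes longer

# Work out the true offset of `xref` for the first variant.
true_offset = first(findfirst("xref", toy_pdf(catalog_a, 0))) - 1

good  = toy_pdf(catalog_a, true_offset)
stale = toy_pdf(catalog_b, true_offset)   # whitespace edited, offset not updated

println(check_startxref(good))
println(check_startxref(stale))
```

In this toy, the stale offset lands on the `j` of the preceding `endobj` instead of on `xref`. That is at least consistent with the `Found 'j(106)' Expected 'x'` message in this issue, although the real files were never shared, so that reading is a guess.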

DilumAluthge commented 3 years ago

Thanks!

Does PDFIO have any functionality for generating the cross-reference table? Or do I need to do it manually?

DilumAluthge commented 3 years ago

I.e. is there a function that would parse the entire PDF file, make a list of all the indirect objects, and output the table?

Thanks for all of your help, both here and on Discourse! As you can tell, I am very new to working with PDF files.
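
The thread does not say whether PDFIO offers such a function. As an illustration of what the question above describes, here is a naive Julia sketch that scans a file's text for `N G obj` headers and prints a classic cross-reference section. `scan_objects` and `format_xref` are hypothetical names, not PDFIO API. The sketch assumes LF or CRLF line endings, a single-revision file, and no object streams or cross-reference streams; it can also misfire if binary stream data happens to contain something that looks like an object header.

```julia
using Printf

# Scan for "N G obj" at the start of a line and return a Dict mapping
# object number => (generation, zero-based byte offset).
function scan_objects(data::AbstractString)
    objects = Dict{Int,Tuple{Int,Int}}()
    for m in eachmatch(r"^(\d+) (\d+) obj\b"m, data)
        num = parse(Int, m.captures[1])
        gen = parse(Int, m.captures[2])
        # m.offset is a one-based byte index; PDF offsets are zero-based.
        objects[num] = (gen, m.offset - 1)   # a later definition wins
    end
    return objects
end

# Render one classic xref section covering object numbers 0 through the maximum.
function format_xref(objects::Dict{Int,Tuple{Int,Int}})
    maxnum = isempty(objects) ? 0 : maximum(keys(objects))
    io = IOBuffer()
    print(io, "xref\n0 ", maxnum + 1, "\n")
    print(io, "0000000000 65535 f \n")        # object 0 heads the free list
    for num in 1:maxnum
        if haskey(objects, num)
            gen, off = objects[num]
            @printf(io, "%010d %05d n \n", off, gen)   # each entry is 20 bytes
        else
            # Simplification: a strict writer would chain free entries together.
            print(io, "0000000000 00000 f \n")
        end
    end
    return String(take!(io))
end

sample = "%PDF-1.4\n1 0 obj\n<</Type /Catalog /Pages 3 0 R>>\nendobj\n" *
         "3 0 obj\n<</Type /Pages /Kids [] /Count 0>>\nendobj\n"

objects = scan_objects(sample)
table = format_xref(objects)
print(table)
# If the table were appended directly after the body, startxref would be:
println("startxref ", sizeof(sample))
```

A rebuilt table is only useful if the trailer's /Size and the `startxref` value are updated to match, which this sketch leaves out.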