sambitdash / PDFIO.jl

PDF Reader Library for Native Julia.

_error not defined #78

Closed sebastianpech closed 4 years ago

sebastianpech commented 4 years ago

There's a function _error used in src/CosReader.jl that is undefined.

sambitdash commented 4 years ago

Can you upload the file where you observed this behavior? Although the fix may be easy, for a test case to be added we need to have the file.

sebastianpech commented 4 years ago

No, I can't, but I will check whether I can extract the part that's causing the error.

sambitdash commented 4 years ago

@sebastianpech I guess #80 should fix this. But, I will keep the issue open just so that you can provide us an isolated extracted file to add to the test case.

sebastianpech commented 4 years ago

This is the error I get:

ERROR: Unexpected character at 263 found 100
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] _error(::String, ::Base.GenericIOBuffer{Array{UInt8,1}}) at /Users/sebastianpech/.julia/packages/PDFIO/Q0Nk3/src/CosReader.jl:6
 [3] parse_xstring(::Base.GenericIOBuffer{Array{UInt8,1}}) at /Users/sebastianpech/.julia/packages/PDFIO/Q0Nk3/src/CosReader.jl:201
 [4] parse_value(::Base.GenericIOBuffer{Array{UInt8,1}}, ::Function) at /Users/sebastianpech/.julia/packages/PDFIO/Q0Nk3/src/CosReader.jl:28
 [5] parse_value at /Users/sebastianpech/.julia/packages/PDFIO/Q0Nk3/src/CosReader.jl:26 [inlined]
 [6] on_cmap_command!(::Base.GenericIOBuffer{Array{UInt8,1}}, ::Symbol, ::Array{CosInt,1}, ::PDFIO.PD.CMap) at /Users/sebastianpech/.julia/packages/PDFIO/Q0Nk3/src/PDFonts.jl:349
 [7] read_cmap(::Base.GenericIOBuffer{Array{UInt8,1}}) at /Users/sebastianpech/.julia/packages/PDFIO/Q0Nk3/src/PDFonts.jl:384
 [8] get_unicode_mapping(::PDFIO.Cos.CosIndirectObject{CosStream}) at /Users/sebastianpech/.julia/packages/PDFIO/Q0Nk3/src/PDFonts.jl:153
 [9] get_unicode_mapping(::PDFIO.Cos.CosDocImpl, ::PDFIO.Cos.CosIndirectObject{CosDict}) at /Users/sebastianpech/.julia/packages/PDFIO/Q0Nk3/src/PDFonts.jl:143
 [10] Type at /Users/sebastianpech/.julia/packages/PDFIO/Q0Nk3/src/PDFonts.jl:411 [inlined]
 [11] get_pd_font!(::PDFIO.PD.PDDocImpl, ::PDFIO.Cos.CosIndirectObject{CosDict}) at /Users/sebastianpech/.julia/packages/PDFIO/Q0Nk3/src/PDDocImpl.jl:112
 [12] get_font(::PDFIO.PD.PDPageImpl, ::CosName) at /Users/sebastianpech/.julia/packages/PDFIO/Q0Nk3/src/PDPage.jl:311
 [13] evalContent!(::PDPageElement{:Tf}, ::PDFIO.PD.GState{:PDFIO}) at /Users/sebastianpech/.julia/packages/PDFIO/Q0Nk3/src/PDPageElement.jl:775
 [14] evalContent! at /Users/sebastianpech/.julia/packages/PDFIO/Q0Nk3/src/PDPageElement.jl:658 [inlined]
 [15] evalContent!(::PDPageTextObject, ::PDFIO.PD.GState{:PDFIO}) at /Users/sebastianpech/.julia/packages/PDFIO/Q0Nk3/src/PDPageElement.jl:720
 [16] evalContent! at /Users/sebastianpech/.julia/packages/PDFIO/Q0Nk3/src/PDPageElement.jl:658 [inlined]
 [17] pdPageEvalContent(::PDFIO.PD.PDPageImpl, ::PDFIO.PD.GState{:PDFIO}) at /Users/sebastianpech/.julia/packages/PDFIO/Q0Nk3/src/PDPage.jl:145
 [18] pdPageEvalContent at /Users/sebastianpech/.julia/packages/PDFIO/Q0Nk3/src/PDPage.jl:144 [inlined]
 [19] pdPageExtractText at /Users/sebastianpech/.julia/packages/PDFIO/Q0Nk3/src/PDPage.jl:178 [inlined]

Do you have any tips for debugging that? My problem is that the error occurs in a PDF that was not generated on my device. If I remove a page, undo the removal to add it back, and then save the file on my device, the error vanishes.

sebastianpech commented 4 years ago

Apparently it's a space (0x20) that's causing the problem
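
For context: the PDF specification says that white space inside a hexadecimal string (the `<...>` form, which the `parse_xstring` frame in the trace suggests was being read) is ignored, and that an odd number of digits is completed with a trailing `0`. Below is a minimal Julia sketch of a reader that tolerates such spaces. The names `read_hex_string` and `is_pdf_whitespace` are hypothetical and exist only for illustration. This is not PDFIO's `parse_xstring`, and it is an assumption that this is the exact case the file above hit.

```julia
# Hypothetical illustration, not PDFIO's implementation.
# PDF white-space bytes: NUL, TAB, LF, FF, CR, SPACE.
is_pdf_whitespace(b::UInt8) = b in (0x00, 0x09, 0x0a, 0x0c, 0x0d, 0x20)

# Read a PDF hexadecimal string such as "<48 65 6C>" from `io`
# and return the decoded bytes.
function read_hex_string(io::IO)
    read(io, UInt8) == UInt8('<') || error("hex string must start with '<'")
    hex = UInt8[]
    closed = false
    while !eof(io)
        b = read(io, UInt8)
        if b == UInt8('>')
            closed = true
            break
        end
        # White space between digits is legal and simply skipped.
        is_pdf_whitespace(b) && continue
        isxdigit(Char(b)) ||
            error("unexpected byte 0x$(string(b, base=16)) in hex string")
        push!(hex, b)
    end
    closed || error("hex string is missing its closing '>'")
    # An odd digit count means the final digit is followed by an implied 0.
    isodd(length(hex)) && push!(hex, UInt8('0'))
    return hex2bytes(String(hex))
end
```

With this sketch, `read_hex_string(IOBuffer("<48 65 6C 6C 6F>"))` returns the five bytes of `Hello`, while a reader that rejects 0x20 inside the brackets would stop at the first space.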

sambitdash commented 4 years ago

@sebastianpech the _error was a bug in the library which has been fixed.

For the specific error your file now reports, I will have to review the file before I can comment. If you are OK with it, you can also share the file with me over email. It is hard to comment from the trace alone. The issue seems to be in a font file embedded inside the PDF file.

sambitdash commented 4 years ago

https://github.com/sambitdash/PDFIO.jl/commit/9ceafda580322eb836214e5f8346bf1cdc46b6be may fix the issue as reported if I go by your comment on the space character.

@sebastianpech Please take the latest build and try it out.

sebastianpech commented 4 years ago

> @sebastianpech the _error was a bug in the library which has been fixed.

Yeah I know. But I'm curious what's causing the error.

Sadly, I really can't send you the file: it contains personal data from students, and I can't strip the data out because the error disappears the moment I change anything.

> 9ceafda may fix the issue as reported if I go by your comment on the space character.

Currently building

sebastianpech commented 4 years ago

That did the trick, thanks.

I have another PDF which fails at this https://github.com/sambitdash/PDFIO.jl/blob/eeff74cd01dd29839465bb070b63c929bacd5e16/src/PDPageElement.jl#L618 assertion. Could that be related to objects being larger than the page width?

sambitdash commented 4 years ago

> I have another PDF which fails at this https://github.com/sambitdash/PDFIO.jl/blob/eeff74cd01dd29839465bb070b63c929bacd5e16/src/PDPageElement.jl#L618 assertion. Could that be related to objects being larger than the page width?

Please file a separate bug on this.

sambitdash commented 4 years ago

> That did the trick, thanks.

Closing this bug based on the comment above.