sambitdash / PDFIO.jl

PDF Reader Library for Native Julia.
Other
127 stars 13 forks source link

pdDocGetOutline() implementation #45

Closed gwierzchowski closed 5 years ago

gwierzchowski commented 5 years ago

Related to #35 Also fixes #44

Please see description of pdDocGetOutline() and usage patters in runtests.jl

PS: Disregard change in CosObjStream.jl: function readuntil was left over from older code after merge at my side - it is not neded and can be deleted. PS2: Some small fixes in CosObject.jl necessary for me. - it looks like populate_values(node::CosTreeNode{String}, dict::CosObject) was not used frequently before.

sambitdash commented 5 years ago

A good idea to call the method pdDocGetOutlines

gwierzchowski commented 5 years ago

I will try to check why test failed on other architectures. At first look there is some compatibility issues with string() function or some differences in how PDF data are being read. One of the issues seems to be related to casting of CDDate type - not related to present changes - probably because of some latest change in Julia head. Honestly I'm little disappointed with Julia being such not portable. My work and dev system is 64bit Linux and all works perfectly there so I did not expected any problems.

We have a Christmas Holidays now, so I will be off-line for a while, Will return to this subject near or just after new year. If you'd came to any ideas in a meantime - please let me know. PS: Can submit separate PR for issue in PDFontMetrics.jl but probably the same test with CDDate will fail again, so I think at first we have to sort out those compatibility issues. Also this fix must be contained in Outline PR because one of unit tests relay on it.

sambitdash commented 5 years ago

@gwierzchowski as suggested earlier, PDFIO is a model level API for PDF specification. Hence, I will not like to incorporate application level assumptions in it.

That being said, I do not think the API you have implemented will be relevant to the APIs expected in the PDFIO. From what I have understood from your implementation there are 3 broad ideas you want to portray.

  1. An API which can provide an outline hierarchy
  2. An outline item which maps a text with a destination in a PDF document. Destinations in PDF are based on absolute page numbers and not page labels.
  3. An API that given a page number can provide the page label for the page.

I will suggest you to implement 1 & 2 as one PR and 3 as another independent PR.

gwierzchowski commented 5 years ago

So I will:

  1. Remove setting :PageLabel key in outline item, leaving only :PageNo and submit this as one PR.
  2. Add new exported API function pdDocGetPageLabel(doc::PDDoc, pageno::Int) which will find and return label for given absolute page number or throw error if document has no PageLabels and submit this as separate PR.

I don't know how to recover from this "Resolve conflicts" situation after I merged your commits to my branch (I tried this button, but I'm getting kind of "this page is outdated" message from github) so I will create new branch on my site and submit new PR for point 1. above rather than continuing this PR.

This may take some time unfortunately, because I'm currently involved in some other urgent work and at this time do not practically have any spare time.

sambitdash commented 5 years ago

I am a bit busy currently and will be able to review and suggest changes if any in 15-20 days time. Sorry for my delayed response.

sambitdash commented 5 years ago

49 merges these functionalities to certain extent. Hence closing.