pcorless / icepdf

PDF Rendering and Viewing API in Java
Apache License 2.0

Potential problem with annotation#getPage(Index) #298

Closed gtache closed 1 year ago

gtache commented 1 year ago

I ran into a problem when calling getPageIndex on an annotation located on page 3: it returned 0. Looking at the code, it seems possible that getPageIndex calls getPage, which calls getObject, which triggers the parser to create a new page with index = 0, and that index is returned immediately.
Here is a screenshot showing that, right after the annotation's page reference is set, the page retrieved through the annotation is not the same object:

Screenshot from 2023-09-06 14-30-02

I guess this can happen depending on the garbage collector and memory conditions.
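
To make the suspected failure mode concrete, here is a minimal, self-contained sketch. None of the types below are ICEpdf classes: `Page`, `Library`, `Annotation`, and `evict` are hypothetical stand-ins, and garbage collection is simulated by explicitly removing the page from a cache. It only illustrates how an annotation that holds a page reference (rather than the page itself) can end up reading the default index of a freshly re-created page.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for illustration; these are NOT ICEpdf classes.
final class StalePageIndexSketch {

    /** Minimal page object; a freshly created page has index 0 until someone sets it. */
    static final class Page {
        int index;
    }

    /** Object store that re-creates a page whenever it is no longer cached. */
    static final class Library {
        private final Map<Integer, Page> cache = new HashMap<>();

        Page getObject(int reference) {
            // On a cache miss a brand-new Page is built, and its index is still 0.
            return cache.computeIfAbsent(reference, r -> new Page());
        }

        /** Simulates the page being garbage collected after it is released. */
        void evict(int reference) {
            cache.remove(reference);
        }
    }

    /** Annotation that stores only the page reference, not the page object. */
    static final class Annotation {
        private final Library library;
        private final int pageReference;

        Annotation(Library library, int pageReference) {
            this.library = library;
            this.pageReference = pageReference;
        }

        int getPageIndex() {
            // Resolves the page through the library on every call.
            return library.getObject(pageReference).index;
        }
    }

    /** Returns {index read before eviction, index read after eviction}. */
    static int[] demo() {
        Library library = new Library();
        int reference = 42;                      // arbitrary object number
        library.getObject(reference).index = 2;  // third page, zero-based
        Annotation annotation = new Annotation(library, reference);

        int before = annotation.getPageIndex();
        library.evict(reference);                // page is dropped from the cache
        int after = annotation.getPageIndex();   // a new Page is created here
        return new int[] {before, after};
    }

    public static void main(String[] args) {
        int[] result = demo();
        System.out.println("before eviction: " + result[0]
                + ", after eviction: " + result[1]);
    }
}
```

In this sketch the first read returns the index that was assigned, while the read after eviction returns the default of the newly created object, which matches the symptom described above.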

gtache commented 1 year ago

Maybe the annotation factory could set the page index of the annotation it just created, but that wouldn't solve the underlying problem. I'm not sure what the best way to proceed would be.

pcorless commented 1 year ago

As you noticed, Page objects come and go, since they can be expensive to keep around on large documents. The general rule is that once a page leaves the DocumentView viewport, its object reference lock is removed and it is free to be garbage collected. There are a few exceptions around text selection that keep the object reference lock a bit longer.

Page numbers/indexes are a bit strange in PDFs. There is the notion of a label for displaying alternative page numbers like ii or IV, but the index is defined by the page tree traversal, which dictates the page ordering. I'll see if I can get you a way to determine a page's index given its pObject Reference.
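
As a rough illustration of an index defined by page tree traversal, the sketch below walks a simplified tree depth-first and counts leaf pages until it reaches the one with the requested reference. It is not the ICEpdf implementation: `Node`, `indexOf`, `sampleTree`, and the integer object numbers are all hypothetical, and a real PDF page tree carries more information (for example a per-node page count that could be used to skip whole subtrees).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a PDF page tree; these are NOT ICEpdf classes.
final class PageTreeIndexSketch {

    /** Either a leaf page or an intermediate node holding an ordered list of kids. */
    static final class Node {
        final int reference;                        // stand-in for an object reference
        final boolean isPage;                       // true for leaf pages
        final List<Node> kids = new ArrayList<>();  // stays empty for leaf pages

        Node(int reference, boolean isPage) {
            this.reference = reference;
            this.isPage = isPage;
        }

        Node add(Node kid) {
            kids.add(kid);
            return this;
        }
    }

    /** Returns the zero-based index of the page with the given reference, or -1. */
    static int indexOf(Node root, int targetReference) {
        int[] pagesSeen = {0}; // leaf pages visited so far, shared across recursion
        return search(root, targetReference, pagesSeen);
    }

    private static int search(Node node, int target, int[] pagesSeen) {
        if (node.isPage) {
            if (node.reference == target) {
                return pagesSeen[0];
            }
            pagesSeen[0]++;
            return -1;
        }
        // Kids are visited in order, so the count follows the document's page order.
        for (Node kid : node.kids) {
            int found = search(kid, target, pagesSeen);
            if (found >= 0) {
                return found;
            }
        }
        return -1;
    }

    /** Builds root -> [node(2) -> [page 10, page 11], page 12, node(3) -> [page 13]]. */
    static Node sampleTree() {
        Node left = new Node(2, false)
                .add(new Node(10, true))
                .add(new Node(11, true));
        Node right = new Node(3, false)
                .add(new Node(13, true));
        return new Node(1, false)
                .add(left)
                .add(new Node(12, true))
                .add(right);
    }

    public static void main(String[] args) {
        Node root = sampleTree();
        System.out.println("index of page 12: " + indexOf(root, 12));
        System.out.println("index of page 13: " + indexOf(root, 13));
    }
}
```

Because the lookup depends only on the tree and the reference, it gives the same answer whether or not the page object itself is currently in memory. The cost is a walk over up to every page in the document.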

pcorless commented 1 year ago

Pushed a small change which should get you the correct page index every time. The call can be expensive on large documents, but that is better than not getting the correct value.

gtache commented 1 year ago

Thanks for the explanation and the fix, it seems to work perfectly!