qpdf / qpdf

qpdf: A content-preserving PDF document transformer
https://qpdf.sourceforge.io/
Apache License 2.0

qpdf fails to find existing bookmark targets for specific PDF #1238

Closed: grexe closed this issue 2 weeks ago

grexe commented 3 weeks ago

I tried the latest release and the latest nightly build as of today. With the attached PDF, both the pdf-bookmarks example program and my own code using the library fail to retrieve the target pages of all bookmarks. The targets are stored in the PDF, and a PDF reader based on xpdf displays them correctly.

I used the following command line:

pdf-bookmarks --show-targets --lines

For other PDF files it worked flawlessly. I'm using a Haiku nightly as the OS, but the behavior should be the same on Linux. I can try that later on request.

For the attached PDF, this gives me the correct bookmark titles but not their target pages, which I also need:

+-+ [ -> unknown ] Conclusions
| |
| +-+ [ -> unknown ] Confirmation of Research Hypotheses
| |
| +-+ [ -> unknown ] Limitations and Future Work
|   |
|   +-+ [ -> unknown ] Extension to Other Tasks
|   |
|   +-+ [ -> unknown ] Memory Requirements
|   |
|   +-+ [ -> unknown ] Edge Engineering
|   |
|   +-+ [ -> unknown ] Walk-based Inference
|
+-+ [ -> unknown ] Hyper-parameter Settings
| |
| +-+ [ -> unknown ] Chapter 4
| |
| +-+ [ -> unknown ] Chapter 5
| |
| +-+ [ -> unknown ] Chapter 6
|
+-+ [ -> unknown ] Bibliography

The test PDF I used is: Textual Relation Extraction With Edge-Oriented Graph Neural Models.pdf

m-holger commented 3 weeks ago

Can you try uploading the PDF file again? (It looks to me like you submitted the comment before the file upload had finished.)

grexe commented 3 weeks ago

Sorry, that was a strange issue with the editor. It's fixed now.

m-holger commented 3 weeks ago

Thanks for reporting the bug. This will be fixed in the next release. In the meantime, here is your output:

|
+-+ [ -> 13 ] Abstract
|
+-+ [ -> 15 ] Declaration
|
+-+ [ -> 16 ] Copyright
|
+-+ [ -> 17 ] Acknowledgements
|
+-+ [ -> 19 ] Abbreviations
|
+-+ [ -> 21 ] Introduction
| |
| +-+ [ -> 21 ] Motivation
| | |
| | +-+ [ -> 22 ] Why Graphs?
| |
| +-+ [ -> 24 ] Research Questions, Hypotheses and Objectives
| |
| +-+ [ -> 25 ] Contributions
| | |
| | +-+ [ -> 26 ] Publications
| |
| +-+ [ -> 27 ] Dissertation Structure
|
+-+ [ -> 29 ] Relation Extraction: An Overview
| |
| +-+ [ -> 29 ] Definitions
| |
| +-+ [ -> 31 ] Associated Tasks
| | |
| | +-+ [ -> 33 ] Challenges
| | |
| | +-+ [ -> 35 ] Datasets and Corpora
| | |
| | +-+ [ -> 39 ] Evaluation Metrics
| |
| +-+ [ -> 42 ] Taxonomy of Approaches
| | |
| | +-+ [ -> 44 ] Supervised Learning
| | | |
| | | +-+ [ -> 45 ] Pattern-oriented Methods
| | | |
| | | +-+ [ -> 46 ] Sequence-oriented Methods
| | | |
| | | +-+ [ -> 50 ] Tree-oriented Methods
| | | |
| | | +-+ [ -> 54 ] Graph-oriented Methods
| | | |
| | | +-+ [ -> 58 ] Structural Hybridity
| | |
| | +-+ [ -> 59 ] Semi-supervised Learning
| | |
| | +-+ [ -> 61 ] Transfer Learning
| | |
| | +-+ [ -> 62 ] Distant Learning
| | |
| | +-+ [ -> 65 ] Unsupervised Approaches
| |
| +-+ [ -> 67 ] Conclusions, Limitations and Challenges
|
+-+ [ -> 71 ] Technical Background
| |
| +-+ [ -> 71 ] Artificial Neural Networks
| |
| +-+ [ -> 73 ] Network Training
| | |
| | +-+ [ -> 73 ] Classification and Cost Function
| | |
| | +-+ [ -> 76 ] Learning
| |
| +-+ [ -> 81 ] Neural Components
|   |
|   +-+ [ -> 81 ] Convolutional Neural Networks
|   |
|   +-+ [ -> 84 ] Recurrent Neural Networks
|   |
|   +-+ [ -> 88 ] Attention Mechanisms
|
+-+ [ -> 91 ] Sentence-level Neural Relation Extraction
| |
| +-+ [ -> 91 ] Motivation
| |
| +-+ [ -> 94 ] Proposed Approach
| | |
| | +-+ [ -> 95 ] Sequence Encoding
| | |
| | +-+ [ -> 97 ] Edge Layer
| | |
| | +-+ [ -> 99 ] Walk-based Inference
| | |
| | +-+ [ -> 102 ] Classification
| |
| +-+ [ -> 102 ] Experimental Settings
| | |
| | +-+ [ -> 102 ] Datasets and Comparisons
| | |
| | +-+ [ -> 105 ] Implementation Details
| |
| +-+ [ -> 107 ] Results
| | |
| | +-+ [ -> 107 ] Candidate Pairs Classification
| | |
| | +-+ [ -> 109 ] Performance Comparison
| |
| +-+ [ -> 110 ] Analysis and Discussion
| | |
| | +-+ [ -> 111 ] Error Analysis
| | |
| | +-+ [ -> 115 ] Walk-based mechanism
| | |
| | +-+ [ -> 117 ] Edge representation enhancements
| |
| +-+ [ -> 121 ] Related Work
| |
| +-+ [ -> 122 ] Conclusion
|
+-+ [ -> 124 ] Adaptation to the Biomedical Domain
| |
| +-+ [ -> 124 ] Biomedical Relation Extraction
| | |
| | +-+ [ -> 125 ] Challenges
| |
| +-+ [ -> 127 ] Scientific Articles
| | |
| | +-+ [ -> 128 ] Chemical-Protein Interactions
| | |
| | +-+ [ -> 129 ] Related Work
| | |
| | +-+ [ -> 130 ] Proposed Approach
| | |
| | +-+ [ -> 132 ] Experimental Settings
| | |
| | +-+ [ -> 133 ] Results and Analysis
| |
| +-+ [ -> 140 ] Electronic Health Records
| | |
| | +-+ [ -> 140 ] Drug-Medication and ADE Interactions
| | |
| | +-+ [ -> 141 ] Related Work
| | |
| | +-+ [ -> 143 ] Motivation
| | |
| | +-+ [ -> 143 ] Proposed Approach
| | |
| | +-+ [ -> 145 ] Experimental Settings
| | |
| | +-+ [ -> 146 ] Results and Analysis
| |
| +-+ [ -> 151 ] Conclusion
|
+-+ [ -> 153 ] Document-level Neural Relation Extraction
| |
| +-+ [ -> 153 ] Motivation
| |
| +-+ [ -> 156 ] Proposed Approach
| | |
| | +-+ [ -> 157 ] Sentence Encoding Layer
| | |
| | +-+ [ -> 157 ] Graph Layer
| | | |
| | | +-+ [ -> 158 ] Node construction
| | | |
| | | +-+ [ -> 158 ] Edge construction
| | |
| | +-+ [ -> 160 ] Inference Layer
| | |
| | +-+ [ -> 162 ] Classification
| |
| +-+ [ -> 162 ] Experimental Settings
| | |
| | +-+ [ -> 162 ] Data and Task Settings
| | |
| | +-+ [ -> 164 ] Model Settings and Comparisons
| |
| +-+ [ -> 165 ] Results
| |
| +-+ [ -> 168 ] Analysis and Discussion
| | |
| | +-+ [ -> 168 ] Exploring the Effect of Edges
| | |
| | +-+ [ -> 171 ] Supplementary Analysis
| |
| +-+ [ -> 173 ] Related Work
| |
| +-+ [ -> 174 ] Conclusion
|
+-+ [ -> 176 ] Conclusions
| |
| +-+ [ -> 177 ] Confirmation of Research Hypotheses
| |
| +-+ [ -> 181 ] Limitations and Future Work
|   |
|   +-+ [ -> 182 ] Extension to Other Tasks
|   |
|   +-+ [ -> 183 ] Memory Requirements
|   |
|   +-+ [ -> 183 ] Edge Engineering
|   |
|   +-+ [ -> 184 ] Walk-based Inference
|
+-+ [ -> 185 ] Hyper-parameter Settings
| |
| +-+ [ -> 185 ] Chapter 4
| |
| +-+ [ -> 187 ] Chapter 5
| |
| +-+ [ -> 187 ] Chapter 6
|
+-+ [ -> 188 ] Bibliography

jberkenbilt commented 3 weeks ago

@m-holger Please make sure that

In this case, since the commit is already merged, can you just reference the PR from the issue or the issue from the PR (since GitHub will make reciprocal links) so we can follow the trail?

jberkenbilt commented 3 weeks ago

@m-holger Oh, I see you almost did this already. The GitHub magic finds the "fixes" line at the end of a commit message. Putting it in the PR title is not sufficient.

jberkenbilt commented 3 weeks ago

#1240

grexe commented 3 weeks ago

Thanks so much for the quick fix. You guys are amazing, and this library is my new favorite: so fast and easy to integrate!

jberkenbilt commented 3 weeks ago

@grexe Thanks -- that's great to hear. Comments like that help keep me motivated after 22 years of working on this, and having recently added @m-holger as a secondary maintainer has definitely breathed new life into the project.