kermitt2 / pdfalto

PDF to XML ALTO file converter
GNU General Public License v2.0

"Broken" links, or links with no destination in annotations file #127

Open kcstrong opened 3 years ago

kcstrong commented 3 years ago

We're encountering an issue where, for many of the PDFs we're converting, the annotations file has no DEST (destination) elements nested within the "goto" ACTION elements. As far as we know, most of these PDFs were printed from FrameMaker, distilled, and quite possibly linked using a tool called Compose. The links in these files work just as well in Acrobat as the links in files where the DEST elements are present. However, I did notice that the destinations aren't visible in the link properties, even though the links work. I used another tool we have, AutoBookmark, to export the links as XML, and so far the only difference I've found is that the destination names (IDs) have a different format. For example, in a PDF whose annotation destinations convert properly the IDs look something like "A123-subsection-42", while in a PDF with missing destinations they look like "G8.534279".
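One possible explanation, offered only as an assumption and not confirmed for pdfalto: in a PDF, a GoTo action's `/D` entry can hold an explicit destination array, a name object (looked up in the catalog's `/Dests` dictionary), or a string (looked up in the `/Dests` name tree under `/Names`). If only some of these forms are resolved during conversion, links that use the other form could end up without a DEST. The sketch below models PDF objects as plain Python values; `Name` and `resolve_goto_dest` are hypothetical helpers, not part of pdfalto or any PDF library. Which form the "G8.534279" and "A123-subsection-42" IDs actually use in these files is not known; the test values are purely illustrative.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Name:
    """Hypothetical stand-in for a PDF name object such as /G8.534279."""
    value: str


# Hypothetical model of a GoTo /D value: an explicit array such as
# [page_index, "/XYZ", left, top, zoom], a name object, or a (byte) string.
DestValue = Union[list, Name, bytes, str]


def resolve_goto_dest(d: DestValue, catalog_dests: dict, name_tree_dests: dict) -> Optional[list]:
    """Return the explicit destination array for a GoTo /D value, or None if unresolved.

    catalog_dests models the catalog /Dests dictionary (name keys, PDF 1.1 style).
    name_tree_dests models a flattened /Names -> /Dests name tree (string keys, PDF 1.2+).
    """
    if isinstance(d, list):
        # Explicit destination: nothing to look up.
        return d
    if isinstance(d, Name):
        target = catalog_dests.get(d.value)
    elif isinstance(d, (bytes, str)):
        # Rough approximation: real PDF strings may be PDFDocEncoding or UTF-16BE.
        key = d.decode("latin-1") if isinstance(d, bytes) else d
        target = name_tree_dests.get(key)
    else:
        return None
    # A named destination's value is either the array itself
    # or a dictionary whose D entry holds the array.
    if isinstance(target, dict):
        target = target.get("D")
    return target if isinstance(target, list) else None
```

In a real converter the two lookups would read the document catalog rather than Python dicts; the point of the sketch is only that a name-typed `/D` and a string-typed `/D` need different lookups, so handling one does not cover the other.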

I'll be sending you a test file. If you could help or advise, that would be wonderful.

Thanks