[BUG] Outline destinations appear to be `null` for some types of PDFs

jonkgrimes commented 2 years ago

Description

We're parsing the outlines of a few PDF documents, and for some of them the page number is lost when calling the GetOutlines method on the PdfReader object.

For the attached BadOutline.pdf, the Dest field on each OutlineItem appears to be set to null, so the page numbers are lost. The attached GoodOutline.pdf does not have that problem and is parsed correctly. Additionally, the PyPDF2 Python package parses the correct page numbers from BadOutline.pdf and can display them (happy to provide that code as well).

Expected Behavior

$ go run main.go BadOutline.pdf
Input file: BadOutline.pdf
{
    "entries": [
        {
            "title": "Basic networkx instructions",
            "dest": {
                "page": 1,
                "mode": "XYZ",
                "x": 125.798,
                "y": 434.577,
                "zoom": 0
            }
        },
        {
            "title": "Assignment",
            "dest": {
                "page": 4,
                "mode": "XYZ",
                "x": 125.798,
                "y": 226.939,
                "zoom": 0
            }
        }
    ]
}

Actual Behavior

$ go run main.go BadOutline.pdf
Input file: BadOutline.pdf
{
    "entries": [
        {
            "title": "Basic networkx instructions",
            "dest": {
                "page": 0,
                "mode": "",
                "x": 0,
                "y": 0,
                "zoom": 0
            }
        },
        {
            "title": "Assignment",
            "dest": {
                "page": 0,
                "mode": "",
                "x": 0,
                "y": 0,
                "zoom": 0
            }
        }
    ]
}
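
To check larger outlines for the same symptom without reading the JSON by eye, here is a minimal stdlib-only sketch. It assumes the JSON shape printed above; the types `dest`, `entry`, and `outline` and the helpers `missingDests` and `findMissing` are hypothetical names for illustration, not part of UniPDF. Treating an empty `mode` as "destination lost" is a heuristic based on the output above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// dest, entry and outline mirror only the JSON fields shown in this issue.
// They are illustrative types, not UniPDF's own.
type dest struct {
	Page int64   `json:"page"`
	Mode string  `json:"mode"`
	X    float64 `json:"x"`
	Y    float64 `json:"y"`
	Zoom float64 `json:"zoom"`
}

type entry struct {
	Title   string  `json:"title"`
	Dest    dest    `json:"dest"`
	Entries []entry `json:"entries"` // nested items, if any (assumed field name)
}

type outline struct {
	Entries []entry `json:"entries"`
}

// missingDests walks the outline tree and returns the titles of entries
// whose destination has an empty mode (heuristic for a lost destination).
func missingDests(entries []entry) []string {
	var titles []string
	for _, e := range entries {
		if e.Dest.Mode == "" {
			titles = append(titles, e.Title)
		}
		titles = append(titles, missingDests(e.Entries)...)
	}
	return titles
}

// findMissing decodes the JSON produced by the reproduction program and
// reports the entries that appear to have lost their destination.
func findMissing(data []byte) ([]string, error) {
	var o outline
	if err := json.Unmarshal(data, &o); err != nil {
		return nil, err
	}
	return missingDests(o.Entries), nil
}

func main() {
	// Mixed sample for illustration: one entry as in "Actual Behavior",
	// one as in "Expected Behavior".
	sample := []byte(`{"entries":[
		{"title":"Basic networkx instructions","dest":{"page":0,"mode":"","x":0,"y":0,"zoom":0}},
		{"title":"Assignment","dest":{"page":4,"mode":"XYZ","x":125.798,"y":226.939,"zoom":0}}
	]}`)
	titles, err := findMissing(sample)
	if err != nil {
		fmt.Println("Error:", err)
		return
	}
	fmt.Println("entries without a destination:", titles)
}
```

The bytes from `json.MarshalIndent` in the reproduction program could be passed to `findMissing` directly; a non-empty result for BadOutline.pdf would match the behavior reported here.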

Attachments

I can reproduce the issue by copying the code from here:

// main.go

package main

import (
        "encoding/json"
        "fmt"
        "os"

        "github.com/unidoc/unipdf/v3/common/license"
        "github.com/unidoc/unipdf/v3/model"
)

func init() {
        err := license.SetMeteredKey(os.Getenv(`UNIDOC_LICENSE_API_KEY`))
        if err != nil {
                panic(err)
        }
}

func main() {
        if len(os.Args) < 2 {
                fmt.Printf("Usage:  go run main.go input.pdf\n")
                os.Exit(1)
        }

        inputPath := os.Args[1]

        fmt.Printf("Input file: %s\n", inputPath)

        pdfReader, f, err := model.NewPdfReaderFromFile(inputPath, nil)
        if err != nil {
                fmt.Printf("Error: %v\n", err)
                os.Exit(1)
        }
        defer f.Close()

        outlines, err := pdfReader.GetOutlines()
        if err != nil {
                fmt.Printf("Error: %v\n", err)
                os.Exit(1)
        }

        data, err := json.MarshalIndent(outlines, "", "    ")
        if err != nil {
                fmt.Printf("Error: %v\n", err)
                os.Exit(1)
        }
        fmt.Printf("%s\n", data)
}

BadOutline.pdf GoodOutline.pdf

github-actions[bot] commented 2 years ago

Welcome! Thanks for posting your first issue. The way things work here is that while customer issues are prioritized, other issues go into our backlog where they are assessed and fitted into the roadmap when suitable. If you need to get this done, consider buying a license which also enables you to use it in your commercial products. More information can be found on https://unidoc.io/

sampila commented 2 years ago

Hi @jonkgrimes, thank you for reporting the issue and providing the details. This issue should be fixed in UniPDF v3.33.0: https://github.com/unidoc/unipdf-src/releases/tag/v3.33.0
