unidoc / unipdf

Golang PDF library for creating and processing PDF files (pure go)
https://unidoc.io

[BUG] Font objects syntax error while merging a PDF document with another #548

Closed sagar-kalburgi-ripcord closed 3 months ago

sagar-kalburgi-ripcord commented 3 months ago

Description

Hi, when I use unipdf to merge the attached PDF file with another PDF file, it fails with this error:

one of the font objects syntax is not valid - BaseFont undefined: Dict("BaseFont": DejaVuSans, "CharProcs": IObject:567, "Encoding": Dict("Differences": [46, period, 48, zero, one, two, three, four, five, six, seven, eight, nine, 75, K, 77, M, 97, a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, 119, w, 121, y], "Type": Encoding, ), "FirstChar": 0, "FontBBox": [-1021, -463, 1794, 1233], "FontDescriptor": IObject:604, "FontMatrix": [0.001000, 0, 0, 0.001000, 0, 0], "LastChar": 255, "Name": DejaVuSans, "Subtype": Type3, "Type": Font, "Widths": IObject:605, )

However, Adobe Reader and the Chrome PDF viewer both render the document without reporting any font-related issues, so I'm not sure why only unipdf runs into this. The document itself may have the font configured incorrectly, but the other readers handle it without any problem.
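For triage, it can help to confirm that the file really contains Type 3 fonts like the one in the error dump. The sketch below is a rough heuristic that uses only the Go standard library and is not part of unipdf. It counts `/Subtype /Type3` entries in the raw bytes, so it will miss font dictionaries stored inside compressed object streams. The helper name `countType3Fonts` is hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"regexp"
)

// type3Pattern matches a "/Subtype /Type3" entry, with or without whitespace
// between the key and the value.
var type3Pattern = regexp.MustCompile(`/Subtype\s*/Type3\b`)

// countType3Fonts is a hypothetical helper: it counts Type 3 font markers in
// raw PDF bytes. Dictionaries inside compressed object streams are not seen.
func countType3Fonts(data []byte) int {
	return len(type3Pattern.FindAll(data, -1))
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("Usage: go run main.go input.pdf")
		return
	}
	data, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Printf("Error: %v\n", err)
		os.Exit(1)
	}
	fmt.Printf("Type 3 font markers found: %d\n", countType3Fonts(data))
}
```

A count of zero does not prove the file has no Type 3 fonts; it only means none were visible in uncompressed form.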

Expected Behavior

Unipdf needs to handle the merge seamlessly.

Actual Behavior

To reproduce the error, use unipdf's merge functionality to merge the attached PDF file with another PDF file of your choice.

Attachments

S19-1026-NLP-Tasks.pdf

sampila commented 3 months ago

Hi @sagar-kalburgi-ripcord,

We tried to reproduce the issue, but merging S19-1026-NLP-Tasks.pdf with our sample PDF, document-header-and-footer-simple.pdf, works fine.

This is likely because your system doesn't have the DejaVuSans font installed.

Example Code

/*
 * Basic merging of PDF files.
 * Simply loads all pages for each file and writes to the output file.
 * See pdf_merge_advanced.go for a more advanced version which handles merging document forms (acro forms) also.
 *
 * Run as: go run pdf_merge.go output.pdf input1.pdf input2.pdf input3.pdf ...
 */

package main

import (
    "fmt"
    "os"

    "github.com/unidoc/unipdf/v3/common/license"
    "github.com/unidoc/unipdf/v3/model"
)

func init() {
    // Make sure to load your metered License API key prior to using the library.
    // If you need a key, you can sign up and create a free one at https://cloud.unidoc.io
    err := license.SetMeteredKey(os.Getenv(`UNIDOC_LICENSE_API_KEY`))
    if err != nil {
        panic(err)
    }
}

func main() {
    if len(os.Args) < 4 {
        fmt.Printf("Requires at least 3 arguments: output_path and 2 input paths\n")
        fmt.Printf("Usage: go run pdf_merge.go output.pdf input1.pdf input2.pdf input3.pdf ...\n")
        os.Exit(0)
    }

    outputPath := ""
    inputPaths := []string{}

    // Sanity check the input arguments.
    for i, arg := range os.Args {
        if i == 0 {
            continue
        } else if i == 1 {
            outputPath = arg
            continue
        }

        inputPaths = append(inputPaths, arg)
    }

    err := mergePdf(inputPaths, outputPath)
    if err != nil {
        fmt.Printf("Error: %v\n", err)
        os.Exit(1)
    }

    fmt.Printf("Complete, see output file: %s\n", outputPath)
}

func mergePdf(inputPaths []string, outputPath string) error {
    pdfWriter := model.NewPdfWriter()

    for _, inputPath := range inputPaths {
        pdfReader, f, err := model.NewPdfReaderFromFile(inputPath, nil)
        if err != nil {
            return err
        }
        defer f.Close()

        numPages, err := pdfReader.GetNumPages()
        if err != nil {
            return err
        }

        for i := 0; i < numPages; i++ {
            pageNum := i + 1

            page, err := pdfReader.GetPage(pageNum)
            if err != nil {
                return err
            }

            err = pdfWriter.AddPage(page)
            if err != nil {
                return err
            }
        }
    }

    fWrite, err := os.Create(outputPath)
    if err != nil {
        return err
    }

    defer fWrite.Close()

    err = pdfWriter.Write(fWrite)
    if err != nil {
        return err
    }

    return nil
}

Sample file

document-header-and-footer-simple.pdf

Command to run

go run main.go output.pdf document-header-and-footer-simple.pdf S19-1026-NLP-Tasks.pdf 

Output PDF

output.pdf

Could you try installing the DejaVuSans font and running the code again?

rcosta-ripcord commented 3 months ago

Hi @sampila, First of all, thank you for providing suggestions and the sample code!

I've been working with @sagar-kalburgi-ripcord on this, and I tried installing the font, which didn't solve the issue on our service.

Using the code you provided, it does work, indeed. However, we are trying to ensure PDF/A compatibility, and as such, I made a small change to your code. The same error persists after my changes, and I confirmed that the font was installed!

Given the above, I have a few questions:


For reference, here's the code with the changes I mentioned:

/*
 * Basic merging of PDF files.
 * Simply loads all pages for each file and writes to the output file.
 * See pdf_merge_advanced.go for a more advanced version which handles merging document forms (acro forms) also.
 *
 * Run as: go run pdf_merge.go output.pdf input1.pdf input2.pdf input3.pdf ...
 */

package main

import (
    "fmt"
    "os"

    "github.com/unidoc/unipdf/v3/common/license"
    "github.com/unidoc/unipdf/v3/model"
    "github.com/unidoc/unipdf/v3/model/pdfa"
)

func init() {
    // Make sure to load your metered License API key prior to using the library.
    // If you need a key, you can sign up and create a free one at https://cloud.unidoc.io
    err := license.SetMeteredKey(os.Getenv(`UNIDOC_LICENSE_API_KEY`))
    if err != nil {
        panic(err)
    }
}

func main() {
    if len(os.Args) < 4 {
        fmt.Printf("Requires at least 3 arguments: output_path and 2 input paths\n")
        fmt.Printf("Usage: go run pdf_merge.go output.pdf input1.pdf input2.pdf input3.pdf ...\n")
        os.Exit(0)
    }

    outputPath := ""
    inputPaths := []string{}

    // Sanity check the input arguments.
    for i, arg := range os.Args {
        if i == 0 {
            continue
        } else if i == 1 {
            outputPath = arg
            continue
        }

        inputPaths = append(inputPaths, arg)
    }

    err := mergePdf(inputPaths, outputPath)
    if err != nil {
        fmt.Printf("Error: %v\n", err)
        os.Exit(1)
    }

    fmt.Printf("Complete, see output file: %s\n", outputPath)
}

func mergePdf(inputPaths []string, outputPath string) error {
    pdfWriter := model.NewPdfWriter()

    // Apply PDF/A-1a Standard with default options
    pdfWriter.ApplyStandard(model.StandardApplier(pdfa.NewProfile1A(pdfa.DefaultProfile1Options())))

    for _, inputPath := range inputPaths {
        pdfReader, f, err := model.NewPdfReaderFromFile(inputPath, nil)
        if err != nil {
            return err
        }
        defer f.Close()

        numPages, err := pdfReader.GetNumPages()
        if err != nil {
            return err
        }

        for i := 0; i < numPages; i++ {
            pageNum := i + 1

            page, err := pdfReader.GetPage(pageNum)
            if err != nil {
                return err
            }

            err = pdfWriter.AddPage(page)
            if err != nil {
                return err
            }
        }
    }

    fWrite, err := os.Create(outputPath)
    if err != nil {
        return err
    }

    defer fWrite.Close()

    err = pdfWriter.Write(fWrite)
    if err != nil {
        return err
    }

    return nil
}

sampila commented 3 months ago

Hi @rcosta-ripcord thanks for providing more detail regarding this.

We are investigating this issue.

sampila commented 3 months ago

Hi @rcosta-ripcord and @sagar-kalburgi-ripcord,

We are experimenting with the PDF/A process: when the embedded font cannot be obtained from the PDF, we fall back to a standard font. Here are the current results.
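The fallback idea described above can be illustrated roughly as follows. This is only a sketch of the general technique, not UniPDF's actual implementation: the helper names `fallbackStandardFont` and `styled`, and the name-matching rules, are assumptions made for illustration. It maps a font name to one of the 14 standard PDF fonts based on the name alone.

```go
package main

import (
	"fmt"
	"strings"
)

// styled builds a standard font name from a family and style flags.
// italicWord is "Italic" or "Oblique" depending on the family, and regular is
// the suffix used when no style applies ("Roman" for Times, empty otherwise).
func styled(family string, bold, italic bool, italicWord, regular string) string {
	suffix := ""
	if bold {
		suffix += "Bold"
	}
	if italic {
		suffix += italicWord
	}
	if suffix == "" {
		suffix = regular
	}
	if suffix == "" {
		return family
	}
	return family + "-" + suffix
}

// fallbackStandardFont is a hypothetical helper that picks one of the 14
// standard PDF font names for a font whose embedded program is unavailable.
// The matching rules are illustrative guesses based on the font name only.
func fallbackStandardFont(name string) string {
	n := strings.ToLower(name)
	bold := strings.Contains(n, "bold")
	italic := strings.Contains(n, "italic") || strings.Contains(n, "oblique")

	switch {
	case strings.Contains(n, "mono") || strings.Contains(n, "courier"):
		return styled("Courier", bold, italic, "Oblique", "")
	case strings.Contains(n, "times"),
		strings.Contains(n, "serif") && !strings.Contains(n, "sans"):
		return styled("Times", bold, italic, "Italic", "Roman")
	default:
		return styled("Helvetica", bold, italic, "Oblique", "")
	}
}

func main() {
	fmt.Println(fallbackStandardFont("DejaVuSans")) // Helvetica
}
```

A substitution like this changes glyph shapes and metrics, so the visual result should be compared against the original document before relying on it.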

PDF Result

output1.pdf

What do you think? Are the results acceptable for your current use case?

rcosta-ripcord commented 3 months ago

Hi @sampila, would it be possible to show what the result would look like with the document @sagar-kalburgi-ripcord attached? I'd also like to know if there's any PR available we can test with

sampila commented 3 months ago

Hi, output1.pdf was generated from S19-1026-NLP-Tasks.pdf.

I will create a PR for this specific issue.

sagar-kalburgi-ripcord commented 3 months ago

@sampila Sounds good. We can test against your PR and let you know if it works out for us. Thanks!

sampila commented 3 months ago

Hi @sagar-kalburgi-ripcord and @rcosta-ripcord, I created the PR and mentioned this issue in it. Could you check it?

sagar-kalburgi-ripcord commented 3 months ago

Hi @sampila we were unable to find any PR linked to this issue. Could you pls post a link to it here?

sampila commented 3 months ago

The PR can be accessed through the Ripcord account that has been added to the unipdf source code repository; you can access the PR using that account.

sagar-kalburgi-ripcord commented 3 months ago

Hi @sampila, neither of us is able to find any PR, although both of us are logged in to our Ripcord accounts on GitHub.

sampila commented 3 months ago

Hi @sagar-kalburgi-ripcord, could you check again? Your account should already have access to the PR. You can fork it.

sagar-kalburgi-ripcord commented 3 months ago

Hi @sampila. I got access to your Org. However, is it possible to add @rcosta-ripcord to your Org as well? He is actively testing these changes right now.

sampila commented 3 months ago

Regarding that, @rcosta-ripcord can fork from your forked repo, as we currently give access to only one member per organization.

rcosta-ripcord commented 3 months ago

Hi @sampila, @sagar-kalburgi-ripcord and I just tested your PR and it does fix our issue. Please let us know once you merge and release it so we can update the dependency on our services!

Thank you for your help!

sampila commented 3 months ago

Hi @rcosta-ripcord, thanks for the confirmation. We are adding this issue to our test cases and preparing a new UniPDF release. We will notify you after the release.

sampila commented 3 months ago

Hi @sagar-kalburgi-ripcord and @rcosta-ripcord,

We released a new UniPDF version that fixes this issue: https://github.com/unidoc/unipdf/releases/tag/v3.56.0

We are closing this issue for now; you can re-open it if the latest version does not resolve the problem.

Best regards, Alip