unidoc / unipdf

Golang PDF library for creating and processing PDF files (pure go)
https://unidoc.io
Other

[BUG] Flattening PDF removes images from forms #502

Closed whizkid79 closed 1 year ago

whizkid79 commented 1 year ago

Description

I have a PDF with an Acrobat form containing text and image fields. Technically, I think the image fields are buttons. When I flatten the PDF, the image field is discarded.

Expected Behavior

The image should (optionally) be baked into the PDF so that it becomes part of the archived PDF.

Actual Behavior

When the form is flattened, the image field is discarded.

Steps to reproduce the behavior:

fImageData := &fjson.FieldData{}
qrCode, err := makeQrCode(link, 4000)
if err != nil {
    return err
}
img, err := uniDocModel.DefaultImageHandler{}.NewImageFromGoImage(qrCode)
if err != nil {
    return err
}
if err := fImageData.SetImage("qrcode", img, nil); err != nil {
    return err
}
imageFieldAppearance := annotator.ImageFieldAppearance{OnlyIfMissing: false}
err = pdfReader.AcroForm.FillWithAppearance(fImageData, imageFieldAppearance)
if err != nil {
    return err
}
// err is already declared above, so assign with = rather than :=.
err = pdfReader.FlattenFields(true, imageFieldAppearance)
if err != nil {
    return err
}

Attachments

template-with-image.pdf (attached)

github-actions[bot] commented 1 year ago

Welcome! Thanks for posting your first issue. The way things work here is that while customer issues are prioritized, other issues go into our backlog where they are assessed and fitted into the roadmap when suitable. If you need to get this done, consider buying a license which also enables you to use it in your commercial products. More information can be found on https://unidoc.io/

sampila commented 1 year ago

Hi @whizkid79,

Thank you for reporting the issue. I tried to reproduce it using the given PDF file and code snippet, but in my results the image is not discarded after flattening. Perhaps I am missing something? Could you try the code below and check whether the result is correct?

package main

import (
    "image"
    "math"
    "os"

    "github.com/boombuler/barcode"
    "github.com/boombuler/barcode/qr"

    "github.com/unidoc/unipdf/v3/annotator"
    "github.com/unidoc/unipdf/v3/common/license"
    "github.com/unidoc/unipdf/v3/fjson"
    "github.com/unidoc/unipdf/v3/model"
)

func init() {
    // Make sure to load your metered License API key prior to using the library.
    // If you need a key, you can sign up and create a free one at https://cloud.unidoc.io
    if err := license.SetMeteredKey(os.Getenv("UNIDOC_LICENSE_API_KEY")); err != nil {
        panic(err)
    }
}

func main() {
    pdfReader, f, err := model.NewPdfReaderFromFile("template-with-image.pdf", nil)
    if err != nil {
        panic(err)
    }
    defer f.Close()

    fImageData := &fjson.FieldData{}
    qrCode, err := makeQrCode("https://unidoc.io", 200, 1)
    if err != nil {
        panic(err)
    }

    img, err := model.DefaultImageHandler{}.NewImageFromGoImage(qrCode)
    if err != nil {
        panic(err)
    }

    if err := fImageData.SetImage("qrcode", img, nil); err != nil {
        panic(err)
    }
    imageFieldAppearance := annotator.ImageFieldAppearance{OnlyIfMissing: false}
    err = pdfReader.AcroForm.FillWithAppearance(fImageData, imageFieldAppearance)
    if err != nil {
        panic(err)
    }
    err = pdfReader.FlattenFields(true, imageFieldAppearance)
    if err != nil {
        panic(err)
    }

    // Write out
    pdfWriter, err := pdfReader.ToWriter(nil)
    if err != nil {
        panic(err)
    }

    err = pdfWriter.WriteToFile("flatten-img.pdf")
    if err != nil {
        panic(err)
    }
}

// Prepare the QR code. The oversampling ratio specifies how many pixels/point to use.  The default resolution of
// PDFs is 72PPI (points per inch). A higher PPI allows higher resolution QR code generation which is particularly
// important if the document is scaled (zoom in).
func makeQrCode(contentStr string, width float64, oversampling int) (image.Image, error) {
    qrCode, err := qr.Encode(contentStr, qr.M, qr.Auto)
    if err != nil {
        return nil, err
    }

    // Prepare the qr code image.
    pixelWidth := oversampling * int(math.Ceil(width))
    qrCode, err = barcode.Scale(qrCode, pixelWidth, pixelWidth)
    if err != nil {
        return nil, err
    }

    return qrCode, err
}
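The oversampling comment boils down to simple arithmetic: the QR image gets `oversampling * ceil(width)` pixels, and if it is drawn `width` points wide at the default 72 points per inch, its effective resolution is roughly `oversampling * 72` PPI. A small standard-library sketch of that calculation follows; the helper names `qrPixelWidth` and `effectivePPI` are hypothetical, and it assumes the image is displayed at exactly `width` points (in practice the form widget's rectangle decides the drawn size).

```go
package main

import (
	"fmt"
	"math"
)

// pdfPointsPerInch is the default PDF user-space resolution: 72 points per inch.
const pdfPointsPerInch = 72.0

// qrPixelWidth mirrors the arithmetic in makeQrCode: the QR code is scaled to
// `oversampling` pixels for every (rounded-up) point of target width.
func qrPixelWidth(widthPt float64, oversampling int) int {
	return oversampling * int(math.Ceil(widthPt))
}

// effectivePPI reports how many image pixels land in each inch of the page
// when an image pixelWidth pixels wide is drawn widthPt points wide.
// Assumption: the image is drawn at exactly widthPt points.
func effectivePPI(widthPt float64, pixelWidth int) float64 {
	return float64(pixelWidth) / widthPt * pdfPointsPerInch
}

func main() {
	for _, factor := range []int{1, 2, 4} {
		px := qrPixelWidth(200, factor)
		fmt.Printf("oversampling=%d -> %d px, %.0f PPI\n", factor, px, effectivePPI(200, px))
	}
}
```

With `width` = 200 as in the snippet, oversampling factors 1, 2 and 4 give 200, 400 and 800 pixels, i.e. 72, 144 and 288 PPI.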

Best regards, Ali

whizkid79 commented 1 year ago

Hi @sampila,

Thank you for your extensive testing; it really helped me find my error. I have existing code that I need to extend, and it uses annotator.FieldAppearance for flattening (err = pdfReader.FlattenFieldsWithOpts(fieldAppearance, flatteningOps)):

// Flatten form.
fieldAppearance := annotator.FieldAppearance{OnlyIfMissing: false, RegenerateTextFields: true}

// Set custom styles.
fieldAppearance.SetStyle(annotator.AppearanceStyle{
    BorderSize:          0,
    MultilineLineHeight: 1.1,
    Fonts: &annotator.AppearanceFontStyle{
        FieldFallbacks: getFontStyles(arialFont, arialBoldFont, appearance),
        ForceReplace:   true,
    },
})

How can I combine annotator.FieldAppearance with annotator.ImageFieldAppearance during the flattening process? Is this possible at all?

Thanks, Peter

sampila commented 1 year ago

Hi @whizkid79,

Yes, it is possible. Could you try this PDF file, the fill-data JSON and the code snippet below, and check whether the result is correct?

package main

import (
    "image"
    "math"
    "os"

    "github.com/boombuler/barcode"
    "github.com/boombuler/barcode/qr"

    "github.com/unidoc/unipdf/v3/annotator"
    "github.com/unidoc/unipdf/v3/common/license"
    "github.com/unidoc/unipdf/v3/fjson"
    "github.com/unidoc/unipdf/v3/model"
)

func init() {
    // Make sure to load your metered License API key prior to using the library.
    // If you need a key, you can sign up and create a free one at https://cloud.unidoc.io
    if err := license.SetMeteredKey(os.Getenv("UNIDOC_LICENSE_API_KEY")); err != nil {
        panic(err)
    }
}

func main() {
    filePDF := "template-with-image.pdf"

    pdfReader, f, err := model.NewPdfReaderFromFile(filePDF, nil)
    if err != nil {
        panic(err)
    }
    defer f.Close()

    fImageData := &fjson.FieldData{}
    qrCode, err := makeQrCode("https://unidoc.io", 200, 1)
    if err != nil {
        panic(err)
    }

    img, err := model.DefaultImageHandler{}.NewImageFromGoImage(qrCode)
    if err != nil {
        panic(err)
    }

    if err := fImageData.SetImage("qrcode", img, nil); err != nil {
        panic(err)
    }
    imageFieldAppearance := annotator.ImageFieldAppearance{OnlyIfMissing: false}
    err = pdfReader.AcroForm.FillWithAppearance(fImageData, imageFieldAppearance)
    if err != nil {
        panic(err)
    }

    fillDataJson, err := fjson.LoadFromJSONFile("./fill-data.json")
    if err != nil {
        panic(err)
    }

    fieldAppearance := annotator.FieldAppearance{OnlyIfMissing: true, RegenerateTextFields: true}

    err = pdfReader.AcroForm.FillWithAppearance(fillDataJson, fieldAppearance)
    if err != nil {
        panic(err)
    }

    err = pdfReader.FlattenFields(true, fieldAppearance)
    if err != nil {
        panic(err)
    }

    // Write out
    pdfWriter, err := pdfReader.ToWriter(nil)
    if err != nil {
        panic(err)
    }

    err = pdfWriter.WriteToFile("flatten-img.pdf")
    if err != nil {
        panic(err)
    }
}

// Prepare the QR code. The oversampling ratio specifies how many pixels/point to use.  The default resolution of
// PDFs is 72PPI (points per inch). A higher PPI allows higher resolution QR code generation which is particularly
// important if the document is scaled (zoom in).
func makeQrCode(contentStr string, width float64, oversampling int) (image.Image, error) {
    qrCode, err := qr.Encode(contentStr, qr.M, qr.Auto)
    if err != nil {
        return nil, err
    }

    // Prepare the qr code image.
    pixelWidth := oversampling * int(math.Ceil(width))
    qrCode, err = barcode.Scale(qrCode, pixelWidth, pixelWidth)
    if err != nil {
        return nil, err
    }

    return qrCode, err
}

template-with-image.pdf

fill-data.json

[
    {
        "name": "Fullname",
        "value": "Bruce Wayne"
    },
    {
        "name": "Information",
        "value": "Wayne Enterprise"
    }
]

Best regards, Alip

sampila commented 1 year ago

Hi @whizkid79,

Does the solution work on your end?

Best regards, Alip

whizkid79 commented 1 year ago

Sorry, I haven't had time to test this. Will do it tomorrow.

whizkid79 commented 1 year ago

Hi @sampila, I've tested your code and it seems to work. I double-checked my application, and it didn't work until I changed:

    fieldAppearance := annotator.FieldAppearance{OnlyIfMissing: false, RegenerateTextFields: true}

to

    fieldAppearance := annotator.FieldAppearance{OnlyIfMissing: true, RegenerateTextFields: true}

With OnlyIfMissing set to false, flattening removes the QR code. I'm not sure it is OK to set it to true, because that might cause side effects if a template is, for some reason, pre-filled. The QR code field is not part of my fill-data.json, so I don't think it should be touched.

Any ideas?
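One way to picture the behavior reported here is a toy model in which the text appearance generator, when `OnlyIfMissing` is false, redraws every field it is handed but has no way to draw an image button, so the QR code's existing appearance is lost at flatten time. With `OnlyIfMissing: true` the existing appearance is kept, while `RegenerateTextFields: true` still redraws text fields. This is a hypothetical, simplified sketch (all types and names are invented), not unipdf's implementation; it only reproduces the outcome described in this thread.

```go
package main

import "fmt"

// fieldKind is a simplified, hypothetical stand-in for AcroForm field types.
type fieldKind int

const (
	textField fieldKind = iota
	checkboxField
	pushButtonField // image fields are typically push buttons
)

// field is a stripped-down form field: its kind and any appearance it already carries.
type field struct {
	name       string
	kind       fieldKind
	appearance string // empty means "no existing appearance"
}

// textAppearanceGen is a toy generator that can only draw text and checkbox fields.
type textAppearanceGen struct {
	onlyIfMissing  bool
	regenerateText bool
}

// generate returns the appearance to bake into the page and whether there is one.
func (g textAppearanceGen) generate(f field) (string, bool) {
	// Keep an existing appearance when asked to, except for text fields
	// that should always be regenerated.
	if g.onlyIfMissing && f.appearance != "" && !(f.kind == textField && g.regenerateText) {
		return f.appearance, true
	}
	switch f.kind {
	case textField:
		return "text:" + f.name, true
	case checkboxField:
		return "checkbox:" + f.name, true
	default:
		return "", false // cannot redraw an image button, so it is dropped
	}
}

// flatten bakes each field's appearance into the page; fields without one vanish.
func flatten(fields []field, g textAppearanceGen) []string {
	var page []string
	for _, f := range fields {
		if ap, ok := g.generate(f); ok {
			page = append(page, ap)
		}
	}
	return page
}

func main() {
	fields := []field{
		{name: "Fullname", kind: textField, appearance: "text:old"},
		{name: "qrcode", kind: pushButtonField, appearance: "image:qr"},
	}
	fmt.Println(flatten(fields, textAppearanceGen{onlyIfMissing: false, regenerateText: true}))
	fmt.Println(flatten(fields, textAppearanceGen{onlyIfMissing: true, regenerateText: true}))
}
```

Running it prints `[text:Fullname]` followed by `[text:Fullname image:qr]`. Under this model, the side effect of `OnlyIfMissing: true` would be limited to non-text fields that already have an appearance, which are kept rather than redrawn.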

sampila commented 1 year ago

Hi @whizkid79,

The image is filled in this part:

    if err := fImageData.SetImage("qrcode", img, nil); err != nil {
        panic(err)
    }
    imageFieldAppearance := annotator.ImageFieldAppearance{OnlyIfMissing: false}
    err = pdfReader.AcroForm.FillWithAppearance(fImageData, imageFieldAppearance)
    if err != nil {
        panic(err)
    }

I think it should be fine even though the QR code is not part of fill-data.json. It may be worth testing with pre-filled form fields, especially pre-filled image fields.

whizkid79 commented 1 year ago

The image is removed during flattening if I set OnlyIfMissing: false. If I remove the flattening call, everything works (the form is just not flattened). But I need it flattened to prevent later changes to the document.

sampila commented 1 year ago

Hi @whizkid79,

Could you share a runnable code snippet, so I can reproduce same issue here?

whizkid79 commented 1 year ago

This seems to cause the issue:

It also happens if you switch the order of filling (image first and then the other fields, or vice versa); both orders are broken.

package main

import (
    "image"
    "math"
    "os"

    "github.com/boombuler/barcode"
    "github.com/boombuler/barcode/qr"

    "github.com/unidoc/unipdf/v3/annotator"
    "github.com/unidoc/unipdf/v3/common/license"
    "github.com/unidoc/unipdf/v3/fjson"
    "github.com/unidoc/unipdf/v3/model"
)

func init() {
    // Make sure to load your offline license key and customer name prior to using the library.
    // Both are read from environment variables here.
    err := license.SetLicenseKey(os.Getenv("UNIDOC_LICENCE_KEY"), os.Getenv("UNIDOC_LICENCE_CUSTOMER"))
    if err != nil {
        panic(err)
    }
}

func main() {
    filePDF := "template-with-image2.pdf"

    pdfReader, f, err := model.NewPdfReaderFromFile(filePDF, nil)
    if err != nil {
        panic(err)
    }
    defer f.Close()

    fImageData := &fjson.FieldData{}
    qrCode, err := makeQrCode("https://unidoc.io", 200, 1)
    if err != nil {
        panic(err)
    }

    img, err := model.DefaultImageHandler{}.NewImageFromGoImage(qrCode)
    if err != nil {
        panic(err)
    }

    if err := fImageData.SetImage("qrcode", img, nil); err != nil {
        panic(err)
    }
    imageFieldAppearance := annotator.ImageFieldAppearance{OnlyIfMissing: false}
    err = pdfReader.AcroForm.FillWithAppearance(fImageData, imageFieldAppearance)
    if err != nil {
        panic(err)
    }

    fillDataJson, err := fjson.LoadFromJSONFile("./fill-data.json")
    if err != nil {
        panic(err)
    }

    fieldAppearance := annotator.FieldAppearance{OnlyIfMissing: false, RegenerateTextFields: true}

    err = pdfReader.AcroForm.FillWithAppearance(fillDataJson, fieldAppearance)
    if err != nil {
        panic(err)
    }

    err = pdfReader.FlattenFields(true, fieldAppearance)
    if err != nil {
        panic(err)
    }

    // Write out
    pdfWriter, err := pdfReader.ToWriter(nil)
    if err != nil {
        panic(err)
    }

    err = pdfWriter.WriteToFile("flatten-img.pdf")
    if err != nil {
        panic(err)
    }
}

// Prepare the QR code. The oversampling ratio specifies how many pixels/point to use.  The default resolution of
// PDFs is 72PPI (points per inch). A higher PPI allows higher resolution QR code generation which is particularly
// important if the document is scaled (zoom in).
func makeQrCode(contentStr string, width float64, oversampling int) (image.Image, error) {
    qrCode, err := qr.Encode(contentStr, qr.M, qr.Auto)
    if err != nil {
        return nil, err
    }

    // Prepare the qr code image.
    pixelWidth := oversampling * int(math.Ceil(width))
    qrCode, err = barcode.Scale(qrCode, pixelWidth, pixelWidth)
    if err != nil {
        return nil, err
    }

    return qrCode, err
}

sampila commented 1 year ago

Hi @whizkid79,

I apologize for the slow response. You can get the correct result without worrying about pre-filled form fields by flattening the fields partially; see this example: https://github.com/unidoc/unipdf-examples/blob/master/forms/pdf_form_partial_flatten.go

You can try out this code:

package main

import (
    "image"
    "math"
    "os"

    "github.com/boombuler/barcode"
    "github.com/boombuler/barcode/qr"

    "github.com/unidoc/unipdf/v3/annotator"
    "github.com/unidoc/unipdf/v3/common/license"
    "github.com/unidoc/unipdf/v3/fjson"
    "github.com/unidoc/unipdf/v3/model"
)

func init() {
    // Make sure to load your metered License API key prior to using the library.
    // If you need a key, you can sign up and create a free one at https://cloud.unidoc.io
    if err := license.SetMeteredKey(os.Getenv("UNIDOC_LICENSE_API_KEY")); err != nil {
        panic(err)
    }
}

func main() {
    filePDF := "template-with-image.pdf"

    pdfReader, f, err := model.NewPdfReaderFromFile(filePDF, nil)
    if err != nil {
        panic(err)
    }
    defer f.Close()

    fImageData := &fjson.FieldData{}
    qrCode, err := makeQrCode("https://unidoc.io", 200, 1)
    if err != nil {
        panic(err)
    }

    img, err := model.DefaultImageHandler{}.NewImageFromGoImage(qrCode)
    if err != nil {
        panic(err)
    }

    if err := fImageData.SetImage("qrcode", img, nil); err != nil {
        panic(err)
    }
    imageFieldAppearance := annotator.ImageFieldAppearance{OnlyIfMissing: false}
    err = pdfReader.AcroForm.FillWithAppearance(fImageData, imageFieldAppearance)
    if err != nil {
        panic(err)
    }

    // Only flatten the field named `qrcode`.
    fieldFlattenOpts := model.FieldFlattenOpts{
        FilterFunc: func(pf *model.PdfField) bool {
            return pf.T.String() == "qrcode"
        },
    }

    // Flatten the `qrcode` field with the image appearance generator via `FlattenFieldsWithOpts`.
    err = pdfReader.FlattenFieldsWithOpts(imageFieldAppearance, &fieldFlattenOpts)
    if err != nil {
        panic(err)
    }

    fillDataJson, err := fjson.LoadFromJSONFile("./fill-data.json")
    if err != nil {
        panic(err)
    }

    fieldAppearance := annotator.FieldAppearance{OnlyIfMissing: false, RegenerateTextFields: true}

    err = pdfReader.AcroForm.FillWithAppearance(fillDataJson, fieldAppearance)
    if err != nil {
        panic(err)
    }

    err = pdfReader.FlattenFields(true, fieldAppearance)
    if err != nil {
        panic(err)
    }

    // Write out
    pdfWriter, err := pdfReader.ToWriter(nil)
    if err != nil {
        panic(err)
    }

    err = pdfWriter.WriteToFile("flatten-img.pdf")
    if err != nil {
        panic(err)
    }
}

// Prepare the QR code. The oversampling ratio specifies how many pixels/point to use.  The default resolution of
// PDFs is 72PPI (points per inch). A higher PPI allows higher resolution QR code generation which is particularly
// important if the document is scaled (zoom in).
func makeQrCode(contentStr string, width float64, oversampling int) (image.Image, error) {
    qrCode, err := qr.Encode(contentStr, qr.M, qr.Auto)
    if err != nil {
        return nil, err
    }

    // Prepare the qr code image.
    pixelWidth := oversampling * int(math.Ceil(width))
    qrCode, err = barcode.Scale(qrCode, pixelWidth, pixelWidth)
    if err != nil {
        return nil, err
    }

    return qrCode, err
}

Best regards, Alip

sampila commented 1 year ago

Hi @whizkid79,

Does the solution work on your end?

whizkid79 commented 1 year ago

Hi @sampila, yes, thanks. I already had a similar solution with a FilterFunc; it just seemed too complicated and "wrong". I still believe the flattening process should be smarter about fields that don't match the appearance type in use, at least as an additional option. Best regards, Peter
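The more type-aware flattening suggested here could, in principle, be a generator that routes each field to the appearance logic matching its type, so text and image fields are handled in one pass without a FilterFunc. The sketch below is a hypothetical toy model using only the standard library (`dispatchGen` and the other types are invented). In unipdf, FlattenFields and FlattenFieldsWithOpts accept any model.FieldAppearanceGenerator, so a wrapper along these lines might be written against that interface, but that is untested here.

```go
package main

import "fmt"

// fieldKind, field and appearanceGen are hypothetical stand-ins, not unipdf types.
type fieldKind int

const (
	textField fieldKind = iota
	imageButtonField
)

type field struct {
	name       string
	kind       fieldKind
	appearance string // existing appearance, empty if none
}

type appearanceGen interface {
	generate(f field) (string, bool)
}

// textGen redraws text fields from scratch and ignores everything else.
type textGen struct{}

func (textGen) generate(f field) (string, bool) {
	if f.kind != textField {
		return "", false
	}
	return "text:" + f.name, true
}

// imageGen keeps whatever image appearance the field already has.
type imageGen struct{}

func (imageGen) generate(f field) (string, bool) {
	return f.appearance, f.appearance != ""
}

// dispatchGen routes each field to the generator registered for its kind.
// Fields with no registered generator get no appearance.
type dispatchGen struct {
	byKind map[fieldKind]appearanceGen
}

func (d dispatchGen) generate(f field) (string, bool) {
	g, ok := d.byKind[f.kind]
	if !ok {
		return "", false
	}
	return g.generate(f)
}

// flatten bakes each generated appearance into the page in field order.
func flatten(fields []field, g appearanceGen) []string {
	var page []string
	for _, f := range fields {
		if ap, ok := g.generate(f); ok {
			page = append(page, ap)
		}
	}
	return page
}

func main() {
	gen := dispatchGen{byKind: map[fieldKind]appearanceGen{
		textField:        textGen{},
		imageButtonField: imageGen{},
	}}
	fields := []field{
		{name: "Fullname", kind: textField},
		{name: "qrcode", kind: imageButtonField, appearance: "image:qr"},
	}
	fmt.Println(flatten(fields, gen))
}
```

In this sketch, fields whose kind has no registered generator are dropped; a more forgiving design would fall back to the field's existing appearance instead.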

sampila commented 1 year ago

Hi @whizkid79,

Thank you for the confirmation. We will note this as a possible improvement for UniPDF.