unidoc / unipdf

Golang PDF library for creating and processing PDF files (pure go)
https://unidoc.io
Other
2.5k stars 249 forks source link

[BUG] werr is not cleared when starting new call to PdfWriter.Write() #358

Closed ahawtho closed 2 years ago

ahawtho commented 4 years ago

Description

If a PdfWriter receives an error while writing, and the operation is attempted again (even with a different downstream io.Writer), it will fail because PdfWriter.werr is not cleared.

Expected Behavior

A new Write() should succeed as long as the io.Writer does not fail.

Actual Behavior

Steps to reproduce the behavior:

  1. Run attached, it should run to completion

Attachments

Include a self-contained reproducible code snippet and PDF file that demonstrates the issue.

package main

import (
    "fmt"
    "io"
    "os"

    "github.com/unidoc/unipdf/v3/model"
)

type Werr struct {
}

func (w Werr) Write(p []byte) (n int, err error) {
    return 0, fmt.Errorf("error")
}

func (w Werr) asWriter() io.Writer {
    return w
}

func main() {
    writer := model.NewPdfWriter()
    _ = writer.Write(Werr{})

    dn, err := os.OpenFile("/dev/null", os.O_WRONLY, 0)
    if err != nil {
        panic(err)
    }
    if err := writer.Write(dn); err != nil {
        panic(err)
    }
}
gunnsth commented 4 years ago

@ahawtho Thanks for reporting. Regarding the use case. Do you have a case where calling Write twice on the same PdfWriter makes sense? Typically the right use pattern is to call Write and write to a buffer or a file once.

ahawtho commented 4 years ago

Sorry for the late reply.

We have a use-case where we generate our PDF and then attempt to write the PDF to Google Drive. The API for Google Drive allows us to fetch a Writer, but the write can fail because of transient network errors. In this case, our application has retry logic to attempt to write the PDF to a new writer after a backoff. This process is a batch process, but generating the PDF is more time consuming than doing the write, so I'd prefer not to regenerate the PDF.

A workaround for us might be to write to a file first, but this use-case had been working fine in the previous version of the product. Do you know of any issues with writing the file twice besides resetting werr ?

gunnsth commented 4 years ago

Write() can do some operations that affect state such as optimizing the PDF structure. If you call it twice in a row, the result is not guaranteed to be identical. If you need to re-use the result, I would suggest Write()-ing to a buffer such as bytes.Buffer and then reuse the output. That is also more efficient.

Looking more into it, it seems to me that Write() really should be only allowed to be called once. We may consider returning an error for sequential calls to it on the same PdfWriter object.