jung-kurt / gofpdf

A PDF document generator with high level support for text, drawing and images
http://godoc.org/github.com/jung-kurt/gofpdf
MIT License

combine existing pdf to pdf generate by gofpdf into single pdf #258

Closed tylerzika closed 5 years ago

tylerzika commented 5 years ago

Is this possible? I would use Go to query the PDF in my database, then fetch additional data from a JSON API to generate a new PDF with gofpdf, then combine the two with gofpdf into a single PDF file.

jung-kurt commented 5 years ago

This is now supported.

@phpdave11 Can you provide a minimal example?

tylerzika commented 5 years ago

@jung-kurt nice! what about embedding a pdf inside a newly generated pdf created by gofpdf?

phpdave11 commented 5 years ago

@tylerzika try the following code to embed an existing PDF into a new gofpdf document. You will probably need to adjust the x, y, w, h arguments.

import "github.com/jung-kurt/gofpdf/contrib/gofpdi"

Import a page from a PDF and get back a template ID (to be used later).

tpl1 := gofpdi.ImportPage(fpdf, "file.pdf", 1, "/MediaBox")

Draw template onto page:

gofpdi.UseImportedTemplate(fpdf, tpl1, 20, 50, 150, 0)

For width and height (the last 2 arguments) you can set one of those to 0 to automatically calculate the correct width or height based on the aspect ratio of the imported page.
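
To make the "set one of them to 0" convention concrete, here is a minimal sketch of the arithmetic involved. The helper `fitSize` is hypothetical and is not gofpdi's actual code; in particular, what happens when both values are 0 is an assumption made for the example.

```go
package main

import "fmt"

// fitSize mirrors the "pass 0 to auto-size" convention described above.
// srcW and srcH are the imported page's dimensions; w and h are the
// requested size. Illustrative helper only, not part of gofpdi.
func fitSize(srcW, srcH, w, h float64) (float64, float64) {
	switch {
	case w == 0 && h == 0:
		return srcW, srcH // assumption: nothing requested, keep source size
	case h == 0:
		return w, w * srcH / srcW // derive height from the requested width
	case w == 0:
		return h * srcW / srcH, h // derive width from the requested height
	default:
		return w, h // both given: the page may be stretched
	}
}

func main() {
	// An A4 portrait page (210 x 297 mm) drawn 150 mm wide, height automatic.
	w, h := fitSize(210, 297, 150, 0)
	fmt.Printf("%.1f x %.1f mm\n", w, h) // prints "150.0 x 212.1 mm"
}
```

So with the arguments `20, 50, 150, 0` used above, an A4 source page would be drawn about 212 mm tall.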

Originally posted by @phpdave11 in https://github.com/jung-kurt/gofpdf/issues/8#issuecomment-493492782

I have also created an example that downloads a PDF from the internet and embeds that into a gofpdf document and saves the output to a file. Copy and paste the example code into example.go and then run the following commands:

go get github.com/jung-kurt/gofpdf
go get github.com/jung-kurt/gofpdf/contrib/gofpdi
go run example.go

It will generate a file named example.pdf which contains some text and an embedded PDF.

jung-kurt commented 5 years ago

Thanks, @phpdave11!

tylerzika commented 5 years ago

@jung-kurt @phpdave11 I have the base64 of a pdf that I've retrieved via a json api using go. My function getPdfBinary() returns a []byte aka byte slice. Do I have to create the pdf file first to combine it with a pdf generated by gofpdf? I've tried this

func main() {
    port := getPort()
    log.Println("[-] Listening on...", port)

    http.HandleFunc("/rx-image", getRxImageWithComments)

    err = http.ListenAndServe(port, nil)
    logFatal(err)
}

func getRxImageWithComments(w http.ResponseWriter, r *http.Request) {

    rxPdfBinary := getRxPdfBinary()
    pdf := gofpdf.New("P", "mm", "A4", "")
    pdf.RawWriteBuf(bytes.NewReader(rxPdfBinary))

    pdf.Output(w)
}

func logFatal(err error) {
    if err != nil {
        log.Println(err)
    }
}

to display the pdf, but it doesn't work. I know my variable rxPdfBinary is a valid pdf, because when I add this line to my function

ioutil.WriteFile("rxImage.pdf", rxPdfBinary, 0666)

it writes the pdf locally to my machine and I can view it.

jung-kurt commented 5 years ago

As far as I can tell (correct me if I am wrong, @phpdave11), you will need to write the downloaded PDF to a file first, and then convert it to a template with a call like

tp := gofpdi.ImportPage(pdf, "rxImage.pdf", 1, "/MediaBox")

@phpdave11, a new import function that uses an io.Reader would be a welcome enhancement to the gofpdi package.

phpdave11 commented 5 years ago

@jung-kurt that is correct. I will add support for importing PDF streams without having to save the PDF to a file first.

tylerzika commented 5 years ago

Once creating and importing the template, how do I use gofpdf to display the pdf in the browser? In the past, once I've created a pdf using gofpdf, I'll do pdf.Output(w), w being a http.ResponseWriter, and the pdf would display.

jung-kurt commented 5 years ago

In the past, once I've created a pdf using gofpdf, I'll do pdf.Output(w), w being a http.ResponseWriter, and the pdf would display.

That should still work. Are you checking the return code from pdf.Output() to see if an error occurred during PDF generation? Also, I assume you are setting the appropriate MIME type: application/pdf. Best practice would be to call pdf.Error() before calling pdf.Output() in order to proceed with the HTTP output on success or w.WriteHeader(http.StatusInternalServerError) on error.

tylerzika commented 5 years ago

This is what I'm doing

func getRxImageWithComments(w http.ResponseWriter, r *http.Request) {

    rxPdfBinary := getRxPdfBinary()
    pdf := gofpdf.New("P", "mm", "A4", "")

    ioutil.WriteFile("rxImage.pdf", rxPdfBinary, 0666)

    tp := gofpdi.ImportPage(pdf, "./rxImage.pdf", 1, "/MediaBox")

    gofpdi.UseImportedTemplate(pdf, tp, 20, 50, 150, 0)

    pdf.Error()
    err := pdf.Output(w)
    logFatal(err)
}

No errors are being logged. When I go to the web page, it asks whether I want to download a file instead of just displaying it in the browser. In addition, when I do download the file, it's not a valid PDF.

I've never had to set the MIME type before when generating a PDF with gofpdf and displaying it in the browser. Is that required? How would I do that?

jung-kurt commented 5 years ago

It looks like setting the MIME type for a PDF is not strictly needed -- the http package will detect the output stream automatically. However, I like to set it explicitly to make it clear.

The call to AddPage() is required. I am not sure why your error log didn't record that.

I assume the logFatal() causes a panic. OK for development but definitely not something you would want in a production server.

Here is an untested example:

func getRxImageWithComments(w http.ResponseWriter, r *http.Request) {

  rxPdfBinary := getRxPdfBinary()
  pdf := gofpdf.New("P", "mm", "A4", "")

  ioutil.WriteFile("rxImage.pdf", rxPdfBinary, 0666)

  tp := gofpdi.ImportPage(pdf, "rxImage.pdf", 1, "/MediaBox")

  pdf.AddPage()
  gofpdi.UseImportedTemplate(pdf, tp, 20, 50, 150, 0)

  err := pdf.Error()
  if err == nil {
    w.Header().Set("Content-Type", "application/pdf")
    pdf.Output(w)
  } else {
    w.WriteHeader(http.StatusInternalServerError)
    log.Printf("error generating PDF: %s", err)
  }
}

tylerzika commented 5 years ago

@jung-kurt that doesn't work for me. Can you recommend any other way to debug this issue?

tylerzika commented 5 years ago

@jung-kurt I don't know if it was a browser caching issue or what, but now I'm getting an error, right after creating the pdf file

func getRxImageWithComments(w http.ResponseWriter, r *http.Request) {

  rxPdfBinary := getRxPdfBinary()
  pdf := gofpdf.New("P", "mm", "A4", "")

  ioutil.WriteFile("rxImage.pdf", rxPdfBinary, 0666)
  log.Println("getting here")

  tp := gofpdi.ImportPage(pdf, "rxImage.pdf", 1, "/MediaBox") // breaks here

  pdf.AddPage()
  gofpdi.UseImportedTemplate(pdf, tp, 20, 50, 150, 0)

  err := pdf.Error()
  if err == nil {
    w.Header().Set("Content-Type", "application/pdf")
    pdf.Output(w)
  } else {
    w.WriteHeader(http.StatusInternalServerError)
    log.Printf("error generating PDF: %s", err)
  }
}

the error is

2019/06/05 23:17:32 http: panic serving [::1]:63628: Failed to get content: Failed to get page content: Failed to resolve object: Expected next token to be: endstream, got: x+T
goroutine 83 [running]:
net/http.(*conn).serve.func1(0xc0000988c0)
        /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:1746 +0xd0
panic(0x1331a60, 0xc000464380)
        /usr/local/Cellar/go/1.11/libexec/src/runtime/panic.go:513 +0x1b9
github.com/phpdave11/gofpdi.(*Importer).ImportPage(0xc000021540, 0x1, 0x1386cb4, 0x9, 0x0)
        /Users/tyler/go/src/github.com/phpdave11/gofpdi/importer.go:85 +0x36f
github.com/jung-kurt/gofpdf/contrib/gofpdi.ImportPage(0x13e8bc0, 0xc0003f4000, 0x138796a, 0xb, 0x1, 0x1386cb4, 0x9, 0x0)
        /Users/tyler/go/src/github.com/jung-kurt/gofpdf/contrib/gofpdi/gofpdi.go:28 +0x73
main.getRxImageWithComments(0x13e8280, 0xc0001880e0, 0xc000118200)
        /Users/tyler/go/src/github.com/label-api/application.go:506 +0x2a3
main.basicAuth.func1(0x13e8280, 0xc0001880e0, 0xc000118200)
        /Users/tyler/go/src/github.com/label-api/application.go:151 +0x34c
net/http.HandlerFunc.ServeHTTP(0xc000021700, 0x13e8280, 0xc0001880e0, 0xc000118200)
        /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:1964 +0x44
net/http.(*ServeMux).ServeHTTP(0x160c220, 0x13e8280, 0xc0001880e0, 0xc000118200)
        /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:2361 +0x127
net/http.serverHandler.ServeHTTP(0xc000102410, 0x13e8280, 0xc0001880e0, 0xc000118200)
        /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:2741 +0xab
net/http.(*conn).serve(0xc0000988c0, 0x13e8580, 0xc000130380)
        /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:1847 +0x646
created by net/http.(*Server).Serve
        /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:2851 +0x2f5

jung-kurt commented 5 years ago

Can you recommend any other way to debug this issue?

I would factor the PDF generation and the PDF delivery into separate routines. That is, write one routine that generates the PDF and returns (pdf []byte, err error) and another routine that delivers the content with an http.ResponseWriter. This way, you can write a command line routine that tests the generation in isolation.

Failed to get content: Failed to get page content: Failed to resolve object: Expected next token to be: endstream, got: x+T

It may be that the PDF you are attempting to embed has some problems.

The most recently called user code in this stack dump is line 85 of github.com/phpdave11/gofpdi/importer.go. @phpdave11, can you follow the stack trace to see what is happening?

phpdave11 commented 5 years ago

It seems like a bug in the stream parsing code.

@tylerzika could you please attach rxImage.pdf so I can see what is going on with that particular PDF? Then I may be able to fix this bug.

tylerzika commented 5 years ago

@phpdave11 dummy.pdf

tylerzika commented 5 years ago

@phpdave11 @jung-kurt I don't believe there is an issue with the pdf file. I've tried multiple pdfs. Each one I get via a REST api. Each time I make the callout to get the pdf, the pdf successfully downloads to my go application folder, and I click on the pdf file in the folder and am able to view it with my OS. This makes me assume that if the pdf is viewable through my OS, there shouldn't be anything wrong with it, right?

tylerzika commented 5 years ago

@phpdave11 @jung-kurt even when I manually download the pdf from this url: https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf

save it to my computer, place the downloaded file in my go application folder, rename it to rxImage.pdf, and remove the code that gets the pdf binary, I still get the same error. This makes me think there is nothing wrong with how my REST api is sending over the pdf.

phpdave11 commented 5 years ago

@tylerzika it looks like there's nothing wrong with that PDF. However, there is a bug on gofpdi which is preventing it from being imported. I am working on fixing the bug now.

phpdave11 commented 5 years ago

@tylerzika this has been fixed in gofpdi v1.0.3.

tylerzika commented 5 years ago

@phpdave11 it is working for some of my PDFs, but not for others. Sometimes the web page just shows a blank PDF, other times it throws an error

2019/06/06 10:42:12 http: panic serving [::1]:55845: Failed to get page boxes: Failed to get page box
goroutine 180 [running]:
net/http.(*conn).serve.func1(0xc0003da140)
        /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:1746 +0xd0
panic(0x1331a60, 0xc0001e61e0)
        /usr/local/Cellar/go/1.11/libexec/src/runtime/panic.go:513 +0x1b9
github.com/phpdave11/gofpdi.(*Importer).ImportPage(0xc000021540, 0x1, 0x1386cb4, 0x9, 0xc000236018)
        /Users/tyler/go/src/github.com/phpdave11/gofpdi/importer.go:85 +0x36f
github.com/jung-kurt/gofpdf/contrib/gofpdi.ImportPage(0x13e8bc0, 0xc0001ec000, 0x138796a, 0xb, 0x1, 0x1386cb4, 0x9, 0x0)
        /Users/tyler/go/src/github.com/jung-kurt/gofpdf/contrib/gofpdi/gofpdi.go:28 +0x73
main.getRxImageWithComments(0x13e8280, 0xc0002940e0, 0xc000402200)
        /Users/tyler/go/src/github.com/label-api/application.go:505 +0x1e2
main.basicAuth.func1(0x13e8280, 0xc0002940e0, 0xc000402200)
        /Users/tyler/go/src/github.com/label-api/application.go:151 +0x34c
net/http.HandlerFunc.ServeHTTP(0xc000021700, 0x13e8280, 0xc0002940e0, 0xc000402200)
        /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:1964 +0x44
net/http.(*ServeMux).ServeHTTP(0x160c220, 0x13e8280, 0xc0002940e0, 0xc000402200)
        /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:2361 +0x127
net/http.serverHandler.ServeHTTP(0xc000102410, 0x13e8280, 0xc0002940e0, 0xc000402200)
        /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:2741 +0xab
net/http.(*conn).serve(0xc0003da140, 0x13e8580, 0xc0002ca640)
        /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:1847 +0x646
created by net/http.(*Server).Serve
        /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:2851 +0x2f5
2019/06/06 10:42:12 http: panic serving [::1]:55846: Failed to get page boxes: Failed to get page box
goroutine 182 [running]:
net/http.(*conn).serve.func1(0xc0003da280)
        /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:1746 +0xd0
panic(0x1331a60, 0xc000176a20)
        /usr/local/Cellar/go/1.11/libexec/src/runtime/panic.go:513 +0x1b9
github.com/phpdave11/gofpdi.(*Importer).ImportPage(0xc000021540, 0x1, 0x1386cb4, 0x9, 0xc00022e130)
        /Users/tyler/go/src/github.com/phpdave11/gofpdi/importer.go:85 +0x36f
github.com/jung-kurt/gofpdf/contrib/gofpdi.ImportPage(0x13e8bc0, 0xc0003a3800, 0x138796a, 0xb, 0x1, 0x1386cb4, 0x9, 0x0)
        /Users/tyler/go/src/github.com/jung-kurt/gofpdf/contrib/gofpdi/gofpdi.go:28 +0x73
main.getRxImageWithComments(0x13e8280, 0xc000294380, 0xc000402400)
        /Users/tyler/go/src/github.com/label-api/application.go:505 +0x1e2
main.basicAuth.func1(0x13e8280, 0xc000294380, 0xc000402400)
        /Users/tyler/go/src/github.com/label-api/application.go:151 +0x34c
net/http.HandlerFunc.ServeHTTP(0xc000021700, 0x13e8280, 0xc000294380, 0xc000402400)
        /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:1964 +0x44
net/http.(*ServeMux).ServeHTTP(0x160c220, 0x13e8280, 0xc000294380, 0xc000402400)
        /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:2361 +0x127
net/http.serverHandler.ServeHTTP(0xc000102410, 0x13e8280, 0xc000294380, 0xc000402400)
        /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:2741 +0xab
net/http.(*conn).serve(0xc0003da280, 0x13e8580, 0xc0002ca800)
        /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:1847 +0x646
created by net/http.(*Server).Serve
        /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:2851 +0x2f5

here is the pdf that shows a blank pdf web page or throws the error above. rxImage.pdf

tylerzika commented 5 years ago

@phpdave11 another peculiar behavior. When I manually swap out pdf files locally while my application is on, the newly swapped out pdf does not display in the browser. The old pdf still shows.

phpdave11 commented 5 years ago

@tylerzika i found a bug in gofpdi in the string parsing code. I made some progress on resolving the issue but rxImage.pdf still doesn't import properly yet (the colors of the imported PDF do not match the source PDF). I'll let you know once this issue has been fully resolved.

phpdave11 commented 5 years ago

@tylerzika i have fixed the bug that was causing the images in rxImage.pdf to not import properly. This is fixed in gofpdi v1.0.4

jung-kurt commented 5 years ago

Nice work, @phpdave11!

tylerzika commented 5 years ago

@phpdave11 I appreciate your efforts. Is the PDF import designed to import any PDF? Some of my PDFs work, the simpler ones. Others that have multiple pages and more graphics seem to have a harder time. I got this error on a multi-page PDF with heavier graphics:

Failed to read pdf: Failed to read xref table: Expected xref to start with 'xref'. Got: 590

phpdave11 commented 5 years ago

@phpdave11 I appreciate your efforts. Is the PDF import designed to import any PDF?

Yes, it should work with most PDFs, but there could still be bugs in the code that prevent some PDFs from being imported.

Some of my PDFs work, the simpler ones. Others that have multiple pages and more graphics seem to have a harder time. I got this error on a multi-page PDF with heavier graphics:

Failed to read pdf: Failed to read xref table: Expected xref to start with 'xref'. Got: 590

If you can upload some samples of PDFs that don’t import correctly I will try to debug the issue.

tylerzika commented 5 years ago

@phpdave11 hello again. Sorry, I've been working on other things related to the PDF import. While doing some final testing I've noticed some peculiar behavior that produces an error.

2019/07/17 00:25:48 http: panic serving 127.0.0.1:59219: Failed to get page boxes: Failed to get page box
goroutine 69 [running]:

Locally, when I go to a url to view imported pdf1 on my web page, it is viewable. Still locally, I then go to a different url to view pdf2, my app errors. I then restart my app and the url to view pdf2 is now viewable and not erroring. But when I go back to the url to view pdf1, the error happens again.

It appears to me that some type of caching problem is going on. As soon as I restart my app, PDFs are viewable. But as soon as I view a different URL that loads a different PDF, the app breaks and I get that error (the one displayed above).

Is there anything I can do with your library, before I import a pdf, to make sure the things are a clean slate?

Here is my current code again

// create pdf
ioutil.WriteFile("rxImage.pdf", rxPdfBinary, 0666)
tp := gofpdi.ImportPage(pdf, "rxImage.pdf", 1, "/MediaBox") // break here
pdf.AddPage()
gofpdi.UseImportedTemplate(pdf, tp, 20, 50, 150, 0)

pdf.AddPage()

tylerzika commented 5 years ago

@phpdave11

This error is happening too:

Failed to get page box

So this line https://github.com/phpdave11/gofpdi/blob/master/reader.go#L1024 is being hit.

tylerzika commented 5 years ago

@phpdave11 @jung-kurt could either of you share a reference on how to learn to deal with raw PDF data? I'd really like to contribute and solve this bug. A piece of my project that I'm launching in a few weeks relies on this PDF import.

phpdave11 commented 5 years ago

@tylerzika are you writing to rxImage.pdf multiple times? That might be an issue because gofpdi will read the file once and cache some of the results. If rxImage.pdf changes after it has been read, the import will probably fail. As a workaround you could try writing the PDF to rxImage-(sha256-of-data).pdf and see if that resolves the issue.

phpdave11 commented 5 years ago

@phpdave11 @jung-kurt could either of you share a reference on how to learn to deal with raw PDF data? I'd really like to contribute and solve this bug. A piece of my project that I'm launching in a few weeks relies on this PDF import.

@tylerzika pull requests are welcome!

I would start by going through the PDF Reference Version 1.4.

You can also go through the gofpdi codebase and study how it parses PDF data. Most of the work is done in reader.go and writer.go. The code is commented so it should be easy to follow (hopefully). Also, gofpdi was ported from fpdi so you could look at that codebase and see how things work in that library.

tylerzika commented 5 years ago

@phpdave11 I ended up creating the pdf using ioutil.TempFile and added a sha to the file name and it works. Thanks for the suggestion!

tylerzika commented 5 years ago

@phpdave11 if a PDF that I'm importing has multiple pages, will the importer import each page? I don't believe it currently does. Is there a way to do so?

phpdave11 commented 5 years ago

@tylerzika yes, you can specify the page you want to import. You have to import each page separately.

Here page number 1 is being imported:

tpl1 := gofpdi.ImportPage(fpdf, "file.pdf", 1, "/MediaBox")

You can also get the number of pages and page size information using gofpdi: https://github.com/jung-kurt/gofpdf/issues/265#issuecomment-510933446
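
Once the page count is known, importing every page is a loop over 1-based page numbers. The sketch below keeps the library out of the picture so that it runs on its own: `importAllPages` is a hypothetical helper, and its `importPage` argument stands in for a call such as `gofpdi.ImportPage(fpdf, "file.pdf", pageNo, "/MediaBox")`.

```go
package main

import "fmt"

// importAllPages calls importPage once for each page from 1 to numPages and
// returns the resulting template IDs in page order.
func importAllPages(numPages int, importPage func(pageNo int) int) []int {
	var ids []int
	for pageNo := 1; pageNo <= numPages; pageNo++ {
		ids = append(ids, importPage(pageNo))
	}
	return ids
}

func main() {
	// Fake importer for demonstration: template IDs are just pageNo + 100.
	fake := func(pageNo int) int { return pageNo + 100 }

	for i, tpl := range importAllPages(3, fake) {
		// With the real library, each iteration would call fpdf.AddPage()
		// followed by gofpdi.UseImportedTemplate(fpdf, tpl, x, y, w, h).
		fmt.Printf("page %d -> template %d\n", i+1, tpl)
	}
}
```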

tylerzika commented 5 years ago

@jung-kurt and @phpdave11, thanks for your help! I was able to implement my desired feature based on what has been shared in this issue.

tylerzika commented 4 years ago

@phpdave11

I'm getting this error from one of my pdfs:

Failed to read pdf: Failed to read xref table: Expected xref to start with 'xref'. Got: 33

phpdave11 commented 4 years ago

Possibly related: https://github.com/phpdave11/gofpdi/issues/16