Add support to read a pdf file to use it as a template

murillodaviziko commented 9 years ago

It may be useful to be able to read an existing pdf file, and be able to write over it.

marcus-downing commented 9 years ago

I started work on porting the FPDI library here: https://github.com/marcusatbang/gofpdf

But life hasn't allowed me to make as much progress as I'd like on it recently.

murillodaviziko commented 9 years ago

What's the status of it? I ran go get on it and tried to run the tests, but they failed.

I can't see how to load an external PDF file as a template. Any hints? EDIT: I had to swap the import from jung-kurt to marcusatbang!?

marcus-downing commented 9 years ago

The FPDI library, which allows reading existing PDFs, is built on a smaller library called FPDF_TPL, which extends FPDF to allow you to define and re-use templates when making PDFs. The status is that I've more or less finished porting FPDF_TPL to Go as part of that fork; I haven't got far enough with porting FPDI itself to be worth committing the code.

If you look in the file template.go, you can see the functions I've added; and this test case shows how you can use the templating functionality:

    template := pdf.CreateTemplate(func(tpl *gofpdf.Tpl) {
        tpl.SetFont("Arial", "B", 16)
        tpl.Text(60, 20, "Hello World!")
        tpl.SetDrawColor(0, 100, 200)
        tpl.SetLineWidth(2.5)
        tpl.Line(120, 20, 140, 40)
    })
    _, tplSize := template.Size()
    pdf.AddPage()
    pdf.UseTemplate(template)
    pdf.UseTemplateScaled(template, gofpdf.PointType{0, 20}, tplSize)

I've also added a few utility functions to scale and move the point and size types to aid in this. The API is a significant departure from the PHP original, because PHP relies heavily on nullable values, which are discouraged or impossible in Go.
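To make the scaling and moving concrete, here is a minimal sketch of the arithmetic behind drawing a template at a given corner and size. It assumes the template ends up in the PDF as a form XObject whose bounding box starts at the origin, works in points measured from the page's top-left corner the way gofpdf's API does, and uses hypothetical `Point`/`Size` stand-ins rather than gofpdf's own `PointType`/`SizeType`. gofpdf's actual `UseTemplateScaled` implementation may differ.

```go
package main

import "fmt"

// Point and Size are hypothetical stand-ins for gofpdf's PointType and SizeType.
type Point struct{ X, Y float64 }
type Size struct{ Wd, Ht float64 }

// placementOps returns content-stream operators that draw the form XObject
// called name (natural size tpl) with its top-left corner at corner, scaled
// to fill size. Inputs are in points from the page's top-left corner;
// pageHt flips them into PDF's bottom-left coordinate space.
func placementOps(name string, tpl Size, corner Point, size Size, pageHt float64) string {
	sx := size.Wd / tpl.Wd            // horizontal scale factor
	sy := size.Ht / tpl.Ht            // vertical scale factor
	ty := pageHt - corner.Y - size.Ht // bottom edge of the placed box in PDF space
	// The cm matrix [sx 0 0 sy tx ty] scales first, then translates.
	return fmt.Sprintf("q %.2f 0 0 %.2f %.2f %.2f cm /%s Do Q", sx, sy, corner.X, ty, name)
}

func main() {
	// A 200x100 pt template placed at (0, 20), scaled to 400x200, on an A4 page 842 pt tall.
	fmt.Println(placementOps("TPL1", Size{200, 100}, Point{0, 20}, Size{400, 200}, 842))
	// prints "q 2.00 0 0 2.00 0.00 622.00 cm /TPL1 Do Q"
}
```

Wrapping the transform in `q`/`Q` saves and restores the graphics state, so the scaling does not leak into whatever is drawn afterwards.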

To make the tests run, I believe you need to modify the tests to replace:

    import (
        ...
        "github.com/jung-kurt/gofpdf"
    )

with

    import (
        ...
        "github.com/marcusatbang/gofpdf"
    )

but of course I didn't want to check that change in before making a pull request.

Unfortunately life has not allowed me much time to work on this, nor is it likely to in the near future, so I can't make any promises about when I'll be able to do more.

jung-kurt commented 9 years ago

Thanks for your work on this. When the dust settles I will merge your changes in.

marcus-downing commented 9 years ago

I still intend to finish the job and add the FPDI functionality as well, I just can't promise when.

I should presumably make sure my tutorial example numbers don't conflict with anybody else's?

jung-kurt commented 9 years ago

Hi Marcus,

I still intend to finish the job and add the fPDI functionality as well

Super!

I just can't promise when.

Whenever is fine.

I should presumably make sure my tutorial example numbers don't conflict with anybody else's?

First, many thanks for including a tutorial example. And second, don't worry about numbering -- that's easily changed when your work is merged.

Thanks!

-- Kurt

tgulacsi commented 9 years ago

Sorry to chime in, but @marcusatbang, can't we use rsc.io/pdf for parsing the PDF? It's a low-level, well tested PDF parsing library.

marcus-downing commented 9 years ago

I wasn't familiar with that one. FPDI has the advantage that I have the original author's permission to use it, and it was made to fit in with FPDF. Using rsc.io/pdf instead would mean connecting two codebases that weren't built to work together. I can see the advantage of not having to duplicate the work and of potentially fitting in with workflows that already use rsc.io/pdf.

@jung-kurt, are you familiar with rsc.io/pdf? Do you have an opinion on it?

jung-kurt commented 9 years ago

@marcusatbang No, I haven't used rsc.io/pdf. You've identified the main issues as I see them. If you elect to use rsc.io/pdf, I think your package should reside in the proposed contrib directory in order to avoid any foreign dependency in gofpdf itself.

marcus-downing commented 9 years ago

Putting it in contrib sounds good either way. The templates functionality needed to be baked in, but anything that implements gofpdf.Template shouldn't need to be.

marcus-downing commented 9 years ago

What should the package in contrib be called? Should it be named for the libraries it's bridging/porting (either rscio2fpdf or fpdi) or by its purpose (read, readpdf, etc)?

jung-kurt commented 9 years ago

I like naming it for the package's purpose. We can document the origin in the acknowledgments.

marcus-downing commented 9 years ago

Okay, contrib/read it is.

marcus-downing commented 9 years ago

rsc.io/pdf doesn't export much of its internal state. Short of forking it, I'm not sure how workable it'll be. I may have to go back to plan A.

marcus-downing commented 9 years ago

I keep going back and forth between the two options. rsc.io/pdf has done a lot of the work of making an idiomatic Go parser, but it's also missing a great deal. For example, it doesn't do images, it won't give you a page size, and it won't give you the raw stream for a page. And the author hasn't accepted pull requests or done anything in two years.

I'm starting to think forking it is the best answer.
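For comparison, the missing page size is mostly arithmetic once a page's `/MediaBox` array has been parsed. The sketch below is a hypothetical helper, not part of rsc.io/pdf or FPDI: it assumes the box has already been extracted as text, ignores the fact that real files may inherit `/MediaBox` from a parent `/Pages` node, and swaps width and height for a `/Rotate` of 90 or 270.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// pageSize returns the width and height described by a /MediaBox array such
// as "[0 0 595.28 841.89]" (PDF units, 1/72 inch). A /Rotate of 90 or 270
// swaps them, since the page is then displayed sideways.
func pageSize(mediaBox string, rotate int) (float64, float64, error) {
	fields := strings.Fields(strings.Trim(mediaBox, "[] "))
	if len(fields) != 4 {
		return 0, 0, fmt.Errorf("expected 4 numbers in MediaBox, got %d", len(fields))
	}
	var box [4]float64
	for i, f := range fields {
		v, err := strconv.ParseFloat(f, 64)
		if err != nil {
			return 0, 0, err
		}
		box[i] = v
	}
	// The two corners may come in either order, so use absolute differences.
	w, h := math.Abs(box[2]-box[0]), math.Abs(box[3]-box[1])
	// Normalise negative rotations such as -90 before comparing.
	if r := ((rotate % 360) + 360) % 360; r == 90 || r == 270 {
		w, h = h, w
	}
	return w, h, nil
}

func main() {
	w, h, err := pageSize("[0 0 595.28 841.89]", 90)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%.2f x %.2f pt\n", w, h) // prints "841.89 x 595.28 pt"
}
```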

jung-kurt commented 9 years ago

A couple of the rsc.io/pdf forks (for example, carstn and josharian) look active. Have you checked those out?

marcus-downing commented 9 years ago

Some useful additions, but none of them diverge very far from the origin, nor add the things I need adding. Most importantly, none of them look to be in any danger of taking on the role of the single canonical source. The freedom to fork on GitHub is nice, but it does nobody good to fragment a dependency.

I've emailed the author to see if he's still interested in the project.

marcus-downing commented 9 years ago

I'm making extremely slow progress on the port, but learning a lot about how buffered IO works in Go while I do it. :)

jerbob92 commented 8 years ago

@marcusatbang, I forked your fork and merged the current master into it. I'm planning to finish this in a couple of days, are you okay with that?

marcus-downing commented 8 years ago

Absolutely fine. Let me make sure all my WIP is checked in first.

marcus-downing commented 8 years ago

The latest commit does more to read page boxes, lays down the pattern for stream decoding (but doesn't actually do any of it - I intend to look for off-the-shelf Go decoders for gzip, LZW etc), and tidies up the handling of object references a little. It's also got some attempt at handling errors better.

The biggest challenge isn't porting the literal PHP code, but adapting it to a Go way of doing things.

jerbob92 commented 8 years ago

Hi @marcusatbang,

I already did work today: https://github.com/jerbob92/gofpdf/tree/feature/read

I got basic PDF merging working now. I'm going to look at images next.

jerbob92 commented 8 years ago

This is how I'm doing the merge now:

reader, err := gofpdi.OpenFromFileName("images/pdf.pdf")
if err != nil {
    panic(err)
}
// Pages are numbered from 1, so use <= to include the last page.
for i := 1; i <= reader.CountPages(); i++ {
    // Raw content stream of page i, copied verbatim onto a new page.
    template := reader.Page(i).Bytes()
    pdf.AddPage()
    pdf.RawWriteBuf(bytes.NewBuffer(template))
}

Obviously not the proper way, but the basic stuff is working.

marcus-downing commented 8 years ago

Straight copy of the bytes from one file to the next? Yeah, that's really not the ideal way :) but you know that.

For my own purposes, I need to make sure that imported pages can be used freely as templates. That means rescaling them, putting them in different places, and, most importantly for me, having them interact with other elements, such as overlaid objects with a blend mode applied. Only a real parser can do all that.

Not that I have a right to complain, when I've made such little progress on it.

jerbob92 commented 8 years ago

Yeah, but please note that reader.Page(i).Bytes() returns the bytes of the actual page content. That means the parser (up to and including the page content parser) is working. Getting to that point took quite a bit of work; so far I have done the following:

trailer parsing
/Root parsing
/Pages parsing
PageBox parsing
PDFPage (/Kids) parsing
Encryption detection
readValue fixes to properly parse array/dictionary/stream
Fixes in the scanner to properly read arrays/dictionaries
Fixes to the objectResolver
Page rotation detection (not being used yet though)
Page resource parsing
Page content parsing
Stream decoding

Also note that write calls after RawWriteBuf work: I just tried putting some cells over my template and it worked fine.
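The first item in the list above, trailer parsing, starts at the end of the file: a PDF ends with the `startxref` keyword, the byte offset of the cross-reference section, and `%%EOF`. Here is a minimal sketch of locating that offset with a hypothetical `findStartXref` helper. Taking the last `startxref` matters because incrementally updated files contain several; a real parser would also tolerate trailing junk and then read the xref table and trailer dictionary at that offset.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"strconv"
)

// findStartXref returns the byte offset written after the last "startxref"
// keyword, which is where the cross-reference section begins.
func findStartXref(data []byte) (int64, error) {
	// Incrementally updated files have several startxref entries; the last one wins.
	idx := bytes.LastIndex(data, []byte("startxref"))
	if idx < 0 {
		return 0, errors.New("startxref not found")
	}
	rest := bytes.TrimLeft(data[idx+len("startxref"):], " \t\r\n")
	// Collect the run of decimal digits that forms the offset.
	n := 0
	for n < len(rest) && rest[n] >= '0' && rest[n] <= '9' {
		n++
	}
	if n == 0 {
		return 0, errors.New("no offset after startxref")
	}
	return strconv.ParseInt(string(rest[:n]), 10, 64)
}

func main() {
	file := []byte("%PDF-1.4\nxref\n0 1\n0000000000 65535 f \ntrailer\n<< /Size 1 >>\nstartxref\n9\n%%EOF\n")
	off, err := findStartXref(file)
	if err != nil {
		panic(err)
	}
	fmt.Println("cross-reference section starts at byte", off)
}
```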

marcus-downing commented 8 years ago

I haven't had time to look in detail, but some of that does indeed look good. I'll have to compare to the changes that I hadn't pushed (but have now), since we've both worked on some of the same areas, but I expect there'll be no value in merging my version into your perfectly good code.

tdewolff commented 8 years ago

Any updates? I too would like to use existing PDFs as templates.

jerbob92 commented 8 years ago

Hi @tdewolff, I haven't made more progress on this. I just wanted to embed a PDF in a PDF, so for now I'm rendering the PDF to images and inserting those into the new PDF.

jerbob92 commented 8 years ago

@middelink, who are you asking?

marcus-downing commented 8 years ago

I think @middelink's comment got deleted there, asking for information on the FPDF_TPL port.

The work on FPDF_TPL was completed, submitted and merged about a year ago. To create a custom template builder, you need to implement the gofpdf.Template interface here: https://github.com/jung-kurt/gofpdf/blob/master/template.go#L104

type Template interface {
    ID() int64
    Size() (PointType, SizeType)
    Bytes() []byte
    Images() map[string]*ImageInfoType
    Templates() []Template
}

Always use gofpdf.GenerateTemplateID() for the ID, and return the fully rendered page blob in Bytes().

Porting FPDI, to read existing files and extract the pages, is a much bigger job, and one I still haven't been able to find time to make significant progress on. @jerbob92's work is probably the best you can get for now. In fact, while I've stalled on this my own needs have changed - there's a chance I'll be trying to port it into high performance JavaScript instead.

middelink commented 8 years ago

Actually, I deleted the comment myself as I found out it was already there ^^

On 04/10/2016 12:11 AM, Marcus Downing wrote the comment above.

Kind regards, Pauline Middelink

marcus-downing commented 7 years ago

I recently learned about a PDF toolkit for Go called UniDoc. It's not free for commercial use, but it does apparently support reading and using PDFs (see this example). Has anybody any experience with it, or comments on what makes it good or bad?

raliste commented 7 years ago

It does have support for reading but lacks a comprehensive toolset for drawing, annotating, text editing, image embedding, etc.

ahall commented 7 years ago

UniDoc version 2.0.0 supports this @raliste.

marcus-downing commented 6 years ago

After years of inactivity, I'm now officially admitting that I won't be able to complete this. It's unfortunate, but my own project has moved on and will be moving away from PDF.

If anybody wants to pick up the work I did on this, go ahead.

jung-kurt commented 6 years ago

Thank you very much, @marcus-downing, for all of your contributions to this project. I hope someone can pick up where you left off.

Good luck!

akumbhani66 commented 5 years ago

It's tricky, but I somehow managed to achieve this. Maybe it will help someone.

package main

import (
    "fmt"
    "os"

    "github.com/unidoc/unidoc/pdf/creator"
    pdf "github.com/unidoc/unidoc/pdf/model"
    "github.com/unidoc/unidoc/pdf/model/fonts"
)

func main() {
    // Insert text on page 1 at (99, 99); pass -1 as the page number to stamp every page.
    err := addTextToPdf("./input.pdf", "./output.pdf", "textToBeinsert", 1 /* page no */, 99 /* X position*/, 99 /* Y position */)
    if err != nil {
        fmt.Printf("Error: %v\n", err)
        os.Exit(1)
    }

    fmt.Printf("Complete, see output file: %s\n", "./output.pdf")
}

func addTextToPdf(inputPath string, outputPath string, text string, pageNum int, xPos float64, yPos float64) error {
    f, err := os.Open(inputPath)
    if err != nil {
        return err
    }
    defer f.Close()

    pdfReader, err := pdf.NewPdfReader(f)
    if err != nil {
        return err
    }

    numPages, err := pdfReader.GetNumPages()
    if err != nil {
        return err
    }

    c := creator.New()

    // Load the pages.
    for i := 0; i < numPages; i++ {
        page, err := pdfReader.GetPage(i + 1)
        if err != nil {
            return err
        }

        err = c.AddPage(page)
        if err != nil {
            return err
        }

        // i is zero-based while pageNum is one-based; -1 stamps every page.
        if i+1 == pageNum || pageNum == -1 {
            p := creator.NewParagraph(text)
            p.SetFont(fonts.NewFontTimesBold())
            p.SetPos(xPos, yPos)

            if err := c.Draw(p); err != nil {
                return err
            }
        }

    }

    err = c.WriteToFile(outputPath)
    return err
}

jung-kurt commented 5 years ago

Thanks for sharing, @akumbhani66. This is a good technique to know about.

jerbob92 commented 5 years ago

Thanks @akumbhani66, very cool! For anyone reading this and just copy-pasting it, please note that UniDoc is only free for open source projects.

akumbhani66 commented 5 years ago

@jerbob92 Yeah, thanks for mentioning it. I should have mentioned this.

phpdave11 commented 5 years ago

I have created a go package called gofpdi (Go Free PDF Document Importer) that allows you to import PDFs.

So far I have integrated it with gopdf, but it shouldn’t be too hard to integrate it with gofpdf. Here’s an example of how it works with gopdf: https://github.com/signintech/gopdf/issues/109

phpdave11 commented 5 years ago

Originally posted by @jung-kurt in https://github.com/signintech/gopdf/issues/109#issuecomment-489337529:

Awesome! This feature has been requested more than once. I look forward to studying it.

In order to integrate it with gofpdf which depends only on the Go standard library, the package that imports gofpdi (with the ImportPage() method) will go into the contrib directory (version 1) and the gofpdfcontrib repository (version 2).

Hi @jung-kurt - I've added a few functions to the main library and I put the code that imports gofpdi in the gofpdfcontrib repository. Here's an example of how you can import an existing PDF into gofpdf with gofpdi:

package main

import (
    "github.com/jung-kurt/gofpdf/v2"
    "github.com/phpdave11/gofpdfcontrib/gofpdi"
    "io"
    "net/http"
    "os"
)

func main() {
    var err error

    pdf := gofpdf.New("P", "mm", "A4", "")

    // Download a PDF
    fileUrl := "https://tcpdf.org/files/examples/example_026.pdf"
    if err = DownloadFile("example-pdf.pdf", fileUrl); err != nil {
        panic(err)
    }

    // Import example-pdf.pdf with gofpdi free pdf document importer
    tpl1 := gofpdi.ImportPage(pdf, "example-pdf.pdf", 1, "/MediaBox")

    pdf.AddPage()

    // RGB components must each be in the range 0-255.
    pdf.SetFillColor(200, 200, 220)
    pdf.Rect(20, 50, 150, 215, "F")

    // Draw imported template onto page
    gofpdi.UseImportedTemplate(pdf, tpl1, 20, 50, 150, 0)

    pdf.SetFont("Helvetica", "", 20)
    pdf.Cell(0, 0, "Import existing PDF into gofpdf document with gofpdi")

    err = pdf.OutputFileAndClose("example.pdf")
    if err != nil {
        panic(err)
    }
}

// DownloadFile will download a url to a local file. It's efficient because it will
// write as it downloads and not load the whole file into memory.
func DownloadFile(filepath string, url string) error {
    // Get the data
    resp, err := http.Get(url)
    if err != nil {
        return err
    }
    defer resp.Body.Close()

    // Create the file
    out, err := os.Create(filepath)
    if err != nil {
        return err
    }
    defer out.Close()

    // Write the body to file
    _, err = io.Copy(out, resp.Body)
    return err
}

Generated PDF: example.pdf

Screenshot of PDF: example

zolotokrylin commented 5 years ago

+1 👍 This feature is a must nowadays :)

phpdave11 commented 5 years ago

@zolotokrylin this feature was merged into master. To use it, import the gofpdi contrib package:

import "github.com/jung-kurt/gofpdf/contrib/gofpdi"

Then import a page from a PDF:

tpl1 := gofpdi.ImportPage(fpdf, "file.pdf", 1, "/MediaBox")

Draw pdf onto page:

gofpdi.UseImportedTemplate(fpdf, tpl1, 20, 50, 150, 0)

For width and height (the last 2 arguments) you can set one of those to 0 to automatically calculate the correct width or height based on the aspect ratio of the imported page.
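The aspect-ratio rule can be sketched as follows. `fitSize` is a hypothetical helper, not gofpdi's code, and returning the page's natural size when both arguments are 0 is this sketch's own assumption.

```go
package main

import "fmt"

// fitSize fills in a zero width or height from the imported page's aspect
// ratio (pageW:pageH). When both are zero, this sketch assumes the page's
// natural size should be used.
func fitSize(pageW, pageH, w, h float64) (float64, float64) {
	switch {
	case w == 0 && h == 0:
		return pageW, pageH
	case w == 0:
		return h * pageW / pageH, h
	case h == 0:
		return w, w * pageH / pageW
	default:
		return w, h // both given: use them as-is, even if that distorts the page
	}
}

func main() {
	// An A4 page (210 x 297 mm) drawn 150 mm wide, as in UseImportedTemplate(fpdf, tpl1, 20, 50, 150, 0).
	w, h := fitSize(210, 297, 150, 0)
	fmt.Printf("%.2f x %.2f mm\n", w, h) // prints "150.00 x 212.14 mm"
}
```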

denept commented 5 years ago

Regarding the gofpdi contrib package instructions above: how can I get the page height, width, page count, and other information about the imported PDF file?

gbenroscience commented 5 years ago

Could I please have a comprehensive example of importing an existing PDF, placing an image at some location on it, and saving it to an output file? I can't seem to get this to work.

gbenroscience commented 5 years ago

I keep getting:

panic: Failed to get page resources: Page 5 does not exist!!

while importing a pdf file that has 237 pages

mrtsbt commented 5 years ago

https://github.com/phpdave11/gofpdi provides examples for importing and using templates in gofpdf. Additionally, you might want to take a look at the examples in fpdf_test.go and contrib/gofpdi/gofpdi_test.go.

phpdave11 commented 5 years ago

@gbenroscience this likely indicates a bug in the gofpdi parser.

If you send me the PDF that you’re trying to import, I can try and fix the issue.

jung-kurt / gofpdf

Add support to read a pdf file to use it as a template #8