jung-kurt / gofpdf

A PDF document generator with high level support for text, drawing and images
http://godoc.org/github.com/jung-kurt/gofpdf
MIT License
4.33k stars 782 forks source link

AddUTF8FromBytes modifies its argument utf8bytes resulting in a panic if you generate two PDFs wih the "same" font bytes. #316

Closed vdobler closed 4 years ago

vdobler commented 5 years ago

I've encountered a strange problem: I can generate a PDF with embedded UTF-8 font without any problems, but just once. Generating the same PDF once more results in a panic because the input utf8bytes to AddUTF8FontFromBytes is modified.

$ ./fpdfpanic
Making PDF
--> OK
Making PDF
panic: runtime error: index out of range [20] with length 12

goroutine 1 [running]:
github.com/jung-kurt/gofpdf/v2.(*utf8FontFile).getSymbols(0xc0000cc2a0, 0x14, 0xc0000c6e28, 0xc00006c210, 0xc00006c1b0, 0xc00006a120, 0xb, 0xb, 0xc0000c6e28, 0xc00006c210, ...)
    /Users/m0252720/go/pkg/mod/github.com/jung-kurt/gofpdf/v2@v2.13.3/utf8fontfile.go:793 +0x522
github.com/jung-kurt/gofpdf/v2.(*utf8FontFile).parseSymbols(0xc0000cc2a0, 0xc0000a36e0, 0xb, 0xc00006c2a0, 0x3ff0000000000000, 0x0, 0x0, 0x0)
    /Users/m0252720/go/pkg/mod/github.com/jung-kurt/gofpdf/v2@v2.13.3/utf8fontfile.go:563 +0x541
github.com/jung-kurt/gofpdf/v2.(*utf8FontFile).GenerateСutFont(0xc0000cc2a0, 0xc0000a36e0, 0x0, 0xc000014a3a, 0x3)
    /Users/m0252720/go/pkg/mod/github.com/jung-kurt/gofpdf/v2@v2.13.3/utf8fontfile.go:672 +0x3eb
github.com/jung-kurt/gofpdf/v2.(*Fpdf).putfonts(0xc0002ce800)
    /Users/m0252720/go/pkg/mod/github.com/jung-kurt/gofpdf/v2@v2.13.3/fpdf.go:4036 +0xb78
github.com/jung-kurt/gofpdf/v2.(*Fpdf).putresources(0xc0002ce800)
    /Users/m0252720/go/pkg/mod/github.com/jung-kurt/gofpdf/v2@v2.13.3/fpdf.go:4511 +0x75
github.com/jung-kurt/gofpdf/v2.(*Fpdf).enddoc(0xc0002ce800)
    /Users/m0252720/go/pkg/mod/github.com/jung-kurt/gofpdf/v2@v2.13.3/fpdf.go:4698 +0x121
github.com/jung-kurt/gofpdf/v2.(*Fpdf).Close(0xc0002ce800)
    /Users/m0252720/go/pkg/mod/github.com/jung-kurt/gofpdf/v2@v2.13.3/fpdf.go:688 +0xa1
main.makePDF(0xc00006c150)
    /Users/m0252720/go/src/tmp/fpdfpanic/main.go:28 +0x215
main.main()
    /Users/m0252720/go/src/tmp/fpdfpanic/main.go:17 +0xd5

Here is (stripped) the code to reproduce above error:

package main

import (
    "bytes"
    "encoding/base64"
    "fmt"

    "github.com/jung-kurt/gofpdf/v2"
)

var fontBytes []byte

func main() {
    fontBytes, _ = base64.StdEncoding.DecodeString(dvscData)
    makePDF()
    makePDF()
}

func makePDF() *bytes.Buffer {
    fmt.Println("Making PDF")
    pdf := gofpdf.New("P", "mm", "A4", "")
    pdf.AddUTF8FontFromBytes("HNC", "", fontBytes)
    pdf.AddPage()
    pdf.SetXY(20, 20)
    pdf.SetFont("HNC", "", 12)
    pdf.MultiCell(0, 15, "abcABC1234", "", "L", false)
    pdf.Close()

    out := &bytes.Buffer{}
    pdf.Output(out)
    fmt.Println("--> OK")
    return out
}

// DejaVuSansCondensed.ttf in base64 encoding. (trimmed).
var dvscData = `AAEAAAAUAQAAB ..... `

The problem does not happen if the font is loaded from disk, e.g. with pdf.AddUTF8Font("HNC", "", "..../github.com/jung-kurt/gofpdf/font/DejaVuSansCondensed.ttf")

The panic does not happen on the first rendering but on all subsequent renderings (not show in demo code above).

The reason is that somehow AddUTF8FontFromBytes modifies its input utf8bytes. Makeing a copy beforehand avoids the problem and renders all PDFs without problems.

         // Working code:
    pdf := gofpdf.New("P", "mm", "A4", "")
    cpy := make([]byte, len(fontBytes))
    copy(cpy, fontBytes)
    pdf.AddUTF8FontFromBytes("HNC", "", cpy)

(Maybe this modification if the utf8bytes argument is intended but then it should be documented.)

The delta is not big, here the output of github.com/google/go-cmp/cmp.Diff'ing utf8bytes before and after:

[]uint8{
    ... // 55292 identical bytes
    0x00, 0x9c, 0x04, 0xe1, 0x02, 0x66, 0x00, 0x8f, 0x01, 0x8d, 0x02, 0xf6, 0x00, 0xcd, 0x03, 0x44, //  |.....f.........D|
    0x00, 0x29, 0x00, 0x66, 0x04, 0xee, 0x00, 0x73, 0x00, 0x00, 0x14, 0x00, 0x00, 0x96, 0x00, 0x00, //  |.).f...s........|
-   0xb7,                                                                                           // -|.|
+   0x00,                                                                                           // +|.|
    0x07, 0x06, 0x05, 0x04, 0x03, 0x02, 0x01, 0x00, 0x2c, 0x20, 0x10, 0xb0, 0x02, 0x25, 0x49, 0x64, //  |........, ...%Id|
    0xb0, 0x40, 0x51, 0x58, 0x20, 0xc8, 0x59, 0x21, 0x2d, 0x2c, 0xb0, 0x02, 0x25, 0x49, 0x64, 0xb0, //  |.@QX .Y!-,..%Id.|
    ... // 120 identical bytes
    0x25, 0x49, 0x60, 0xb0, 0x20, 0x63, 0x68, 0x20, 0x8a, 0x10, 0x8a, 0x23, 0x3a, 0x8a, 0x10, 0x65, //  |%I`. ch ...#:..e|
    0x3a, 0x2d, 0x00, 0x00, 0x00, 0x00, 0x02, 0x00, 0x08, 0x00, 0x02, 0xff, 0xff, 0x00, 0x03, 0x00, //  |:-..............|
-   0x02, 0x00,                                                                                     // -|..|
+   0x00, 0x00,                                                                                     // +|..|
    0x5b, 0xfe, 0x96, 0x03, 0xf6, 0x05, 0xa4, 0x00, 0x03, 0x00, 0x07, 0x00, 0x00, 0x13, 0x11, 0x21, //  |[..............!|
    0x11, 0x25, 0x21, 0x11, 0x21, 0x5c, 0x03, 0x99, 0xfc, 0xce, 0x02, 0xcc, 0xfd, 0x34, 0xfe, 0x96, //  |.%!.!\.......4..|
    ... // 479001 identical bytes
    0xe8, 0xec, 0xbc, 0xb1, 0xc6, 0x15, 0xb8, 0xf8, 0xf8, 0x7d, 0xec, 0xbc, 0xb1, 0xc6, 0x15, 0xb8, //  |.........}......|
    0xf7, 0xfc, 0xe8, 0xe3, 0xe3, 0x00, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x02, 0x5e, 0xb8, //  |..............^.|
-   0x5b, 0x12, 0x3e, 0xed,                                                                         // -|[.>.|
+   0x00, 0x00, 0x00, 0x00,                                                                         // +|....|
    0x5f, 0x0f, 0x3c, 0xf5, 0x00, 0x1f, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0xd3, 0xc2, 0x2b, 0xf9, //  |_.<...........+.|
    0x00, 0x00, 0x00, 0x00, 0xd3, 0xc2, 0x2b, 0xf9, 0xf8, 0xa7, 0xfc, 0x4c, 0x0c, 0xea, 0x09, 0xdc, //  |......+....L....|
    0x00, 0x20, 0x00, 0x08, 0x00, 0x00, 0x00,                                                       //  |. .....|
-   0x01,                                                                                           // -|.|
    0x00, 0x00, 0x00, 0x00, 0x00,                                                                   //  |.....|
+   0x00,                                                                                           // +|.|
    0x01, 0x00, 0x00, 0x07, 0x6d, 0xfe, 0x1d, 0x00, 0x00, 0x0d, 0x7e, 0xf8, 0xa7, 0xfa, 0xe3, 0x0c, //  |....m.....~.....|
    0xea,                                                                                           //  |.|
-   0x00, 0x01,                                                                                     // -|..|
+   0x00, 0x01, 0x00, 0x00, 0x00,                                                                   // +|.....|
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,                         //  |............|
-   0x00, 0x00, 0x18, 0x5e, 0x04, 0x52,                                                             // -|...^.R|
+   0x0b, 0x00, 0x00,                                                                               // +|...|
    0x00, 0x5b, 0x00, 0x00, 0x00, 0x00, 0x02, 0xaa, 0x00, 0x00, 0x02, 0x49, 0x00, 0x00, 0x02, 0xe2, //  |.[.........I....|
    0x01, 0x16, 0x03, 0x4f, 0x00, 0xb1, 0x06, 0x08, 0x00, 0x8e, 0x04, 0x94, 0x00, 0x99, 0x06, 0xd7, //  |...O............|
    ... // 66316 identical bytes
    0x4b, 0xcc, 0x00, 0x07, 0x4c, 0x50, 0x00, 0x07, 0x4c, 0xf0, 0x00, 0x07, 0x4d, 0x78, 0x00, 0x07, //  |K...LP..L...Mx..|
    0x4e, 0x08, 0x00, 0x07, 0x4e, 0x94, 0x00, 0x07, 0x4e, 0xf4, 0x00, 0x07, 0x4f, 0x54, 0x00, 0x01, //  |N...N...N...OT..|
-   0x00, 0x00, 0x18, 0x6d,                                                                         // -|...m|
+   0x00, 0x00, 0x00, 0x0b,                                                                         // +|....|
    0x03, 0x54, 0x00, 0x2b, 0x00, 0x68, 0x00, 0x0c, 0x00, 0x02, 0x00, 0x10, 0x00, 0x99, 0x00, 0x08, //  |.T.+.h..........|
    0x00, 0x00, 0x04, 0x15, 0x00, 0x00, 0x00, 0x08, 0x00, 0x04, 0x00, 0x00, 0x00, 0x1a, 0x01, 0x3e, //  |...............>|
    ... // 15717 identical bytes
    0x6e, 0x73, 0x00, 0x00, 0x43, 0x00, 0x6f, 0x00, 0x6e, 0x00, 0x64, 0x00, 0x65, 0x00, 0x6e, 0x00, //  |ns..C.o.n.d.e.n.|
    0x73, 0x00, 0x65, 0x00, 0x64, 0x00, 0x00, 0x43, 0x6f, 0x6e, 0x64, 0x65, 0x6e, 0x73, 0x65, 0x64, //  |s.e.d..Condensed|
-   0x00, 0x00, 0x02,                                                                               // -|...|
+   0x00, 0x00, 0x00,                                                                               // +|...|
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xff, 0x7e, 0x00, 0x5a, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, //  |.......~.Z......|
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x18, 0x6d, //  |...............m|
    ... // 63402 identical bytes
  }
jung-kurt commented 5 years ago

Thanks for the well-written, concise report and demonstration code, @vdobler. There is no reason why a cached copy of the font should be modified. I will look into this and hope others can provide some insights too.

ArtemKor commented 5 years ago

I fixed it and create pull reguest.

jung-kurt commented 5 years ago

@vdobler -- can you confirm that the problem has been corrected?

vdobler commented 5 years ago

I do not thing that the PR #318 is a proper fix for the underlying problem. Making a copy before modifying the byte slice hides the actual problem: somehow reading the font is not a read-only action.

This type of stopgap solution can be used without this "fix" (actually that's what I'm doing now: make a copy). But I think the actual modification should be fixed instead of hidden behind a copy.

jung-kurt commented 5 years ago

@vdobler -- all points well-taken.

jacobalberty commented 4 years ago

The font gets modified in two places within utf8fontfile.go, line 225 in the splice function. Because you are working with slices the arrays backing them get modified. splice gets called by the calls to insertUint16 at lines 782, 786, and 790. If you simply modify splice to make a copy there before proceeding you need to be aware it will make copies for the call at insertUint16 as line 736 (which might not be a bad idea as it ultimately may modify some of the backing font data, but didn't in my quick tests tonight) and another copy at the calls to splice at lines 261 and 1031.

The second place the data gets modified is at line 1025. data = append(data, []byte{0, 0, 0}...) once again this is a slice backed by an array and you're modifying the underlying array. I've come up with two solutions aptly named badfix and worsefix

https://github.com/jacobalberty/gofpdf/tree/badfix Badfix creates a copy within the GenerateCutFont function by appending the data with an empty byte array, this creates a new backing array that separates those changes entirely from the underlying font data. It might need to be applied to the getTableData call at like 698 as well to prevent the call to insertUint16 at line 736 from modifying the underlying font data but I never encountered that as an issue in my testing.

https://github.com/jacobalberty/gofpdf/tree/worsefix This one takes the brute force method of just copying the slices into new arrays in the splice function and the getMetrics function. It's pretty much guaranteed safe but I'm fairly certain it copies more than the fix in PR #318

I am not sure either of these fixes are truly the correct answer as such I have not opened a PR, though badfix is probably the best one to pursue without major refactoring.

jung-kurt commented 4 years ago

Excellent insights and report, @jacobalberty. I will study your branches. How did you go about figuring how the backing array was modified?

jacobalberty commented 4 years ago

@jung-kurt I modified a report generator I had to hash the font bytes before and after then backed out #318 to confirm that hashing the bytes would show the issue. Then went back to with #318 included and started by looking at the getTableData function, I checked everywhere it was called then looked at where the results went eventually tracing it down to those append() calls. When I saw the splice() function's append I asked in #go-nuts on freenode to get some insight on it.

Just FYI this very situation seems to be why they added 3 index slicing into go 1.2 from the design doc for 3 index slicing The addition of append to the language made this somewhat more important, because append lets programmers overwrite entries between len and cap without realizing it or even mentioning cap.

jacobalberty commented 4 years ago

Ok after thinking about it for a when today. It occurs to me that my worsefix may actually be the ideal fix. Both of those functions have side effects right now in that they (may) modify the you pass into them. The worsefix branch removes, at least some of, those side effects

I'm typing on my phone by email right now so I don't have my original analysis in front of me but worsefix should completely remove all side effects from splice while the other function there was at least one other loop that needs to be fixed to remove side effects (at this time it doesn't introduce harmful side effects but it could later if usage changes). I will probably check in a fix for the rest of that function later this evening. It is less efficient since it will duplicate more data than badfix but it will remove a potential source of bugs later on.

On Wed, Nov 6, 2019, 5:16 AM Kurt Jung notifications@github.com wrote:

Excellent insights and report, @jacobalberty https://github.com/jacobalberty. I will study your branches. How did you go about figuring how the backing array was modified?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/jung-kurt/gofpdf/issues/316?email_source=notifications&email_token=AAPDNAR3QPIUEMRDVSQNKJ3QSKRQLA5CNFSM4I4I4C62YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEDGGC6I#issuecomment-550265209, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAPDNAQPE27SJNUWIGOXTHLQSKRQLANCNFSM4I4I4C6Q .

jung-kurt commented 4 years ago

Very impressive analysis, @jacobalberty. Thanks for delving into this and figuring out all of the implications.

jung-kurt commented 4 years ago

Many thanks, @jacobalberty for putting this issue to rest ( 47a2c24121ba649b2194251b4e860658a57db47e).