jung-kurt / gofpdf

A PDF document generator with high level support for text, drawing and images
http://godoc.org/github.com/jung-kurt/gofpdf
MIT License

incorrect UTF-8 Rendering ("tofu") #250

Closed ajstarks closed 5 years ago

ajstarks commented 5 years ago

Using this program:

package main

import (
    "fmt"
    "io/ioutil"
    "os"
    "path/filepath"

    "github.com/jung-kurt/gofpdf"
)

func main() {

    fontdir := os.Getenv("DECKFONTS")
    if len(fontdir) == 0 {
        fmt.Fprintf(os.Stderr, "cannot set the font directory\n")
        os.Exit(1)
    }
    fromFile, err := ioutil.ReadFile("hello.txt")
    if err != nil {
        fmt.Fprintf(os.Stderr, "%v\n", err)
        os.Exit(2)
    }
    pdf := gofpdf.New("P", "pt", "Letter", fontdir)
    pdf.AddPage()
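    // Register the TrueType font under the family name "deja".
    pdf.AddUTF8Font("deja", "", filepath.Join(fontdir, "DejaVuSansCondensed.ttf"))
    pdf.SetFont("deja", "", 72)
    // Draw the same text twice: once from a string literal, once as read from hello.txt.
    pdf.Text(100, 300, "Hello, 世界")
    pdf.Text(100, 400, string(fromFile))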
    err = pdf.OutputFileAndClose("utf.pdf")
    if err != nil {
        fmt.Fprintf(os.Stderr, "%v\n", err)
        os.Exit(3)
    }
    os.Exit(0)
}

The generated PDF file shows "tofu" (empty rectangles) when run with this command:

$ DECKFONTS=$GOPATH/src/github.com/jung-kurt/gofpdf/font go run utftest.go

I've also attached a screenshot showing the setup and rendering in mupdf and Chrome.

The contents of "hello.txt" are: Hello, 世界

Screenshot from 2019-05-06 19-54-14

ajstarks commented 5 years ago

Attached is the generated PDF: utf.pdf

jung-kurt commented 5 years ago

I'm not sure about the "tofu" problem. When I call pdf.Text() with that string literal it renders correctly, so you may want to hex dump the contents of hello.txt after it is read to see what is going on.

The DejaVu fonts do not include CJK code points, which is why the "世界" string does not render. The following works, but the font doesn't seem to support ASCII characters. I am not sure how the font was installed on my system (Ubuntu 18.04).

package main

import (
  "fmt"
  "github.com/jung-kurt/gofpdf"
  "os"
)

func main() {
  pdf := gofpdf.New("P", "pt", "Letter", "")
  pdf.AddPage()
  // The family name "deja" is only a label; the file loaded here is Droid Sans Fallback.
  pdf.AddUTF8Font("deja", "", "/usr/share/fonts/truetype/droid/DroidSansFallbackFull.ttf")
  pdf.SetFont("deja", "", 18)
  pdf.Text(100, 300, "世界")
  err := pdf.OutputFileAndClose("utf.pdf")
  if err != nil {
    fmt.Fprintf(os.Stderr, "%v\n", err)
    os.Exit(3)
  }
  os.Exit(0)
}

I think we'll need to include an explanation of this in the README.
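As a rough illustration of the point above, that text containing CJK characters needs a font that covers them, here is a minimal sketch that reports which runes in a string belong to the Han script. It uses only the Go standard library. The helper name hanRunes is hypothetical and is not part of gofpdf, and the check says nothing about what a particular font file actually contains; it only flags text that will need a CJK-capable font.

```go
package main

import (
	"fmt"
	"unicode"
)

// hanRunes (hypothetical helper, not part of gofpdf) returns the runes in s
// that belong to the Han script, in order of appearance.
func hanRunes(s string) []rune {
	var out []rune
	for _, r := range s {
		if unicode.Is(unicode.Han, r) {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	text := "Hello, 世界"
	if han := hanRunes(text); len(han) > 0 {
		// A font without CJK coverage is likely to show these as tofu.
		fmt.Printf("%q contains Han characters %q; choose a font with CJK coverage\n", text, string(han))
	}
}
```

A check like this could run before calling pdf.Text() to warn when the selected font is known not to cover CJK.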

ajstarks commented 5 years ago

The same happens with the Noto ("no tofu") font, which is specifically designed to have those code points. See: https://www.google.com/get/noto/

Also, a terminal configured with that font renders the file correctly, but specifying the same font in the program results in tofu.

Here is the hex dump of hello.txt:

$ od -c hello.txt
0000000   H   e   l   l   o   ,     344 270 226 347 225 214
0000015
$ od -b hello.txt
0000000 110 145 154 154 157 054 040 344 270 226 347 225 214
0000015
$ od -x hello.txt
0000000 6548 6c6c 2c6f e420 96b8 95e7 008c
0000015
$

jung-kurt commented 5 years ago

> The same happens with the Noto ("no tofu") font, which is specifically designed to have those code points.

I downloaded Noto-SC (世界 shows up properly in the preview) but was unable to convert the OTF files to TTF using FontForge. Were you able to get a TTF directly?

$ od -x hello.txt
0000000 6548 6c6c 2c6f e420 96b8 95e7 008c

This is a valid UTF-8 string according to this UTF-8 tool, so I think it doesn't render for the same reason the 世界 literal doesn't: the font has no glyphs for those code points.
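The validity check can also be reproduced without an online tool. The sketch below rebuilds the bytes from the od -x words quoted above and validates them with the standard unicode/utf8 package. It assumes the dump came from a little-endian machine (low byte first in each 16-bit word) and takes the length of 13 bytes from the 0000015 (octal) offset in the original dump. The function name bytesFromOdX is hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"unicode/utf8"
)

// bytesFromOdX (hypothetical helper) rebuilds a byte sequence from the 16-bit
// words printed by "od -x" on a little-endian machine. n is the real file
// length in bytes; od pads an odd-length file with a zero byte.
func bytesFromOdX(words string, n int) ([]byte, error) {
	var out []byte
	for _, w := range strings.Fields(words) {
		v, err := strconv.ParseUint(w, 16, 16)
		if err != nil {
			return nil, err
		}
		out = append(out, byte(v), byte(v>>8)) // low byte first
	}
	if n > len(out) {
		return nil, fmt.Errorf("want %d bytes, dump has only %d", n, len(out))
	}
	return out[:n], nil
}

func main() {
	// Words copied from the dump; offset 0000015 octal means 13 bytes.
	b, err := bytesFromOdX("6548 6c6c 2c6f e420 96b8 95e7 008c", 13)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("valid UTF-8: %v\n", utf8.Valid(b))
	// List each decoded code point alongside its character.
	for _, r := range string(b) {
		fmt.Printf("%U %q\n", r, r)
	}
}
```

If the bytes decode cleanly, as they do here, the file contents are not the problem, which points back at the font's glyph coverage.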

ajstarks commented 5 years ago

You can get the Noto TTF here: https://www.fontsquirrel.com/fonts/noto-sans

Another option to try is the Go fonts. If you just want the TTF files, run

git clone https://go.googlesource.com/image

and copy them from the resulting image/font/gofont/ttfs directory.

jung-kurt commented 5 years ago

This font worked for me:

https://github.com/jsntn/webfonts/blob/master/NotoSansSC-Regular.ttf

Note the "SC" suffix, which denotes Simplified Chinese. It properly rendered the ASCII characters, 世界, and the contents of your hello.txt file.

ajstarks commented 5 years ago

Great, I can reproduce. Does the Go Font work for you?

jung-kurt commented 5 years ago

> Does the Go Font work for you?

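No. One clue is the size of the files: Go-Regular.ttf is far smaller than NotoSansSC-Regular.ttf, which suggests it carries no CJK glyphs.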

      140308 ./Go-Regular.ttf
     2171336 ./NotoSansSC-Regular.ttf

ajstarks commented 5 years ago

ok to close