jung-kurt / gofpdf

A PDF document generator with high level support for text, drawing and images
http://godoc.org/github.com/jung-kurt/gofpdf
MIT License

improve the speed of MultiCell #290

Closed hyzgh closed 4 years ago

hyzgh commented 4 years ago

Change-Id: Ieaacbf19acfce1e776eccbfa3bbc030a2ab93d5f

hyzgh commented 4 years ago

I used MultiCell to generate a PDF and found that it ran very slowly. Looking at the source code, I saw that it converts the string to a rune slice on every iteration, which is the cause of the slowdown. To avoid that wasteful work, I convert the string to a rune slice once, up front, and reuse that slice in the loop. In my tests this does improve the speed of MultiCell, especially for large strings.
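As a rough illustration of the pattern described above (not the actual gofpdf code), the sketch below contrasts converting a string to `[]rune` inside a loop with converting it once. The function names `countSpacesSlow` and `countSpacesFast` are hypothetical and exist only for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// countSpacesSlow re-converts s to a rune slice on every iteration.
// Each conversion walks the whole string, so the loop is quadratic in len(s).
func countSpacesSlow(s string) int {
	n := 0
	for i := 0; i < len([]rune(s)); i++ {
		if []rune(s)[i] == ' ' {
			n++
		}
	}
	return n
}

// countSpacesFast converts s once and reuses the slice, so the loop is linear.
func countSpacesFast(s string) int {
	rs := []rune(s)
	n := 0
	for i := 0; i < len(rs); i++ {
		if rs[i] == ' ' {
			n++
		}
	}
	return n
}

func main() {
	s := strings.Repeat("héllo wörld ", 1000)
	// Both functions return the same count; only the cost differs.
	fmt.Println(countSpacesSlow(s) == countSpacesFast(s))
}
```

The gap between the two grows with the input length, which matches the observation that the improvement is most visible for large strings.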

jung-kurt commented 4 years ago

Excellent! Many thanks, @hyzgh, for spotting that and correcting it.

joewestcott commented 4 years ago

I've noticed repeated []rune(s) in the write method also. Should be a simple fix, might even help with #301...

jung-kurt commented 4 years ago

Great observation, @joewestcott. I wonder if it makes sense to split the string into a rune slice once at the beginning of the routine and not have to fuss with isCurrentUTF8 at all.

joewestcott commented 4 years ago

> I wonder if it makes sense to split the string into a rune slice once at the beginning of the routine and not have to fuss with isCurrentUTF8 at all.

I'm not certain that would work. I think the isCurrentUTF8 distinction is important because the string given to pdf.write() may contain byte sequences that are not valid UTF-8 sequences. When converting to a rune slice, we're no longer indexing over bytes, but over Unicode code points. With bytes outside the ASCII range (128-255), this might cause problems?
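A minimal sketch of the concern, using only the Go standard library: a string holding code-page bytes (here assumed to be cp1252, where "é" is the single byte 0xE9) is not valid UTF-8, and converting it to `[]rune` replaces the offending byte with U+FFFD. The helper `describe` is hypothetical and not part of gofpdf.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// describe reports the length of s in bytes, its length in runes,
// and whether s is valid UTF-8.
func describe(s string) (byteLen, runeLen int, valid bool) {
	return len(s), len([]rune(s)), utf8.ValidString(s)
}

func main() {
	codePage := "caf\xe9" // "café" as single-byte code-page text (assumed cp1252)
	unicode := "café"     // the same word as UTF-8

	fmt.Println(describe(codePage))
	fmt.Println(describe(unicode))

	// The lone 0xE9 byte is not a valid UTF-8 sequence, so the conversion
	// yields the replacement character and the original byte value is lost.
	fmt.Printf("%U\n", []rune(codePage)[3])
}
```

Because the original byte cannot be recovered from U+FFFD, a rune-slice conversion is only safe on the path where the input is known to be UTF-8.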

What would be nice would be to always pass gofpdf's text methods a UTF-8 string, and let it handle the codepage translation (if required) internally!
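To make the idea of internal translation concrete, here is a hedged sketch of what such a step could look like for the simplest case, ISO-8859-1 (Latin-1), where each code point up to U+00FF maps directly to one byte. `toLatin1` is a hypothetical helper written for this example; it is not gofpdf's API and does not cover other code pages.

```go
package main

import "fmt"

// toLatin1 is a hypothetical helper, not part of gofpdf. It converts a
// UTF-8 string to ISO-8859-1 bytes and reports false if any character
// (or any invalid UTF-8 byte, which decodes as U+FFFD) has no Latin-1 form.
func toLatin1(s string) (string, bool) {
	out := make([]byte, 0, len(s))
	for _, r := range s {
		if r > 0xFF {
			return "", false
		}
		out = append(out, byte(r))
	}
	return string(out), true
}

func main() {
	encoded, ok := toLatin1("café")
	fmt.Printf("%q %v\n", encoded, ok)

	_, ok = toLatin1("日本語")
	fmt.Println(ok)
}
```

With a step like this inside the library, callers would always pass UTF-8 and the byte-oriented code path would only ever see text the library produced itself.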

jung-kurt commented 4 years ago

> I think the isCurrentUTF8 distinction is important because the string given to pdf.write() may contain byte sequences that are not valid UTF-8 sequences.

You are definitely right.

> What would be nice would be to always pass gofpdf's text methods a UTF-8 string, and let it handle the codepage translation (if required) internally!

Or maybe it is time to get rid of code pages entirely in version 2 and accept only UTF-8 text.