jung-kurt / gofpdf

A PDF document generator with high level support for text, drawing and images
http://godoc.org/github.com/jung-kurt/gofpdf
MIT License

Speeding up image rendering #170

Open ajstarks opened 6 years ago

ajstarks commented 6 years ago

In my experience, image rendering is the largest bottleneck. See https://github.com/ajstarks/go-info-displays for an example of a deck rendered with pdfdeck that contains several large (4224x2376) images.

Any clues to how image rendering can be sped up?

jung-kurt commented 6 years ago

Do the images need to be that large in the finished document? You could try incrementally decreasing the size to find the smallest acceptable extent.

ajstarks commented 6 years ago

I'm scaling the images (which is probably increasing the run time) so that I can have a single set and re-scale if my canvas changes.

I suppose I can rescale the image beforehand, but it would be nice to know if there are other knobs to turn besides using smaller images.

jung-kurt commented 6 years ago

The full contents of a JPG file are loaded into memory and kept there for later streaming into the PDF. It would be interesting to compare a profile of this scheme with that of an alternative approach in which the JPG is not stored but simply read twice, the first time for image information and the second for data transfer into the document.

inovacap commented 6 years ago

Hello jung-kurt,

I found your post concerning bulleted/numbered lists in fpdf here: https://www.bountysource.com/issues/54777549-help-create-nested-list

In it you refer to a script here: http://www.fpdf.org/en/script/script56.php

That script, as written, just repeats the sample text. It produces the format of a list, but the text of every item is the same.

This code:

`$sample_text = 'This is bulleted text. The text is indented and the bullet appears at the first line only. This list is built with a single call to MultiCellBltArray().';

//Test1
$test1 = array();
$test1['bullet'] = chr(149);
$test1['margin'] = ' ';
$test1['indent'] = 0;
$test1['spacer'] = 0;
$test1['text'] = array();
for ($i=0; $i<5; $i++) {
    $test1['text'][$i] = $sample_text;
}
$pdf->SetX(10);
$pdf->MultiCellBltArray($column_width-$pdf->x,6,$test1);
$pdf->Ln(10);`

Results in

This is bulleted text. The text is indented and the bullet appears at the first line only. This list is built with a single call to MultiCellBltArray(). This is bulleted text. The text is indented and the bullet appears at the first line only. This list is built with a single call to MultiCellBltArray(). This is bulleted text. The text is indented and the bullet appears at the first line only. This list is built with a single call to MultiCellBltArray(). This is bulleted text. The text is indented and the bullet appears at the first line only. This list is built with a single call to MultiCellBltArray(). This is bulleted text. The text is indented and the bullet appears at the first line only. This list is built with a single call to MultiCellBltArray().

I have been trying to figure out how to modify the code so that I can drop in delimited text and get a proper list (with bullets, numbers, or symbols).

In other words, what I want would look something like this:

`$sample_text = 'Red,Blue,Yellow';

//Test1
$test1 = array();
$test1['bullet'] = chr(149);
$test1['margin'] = ' ';
$test1['indent'] = 0;
$test1['spacer'] = 0;
$test1['text'] = array();
for ($i=0; $i<5; $i++) {
    $test1['text'][$i] = $sample_text;
}
$pdf->SetX(10);
$pdf->MultiCellBltArray($column_width-$pdf->x,6,$test1);
$pdf->Ln(10);`

Where the delimiter is the comma in the text, and the result will be

  1. Red
  2. Blue
  3. Yellow

The code in the link you provided seems to be 'broken', at least I can't figure out how to create the appropriate array to get the right result.

If you have any ideas on how to 'fix' the code, I'd be interested to know.

You can email me at r@inovacapitalservices.com

jung-kurt commented 6 years ago

My guess is you will want to use a PHP function like explode to break your string into an array of comma-delimited substrings. From there, you can assign them in a loop to $test1['text'][0], $test1['text'][1], etc.

You will get a better response if you direct your FPDF/PHP questions to the forum at the FPDF site. It is helpful and active.

inovacap commented 6 years ago

I appreciate the reply! I did puzzle over the explode function long before I posted here.

The line in question is $test1['text'][$i] = $sample_text;

Somehow "$i" is supposed to represent "[0]", "[1]", "[2]", etc, but from what?

If all there is is one text variable, "My dog has fleas", and nothing in the code accounts for a delimiter in that sample text (like "My,dog,has,fleas"), then when "$test1['text'][$i]" draws (inexplicably to me) from "$sample_text", the only thing that will happen is that $sample_text keeps repeating itself, as it does.

But if $sample_text were converted into a delimited array, and then we have "$test1['text'][$i] = $array;" that might work.

and fwiw I did post over at the site. It is the very last question posted.

Thanks again!

ajstarks commented 6 years ago

The read twice option may be faster. The standard image package has fast methods for getting image info like width and height. What info does gofpdf need?

jung-kurt commented 6 years ago

> What info does gofpdf need?

A fair number of fields are collected (more than is provided by, for example, jpeg.DecodeConfig()), but it may be that only the width, height, color model and DPI are used internally until the image is emitted at the end of the document production. Unfortunately, the GetImageInfo() method gives developers access to the image fields, so we can't restrict the stored information without breaking compatibility. Maybe a viable option would be to set a "two-read" mode (off by default) in which only the information provided by the standard library is stored, and when the image content is required a second read is made. The advantage of this would be that image content could be streamed into the PDF without any persistent buffering.

ajstarks commented 6 years ago

More info as a baseline: the deck in question has 126 slides with 133 images for a total of 47,690,287 bytes. The time to render is around 5 seconds on a 2.3 GHz MacBook Pro.