tecnickcom / TCPDF

Official clone of PHP library to generate PDF documents and barcodes
https://tcpdf.org

Feature proposal: optimize PDFs by replacing %F coordinates #628

Open lserni opened 1 year ago

lserni commented 1 year ago

Inspecting the uncompressed PDF code reveals a great many cases where coordinates are specified with unnecessary precision (e.g. 17.5 written as 17.500000, or 0 written as 0.000000). This is perfectly normal for PDFs in general. Nonetheless, in very detailed PDFs with typeset tables and no images, replacing these values with a more compact representation, and even rounding floats (e.g. 17.4996234 becoming 17.5), yields file size savings of up to 25% of the compressed file (verified by myself) with no noticeable difference.
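One rough way to check the effect on your own data is to format the same coordinates both ways and compare the zlib-compressed sizes. The sketch below is only an illustration: `compactNumber()` and `buildStream()` are hypothetical helpers (not part of TCPDF), the coordinates are synthetic, and it assumes PHP 7.4+ with the zlib extension. Real savings depend entirely on the document contents.

```php
<?php

/**
 * Hypothetical helper (not part of TCPDF): format a number with at most
 * $digits decimals and strip trailing zeros, e.g. 17.500000 -> "17.5".
 */
function compactNumber(float $value, int $digits = 3): string
{
    // number_format() with explicit separators is locale-independent.
    $text = number_format($value, $digits, '.', '');
    if (strpos($text, '.') !== false) {
        $text = rtrim(rtrim($text, '0'), '.');
    }
    // Older PHP versions can produce "-0" for tiny negative values.
    return $text === '-0' ? '0' : $text;
}

/**
 * Build a synthetic content stream of line segments, formatting each
 * coordinate with the given callable.
 */
function buildStream(callable $fmt, int $segments = 2000): string
{
    $out = '';
    for ($i = 0; $i < $segments; $i++) {
        // Arbitrary, deterministic coordinates with fractional noise.
        $x = 300 + 250 * sin($i * 0.37);
        $y = 400 + 350 * cos($i * 0.11);
        $out .= $fmt($x) . ' ' . $fmt($y) . " l\n";
    }
    return $out;
}

$verbose = buildStream(static fn(float $v): string => sprintf('%F', $v));
$compact = buildStream('compactNumber');

printf("raw:        %d -> %d bytes\n", strlen($verbose), strlen($compact));
printf("compressed: %d -> %d bytes\n",
    strlen(gzcompress($verbose)), strlen(gzcompress($compact)));
```

Running this on a representative content stream from a real document, rather than on synthetic coordinates, would give a more meaningful figure.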

At the unavoidable cost of slower PDF generation, this can be achieved by replacing the sprintf() calls in the code that use a %F format specifier with calls to $this->ksprintf(), a function that recognizes and "shaves" floats by leveraging the fact that only the %F format is ever used in the TCPDF code.

Note that ksprintf() is in no way a general replacement or wrapper for the actual sprintf() function.

If anyone wants to experiment:

/**
 * In most cases, floating point arguments in PDF have higher precision than necessary, and are often
 * unnecessarily specified - e.g. "Tw 17.500000" instead of "Tw 17.5". This function replaces it,
 * keeping things like "Text (Price: 17.500000)" untouched.
 * This usually decreases the size of compressed PDFs by 5-25% depending on the contents.
 *
 * Possibly, the precision could even be user-selectable ($pdf->setFloatPrecision(int $digits = 3))
 *
 * @param mixed ...$args Format string followed by the values to format
 *
 * @return string
 */
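protected function ksprintf(...$args): string {
    $format = array_shift($args);
    if (preg_match_all('#%(.)#', $format, $gregs, PREG_OFFSET_CAPTURE | PREG_PATTERN_ORDER)) {
        // Index of the argument consumed by the current conversion.
        $i = 0;
        foreach ($gregs[1] as [$par, $off]) {
            if ('%' === $par) {
                // "%%" is a literal percent sign and consumes no argument.
                continue;
            }
            if ('F' === $par) {
                $format[$off] = 's';
                // Fixed-point and locale-independent, at most 3 decimals (no exponent
                // notation); then drop trailing zeros and a dangling decimal point.
                $num = rtrim(rtrim(number_format((float) $args[$i], 3, '.', ''), '0'), '.');
                $args[$i] = ('-0' === $num) ? '0' : $num;
            }
            $i++;
        }
    }
    return sprintf($format, ...$args);
}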
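
The user-selectable precision mentioned in the docblock could look roughly like the sketch below. Everything here is hypothetical: `FloatShaver`, `shave()` and `op()` are illustrative names, `setFloatPrecision()` is only the name suggested above and does not exist in TCPDF, and the clamp to 0..6 digits is an assumption (6 being the default precision of "%F"). It assumes PHP 7.4+.

```php
<?php

/**
 * Hypothetical sketch (not TCPDF API): a tiny formatter with a
 * user-selectable number of decimals.
 */
final class FloatShaver
{
    private int $digits = 3;

    /** Hypothetical setter mirroring the proposed setFloatPrecision(). */
    public function setFloatPrecision(int $digits = 3): void
    {
        // Clamp to an assumed sane range; "%F" prints 6 decimals by default.
        $this->digits = max(0, min(6, $digits));
    }

    /** Format one number without trailing zeros or a dangling dot. */
    public function shave(float $value): string
    {
        $text = number_format($value, $this->digits, '.', '');
        if (strpos($text, '.') !== false) {
            $text = rtrim(rtrim($text, '0'), '.');
        }
        // Older PHP versions can produce "-0" for tiny negative values.
        return $text === '-0' ? '0' : $text;
    }

    /** Format a PDF operator preceded by its numeric operands. */
    public function op(string $operator, float ...$operands): string
    {
        $parts = array_map([$this, 'shave'], $operands);
        $parts[] = $operator;
        return implode(' ', $parts);
    }
}

$shaver = new FloatShaver();
echo $shaver->op('m', 10.0, 20.5), "\n";        // 10 20.5 m
$shaver->setFloatPrecision(1);
echo $shaver->op('l', 17.4996234, 0.04), "\n";  // 17.5 0 l
```

Lower precision saves more bytes but moves coordinates further from their computed values, so a default of 3 digits with an opt-in setter seems like the cautious choice.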