spipu / html2pdf

OFFICIAL PROJECT | HTML to PDF converter written in PHP
http://html2pdf.fr/en/default
Open Software License 3.0

Performance problem since html2pdf 4.0.3 => 5.0.1 #230

Open michauk opened 7 years ago

michauk commented 7 years ago

Hello there, a few days ago I upgraded from html2pdf 4.0.3 to the latest 5.* release on a standard Debian server (PHP 5.6...), to prepare for the Debian upgrade (from Jessie to Stretch; I'm a bit late). I think I read somewhere that I had to upgrade html2pdf to 5 to get it running on PHP 7. In any case, it is surely a good thing to upgrade this tool along with the distro.

I encountered a big performance problem I didn't have before.

My PDF is mainly a (sometimes large) table with many rows and a few columns, containing basic data (amount, date, reference...). With html2pdf 5, it takes 1.5-2 seconds to generate that table (1 page) with only ~70 rows. The time grows far faster than the number of rows (the server is a big one with no performance problems; I tune and monitor it myself, blah blah blah).

I started debugging this and reduced a copy of my code to just this table, with no PHP/MySQL processing: I simply copied and pasted the table content. I enabled setDebugMode to check.

Maybe I missed something simple (or my HTML is dirty?), but I can't find the cause. I tried removing every style attribute: it's a bit faster, but still slow, and it still gets disproportionately slower as rows are added. If anyone has any idea, I'd appreciate it.

Attached is an example with only this table content (plus a footer), with the style attributes (I put them on every TD, which is maybe a bad idea), and the same example without any style at all. You can copy/paste some TR/TD to generate hundreds of rows and check the processing time.

examples.zip

Regards, Jacques M.

michauk commented 7 years ago

OK, I modified "exemple07a.php" in /examples/res/ to increase the number of rows. Same thing: the time becomes crazy. I also tried several one-row tables instead of one table with many rows, with the same crazy processing time.
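To reproduce this kind of slowdown without real data, a table of arbitrary size can be generated programmatically. This is an assumption-based sketch, not part of html2pdf: `buildTestTable()` is a hypothetical helper, and the commented usage assumes the html2pdf v5 class `\Spipu\Html2Pdf\Html2Pdf` installed via Composer.

```php
<?php
// Hypothetical helper (not part of html2pdf): builds an HTML table with
// $rows rows of simple data (reference, date, amount), similar to the table
// described in this issue, so processing time can be measured per row count.
function buildTestTable(int $rows, bool $withStyles = true): string
{
    // Optional inline style, applied to every TD as in the original report
    $style = $withStyles ? ' style="border: solid 1px #000; padding: 1mm;"' : '';
    $html = "<table>\n";
    for ($i = 1; $i <= $rows; $i++) {
        $amount = number_format($i * 12.5, 2, '.', '');
        $date = gmdate('Y-m-d', 86400 * $i); // deterministic dates, no timezone needed
        $html .= sprintf(
            "  <tr><td%s>REF-%05d</td><td%s>%s</td><td%s>%s</td></tr>\n",
            $style, $i, $style, $date, $style, $amount
        );
    }
    return $html . "</table>\n";
}

// Usage sketch, assuming html2pdf v5 is installed via Composer:
//   $pdf = new \Spipu\Html2Pdf\Html2Pdf('P', 'A4', 'en');
//   $start = microtime(true);
//   $pdf->writeHTML(buildTestTable(500));
//   printf("%.1f ms\n", (microtime(true) - $start) * 1000);
```

Running the commented usage for 100, 250, 500... rows gives a series directly comparable to the metrics posted further down.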

michauk commented 7 years ago

OK, I switched back to v4.03 and won't change it unless I'm forced to. I tested html2pdf 4.03 with PHP 7, and it works. With 4.03, I can create a table with 500 rows in approx. 13 seconds. I can create 10 tables of 500 rows each in approx. 3 minutes, provided the PHP time limit is increased and the memory limit raised to 512MB. And with PHP 7, it took just 13 seconds. Now I just hope 4.03 will keep working for a long time...

spipu commented 6 years ago

I will check this on the latest version, thanks for the report.

layebaARD commented 6 years ago

Hi, I would like to generate a PDF, but I do not understand the latest version. Could I have a link to download version 4.03?

jomofcw commented 6 years ago

Hi there,

I'm encountering the same problem, exactly as explained by @michauk. Is there any plan to fix it, please? I'm downgrading to v4.* while waiting for the fix.

Thanks for your work.

jomofcw commented 6 years ago

Hello,

Sorry to spam about it, but this issue really needs a fix. V5 seems great, but as long as this issue exists, it is sadly unusable. I can help if test cases are needed.

Thanks for your work.

verbunden commented 6 years ago

We had a similar problem. For us, it was because the call to "TCPDF::_destroy" took too long. The reason was 2 million files in the temporary folder searched by "TCPDF::_destroy", left there by a failure in the session cleanup process. Once the folder was cleaned, everything worked fine.

PHP v7.0.29-1~dotdeb+8.1 (Debian 3.16.51-3+deb8u1 x86_64)
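A minimal sketch of that kind of periodic cleanup, assuming TCPDF temporary files carry the `__tcpdf_` prefix seen in the glob pattern quoted later in this thread. `cleanStaleTcpdfFiles()` and the age threshold are hypothetical; the idea is to run it from a cron job or maintenance script, outside html2pdf (PHP 7.1+ syntax).

```php
<?php
// Hypothetical maintenance sketch (not part of TCPDF or html2pdf): deletes
// TCPDF temporary files older than $maxAgeSeconds from a cache directory, so
// the glob() run in TCPDF::_destroy() does not have to scan a huge folder.
// Assumption: temporary files use the "__tcpdf_" prefix; point $cacheDir at
// your K_PATH_CACHE directory.
function cleanStaleTcpdfFiles(string $cacheDir, int $maxAgeSeconds, ?int $now = null): int
{
    $now = $now ?? time();
    $deleted = 0;
    $files = glob(rtrim($cacheDir, '/') . '/__tcpdf_*');
    foreach ($files ?: [] as $file) {
        // Only remove regular files that have not been modified recently,
        // so files of a PDF currently being generated are kept
        if (is_file($file) && $now - filemtime($file) > $maxAgeSeconds && unlink($file)) {
            $deleted++;
        }
    }
    return $deleted;
}
```

The age check is a deliberate trade-off: it avoids deleting files of a run in progress, at the cost of letting recent leftovers survive until the next pass.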

spipu commented 6 years ago

Hi, I just created a performance tool to investigate the problem. You can find it on the performance branch: https://github.com/spipu/html2pdf/blob/performance/performance/full.php

I need some metrics from different environments. Who can run it?

spipu commented 6 years ago

My metrics:

1|66|11538
5|115|11539
10|172|11541
25|353|11546
50|740|11552
75|1101|12047
100|1503|12504
250|4589|15622
500|23645|21000
750|68102|26003
1000|124105|31748
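Whether the time really grows faster than the row count can be checked by computing a local scaling exponent k from pairs of measurements, where time ≈ c · rows^k (k = 1 is linear, k = 2 quadratic). This sketch assumes the first column of each `rows|time|...` entry is the row count and the second is the time in milliseconds; the meaning of the third column is not stated in the thread, so it is ignored. On spipu's numbers it prints about 0.94 for 10 to 100 rows and about 1.92 for 100 to 1000 rows: roughly linear at small sizes, close to quadratic at larger ones.

```php
<?php
// Parses "rows|ms|third" entries separated by whitespace into [rows => ms].
// Assumption: column 1 = row count, column 2 = time in ms; column 3 is ignored.
function parseMetrics(string $raw): array
{
    $points = [];
    foreach (preg_split('/\s+/', trim($raw)) as $entry) {
        [$rows, $ms] = array_map('intval', explode('|', $entry));
        $points[$rows] = $ms;
    }
    return $points;
}

// Local scaling exponent between two measurements: if time ~ rows^k,
// then k = log(t2 / t1) / log(n2 / n1).
function scalingExponent(array $points, int $n1, int $n2): float
{
    return log($points[$n2] / $points[$n1]) / log($n2 / $n1);
}

$points = parseMetrics('1|66|11538 5|115|11539 10|172|11541 25|353|11546 50|740|11552 '
    . '75|1101|12047 100|1503|12504 250|4589|15622 500|23645|21000 750|68102|26003 1000|124105|31748');
printf("k(10 -> 100)   = %.2f\n", scalingExponent($points, 10, 100));
printf("k(100 -> 1000) = %.2f\n", scalingExponent($points, 100, 1000));
```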

jomofcw commented 6 years ago

Hello,

To avoid wasting time, can you provide some deployment instructions, please? I'll test it ASAP.

spipu commented 6 years ago

Sure:

git clone https://github.com/spipu/html2pdf.git -b performance html2pdf_performance
cd html2pdf_performance
composer install --no-dev
cd performance
php ./full.php

jomofcw commented 6 years ago

Thanks !

Sorry, another problem :/

root@myWebServer:/myPath/html2pdf-performance# git clone git@github.com:spipu/html2pdf.git -b performance html2pdf_performance
Cloning into 'html2pdf_performance'...
Warning: Permanently added the RSA host key for IP address '192.30.253.113' to the list of known hosts.
Permission denied (publickey).
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

Airthee commented 6 years ago

Hi, here are my metrics:

1|2249|11541
5|701|11542
10|904|11543
25|1573|11548
50|2718|11557
75|3905|12052
100|5256|12509
250|14134|15627
500|52411|21004
750|129720|26008
1000|244713|31752

I'm on Ubuntu for Windows (WSL), running the script with PHP 5.6 (our production version).

spipu commented 6 years ago

@jomofcw I fixed the instructions.

jomofcw commented 6 years ago

It's OK, thanks ^^.

So here are my metrics (PHP 7 on Debian with a near-default configuration):

1|87|10493
5|107|10494
10|133|10495
25|217|10500
50|376|10504
75|556|10747
100|768|11204
250|2959|14322
500|42322|19699
750|135246|24702
1000|267831|30447

KevinF-tech commented 6 years ago

Hello,

My metrics (Debian 9.5 - PHP 7.0.30):

1|131|10168
5|379|10169
10|691|10171
25|1649|10176
50|3531|10180
75|5125|10480
100|7302|10953
250|18326|14179
500|45333|19721
750|86438|24890
1000|145040|30799

Lorendex commented 6 years ago

PHP 7.0, Ubuntu 16.04:

1|76|10506
5|90|10507
10|116|10508
25|194|10513
50|322|10517
75|481|10761
100|679|11218
250|2118|14336
500|13066|19714
750|32667|24717
1000|62140|30462

PHP 7.2, Ubuntu 16.04:

1|70|10506
5|97|10507
10|116|10508
25|194|10513
50|343|10517
75|516|10761
100|685|11218
250|2136|14336
500|13033|19714
750|33688|24717
1000|69630|30462

PHP 7.1.6, Windows 10:

1|94|9165
5|119|9166
10|156|9167
25|271|9172
50|474|9327
75|708|9711
100|979|10056
250|3250|12372
500|23620|16538
750|69842|20382
1000|144159|24846

will2877 commented 5 years ago

Is there any news on this issue? I seem to have the same problem, but I don't have shell access to the host. It works fine on my local XAMPP installation but becomes very slow once I upload it to my hosted server.

Local PHP: 7.2.9 Remote PHP: 7.2.11

Thanks in advance!

jacobdo2 commented 5 years ago

Any news on this issue?

S-K-P commented 5 years ago

With html2pdf 5.2.1

Same problem as @michauk, but like @DiisMami, it works fine on my local installation with:

It doesn't work well on my web host with:

My metrics on local:

1|90|10513
5|87|10514
10|109|10515
25|174|10520
50|301|10524
75|449|10786
100|614|11243
250|1937|14361
500|13403|19739
750|38670|24742
1000|82348|30487

My metrics on web host:

1|149|14092
5|214|14242
10|314|14404
25|650|14951
50|1289|15878
75|2095|16667
100|3012|17664
250|18764|23276
500|139009|32879
750|415472|43881
1000|took too much time

PS: sorry if my English is bad

meritel commented 5 years ago

Hello, after many tests, I found a real performance problem with the way objects are cloned in the function createSubHTML():

// clone the sub object
 $subHtml = clone self::$_subobj;

Each function that calls createSubHTML() stores its return value in a variable $sub, and often destroys $sub after using it. (Your function _destroySubHTML doesn't work here, so I called unset($sub) manually.)

Each time createSubHTML() is called, I measure a delay of 10ms to 20ms, depending on our server load. With my HTML file, I counted 843 calls to createSubHTML()/_destroySubHTML(), which take 9,947.4ms in total (~10s).

The complete generation of my PDF (1 page, some nested HTML tables) takes 11,202.8ms (~11s).

(Sorry for my English, I'm French.)

meritel commented 5 years ago

Hello,

Finally found the way to solve the problem. The problem is not so much that you clone the whole object, but the destructors of each class linked to this object, which run when it is removed by the garbage collector. When you make the clone and return it by reference to the function that called createSubHTML(), you put it in the $sub variable. When this variable is destroyed, the clone is also unset, and the __destruct magic methods of all the classes are called. The one in tcpdf.php is really slow. By disabling the glob block (see below), I go from ~11,000ms to ~800ms. (In the PDF I am trying to make, this code runs 843 times, so it searches the whole temp directory 843 times for files to clean up, using the glob function [wildcard pattern search]. We already have something on our server that cleans up this directory, so I disabled this part of the code.)

public function _destroy($destroyall=false, $preserve_objcopy=false) {
    [...]
    /*
    // This part is slowing down html2pdf
    if ($destroyall AND !$preserve_objcopy) {
        // remove all temporary files
        $tmpfiles = glob(K_PATH_CACHE.'__tcpdf_'.$this->pdf->file_id.'_*');
        if (!empty($tmpfiles)) {
            array_map('unlink', $tmpfiles);
        }
    }
    */
    [...]
}

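The mechanism described above can be modelled in a few lines. The classes below are hypothetical stand-ins, not html2pdf or TCPDF code: every call clones a template object, and unsetting the clone runs its `__destruct()`, which in TCPDF is where the `glob()` scan happens. With 843 iterations, matching meritel's count, the script prints 843: one cleanup per clone.

```php
<?php
// Simplified model (hypothetical classes, not html2pdf's real code): a
// sub-object is cloned for every nested element, and each clone runs an
// expensive cleanup in __destruct() when the variable holding it is unset.
class FakePdf
{
    public static $cleanupCalls = 0;
    public $fileId = 'abc123';

    public function __destruct()
    {
        // Stand-in for TCPDF's glob() scan of the cache directory
        self::$cleanupCalls++;
    }
}

function createSub(FakePdf $template): FakePdf
{
    return clone $template; // a full copy, sharing the same $fileId
}

$template = new FakePdf();
for ($i = 0; $i < 843; $i++) {
    $sub = createSub($template);
    // ... render the nested element with $sub ...
    unset($sub); // the clone's destructor runs here
}
echo FakePdf::$cleanupCalls, "\n";
```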
meritel commented 5 years ago

By the way, this file cleanup should be executed ONCE, at the very end of the script using html2pdf. So I've overridden __destruct and _destroy in MyPDF.php, with the glob-based removal commented out (see my previous post).

In Html2Pdf.php, I added the temporary-file cleanup to the __destruct() magic method of Html2Pdf, like this:

    static protected $_tmpFilesAreCleaned   = false;    // flag : file cleaning is done

    public function __destruct() {
        if($this->_isSubPart || self::$_tmpFilesAreCleaned) return; 
        // remove all temporary files
        $tmpfiles = glob(K_PATH_CACHE.'__tcpdf_*');
        if (!empty($tmpfiles)) {
            self::$_tmpFilesAreCleaned = true;
            array_map('unlink', $tmpfiles);
        }
    }

Hope this will help ;)

KevinF-tech commented 5 years ago

Thanks, it works well! Could you open a PR?

meritel commented 5 years ago

Done. Instead of modifying tcpdf.php, I changed MyPDF.php, overriding TCPDF's __destruct magic method.

meritel commented 5 years ago

Be careful with the code above: if you run more than one instance of html2pdf, one instance could delete the other's files while the second instance has not yet finished. I'll modify my PR accordingly soon.
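One way to address that concern, sketched under assumptions (the `ScopedCleanup` class is hypothetical and is not the code of meritel's PR): only the top-level object cleans up, and the glob pattern is restricted to that instance's own file ID, following the `__tcpdf_<file_id>_*` pattern from `TCPDF::_destroy()`, so a concurrent run with a different ID keeps its files.

```php
<?php
// Sketch (hypothetical class): clean up once, from the top-level object only,
// and only this instance's own "__tcpdf_<fileId>_*" files, so concurrent
// html2pdf runs using other file IDs are left untouched.
class ScopedCleanup
{
    private $cacheDir;
    private $fileId;
    private $isSubPart;

    public function __construct(string $cacheDir, string $fileId, bool $isSubPart = false)
    {
        $this->cacheDir = rtrim($cacheDir, '/') . '/';
        $this->fileId = $fileId;
        $this->isSubPart = $isSubPart;
    }

    public function __destruct()
    {
        if ($this->isSubPart) {
            return; // sub-parts never clean up; the top-level object does it once
        }
        foreach (glob($this->cacheDir . '__tcpdf_' . $this->fileId . '_*') ?: [] as $file) {
            unlink($file);
        }
    }
}
```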

meritel commented 5 years ago

Modified. PR done.

Tofandel commented 5 years ago

456

Tofandel commented 5 years ago

It's mostly a TCPDF issue, so I'd recommend making a PR there.

meritel commented 5 years ago

I don't think it's a TCPDF issue... html2pdf makes a lot of recursive instantiations of the TCPDF class; that's the real problem... And TCPDF has to clean up its variables once the instance is destroyed. TCPDF was not designed for thousands of recursive instantiations/destructions...

Tofandel commented 5 years ago

Okay, I see what's happening: the cloned object has the same ID, so it tries to clean up the same ID over and over again. It's still an issue within the scope of TCPDF, but it can be hotfixed here as well.

Tofandel commented 5 years ago

I made a PR on TCPDF; you can hotfix this in the MyPdf class in your PR and revert the rest.
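A minimal sketch of such a hotfix, assuming the behaviour described above (clones reuse the original's file ID). `CloneAwarePdf` is hypothetical and is not the content of the TCPDF PR: PHP calls `__clone()` on the newly created copy, so the copy can flag itself and skip the cleanup in `__destruct()`, leaving it to the original object. Running it prints 1, instead of one cleanup per clone.

```php
<?php
// Hypothetical hotfix sketch (not the actual TCPDF pull request): clones share
// the original's file ID, so only the original should remove its temp files.
class CloneAwarePdf
{
    public static $cleanups = 0;
    public $fileId;
    private $isClone = false;

    public function __construct(string $fileId)
    {
        $this->fileId = $fileId;
    }

    public function __clone()
    {
        // Runs on the new copy: it reuses $fileId and must not clean it up
        $this->isClone = true;
    }

    public function __destruct()
    {
        if ($this->isClone) {
            return;
        }
        self::$cleanups++; // the real code would glob() and unlink() here
    }
}

$original = new CloneAwarePdf('abc123');
for ($i = 0; $i < 843; $i++) {
    $copy = clone $original;
    unset($copy); // no cleanup: the copy is flagged as a clone
}
unset($original); // exactly one cleanup for this file ID
echo CloneAwarePdf::$cleanups, "\n";
```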

jomofcw commented 5 years ago

Any news about this, please? This is the only thing preventing me from using html2pdf v5 :'(

Tofandel commented 5 years ago

I'm personally switching to https://github.com/mpdf/mpdf and advise everybody here to do the same; it's very easy to switch from one to the other. This lib is old with no activity, the support is terrible, and don't get me started on the code quality.

citystrolch commented 5 years ago

I have tested meritel's core hack and it does bring a massive performance boost, thank you. I suggest implementing this; however, I know the fix probably belongs in TCPDF first, so I'll try to suggest it there, too.

Tofandel commented 5 years ago

FYI: my TCPDF PR has been merged.

citystrolch commented 5 years ago

Thanks @Tofandel! That means future installations of html2pdf as well as TCPDF should include it, right? (Sorry to ask like a beginner, but when it comes to GitHub, I am one...)

Tofandel commented 5 years ago

It was also included in the new release, which means that if you run composer update, TCPDF will be updated and you will get the performance improvement.

ggedde commented 5 years ago

I am having the same issue with v5.2.1, so I have reverted to 4.03. For a 33-page document with 17 tables of about 40 items each:

5.2.1 = 22 seconds
4.03 = 6 seconds

I will have to check out mPDF someday, but for now 4.03 is working well. I really wanted to use the end_last_page tag, which is not available in 4.03, but it's not a huge deal.

michauk commented 5 years ago

Hey, 2 years have passed since my first message, and it's still a problem. Maybe I'll consider moving to something like "onlyoffice docbuilder"; it seems it could replace both my PHP->PDF and PHP->XLS tools. For the moment, this good ol' 4.03 release (actually I moved to 4.6.1 to be able to use different PHP libs with Composer) is still doing the job. I may have had a warning about a "Countable" object when I upgraded to PHP 7.3 (Debian Buster), so I modified the code a bit. I can't remember if it was in this lib or another. Regards,
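About the "Countable" warning: since PHP 7.2, `count()` warns when given a value that is neither an array nor a `Countable` object, and PHP 8 throws a `TypeError` instead. A hypothetical drop-in helper that restores the pre-7.2 results (0 for `null`, 1 for any other non-countable value) might look like this; which call sites needed it is not stated in the thread.

```php
<?php
// Hypothetical helper: reproduces pre-PHP-7.2 count() results without the
// warning (PHP 7.2+) or TypeError (PHP 8+) for non-countable values.
function legacyCount($value): int
{
    if (is_array($value) || $value instanceof Countable) {
        return count($value);
    }
    // Old behaviour: null counted as 0, any other non-countable value as 1
    return $value === null ? 0 : 1;
}
```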

ggedde commented 5 years ago

Yeah, 4.6.1 was better, but still not as fast as 4.03:

5.2.1 = 22 seconds
4.6.1 = 9 seconds
4.03 = 6 seconds

Yeah, I had to fix the countable issue too.

Olofu commented 3 years ago

My metrics on shared web hosting:

cPanel Version | 86.0 (build 30)
PHP Version | 7.4
Architecture | x86_64
Operating System | linux

1|62|10495
5|79|10496
10|101|10497
25|174|10502
50|308|10506
75|452|10514
100|620|10960
250|2019|14076
500|5791|19450
750|11295|24449
1000|19189|30191

Albob commented 2 years ago

Hello, and thanks for the lib. I also find it quite slow. Here are my performance measurements for html2pdf v5.2.4, if this helps:

Steps: 1, 5, 10, 25, 50, 75, 100, 250, 500, 750, 1000
Try by Steps: 10

1|212|9412
5|198|9413
10|284|9414
25|547|9419
50|979|9423
75|1506|9644
100|2090|10010
250|5622|12452
500|14838|16823
750|29104|20876
1000|54589|25549

My CPU is an Intel Core i7-1065G7.

php --version
PHP 7.4.1 (cli) (built: Jan 20 2020 22:21:57) ( ZTS Visual C++ 2017 x86 )
Copyright (c) The PHP Group
Zend Engine v3.4.0, Copyright (c) Zend Technologies
    with Xdebug v2.9.1, Copyright (c) 2002-2020, by Derick Rethans