parallax / jsPDF

Client-side JavaScript PDF generation for everyone.
https://parall.ax/products/jspdf
MIT License

New html-method #1176

Closed eKoopmans closed 4 years ago

eKoopmans commented 7 years ago

Existing jsPDF plugins

Hi, I've been working on a new html2pdf package that uses html2canvas + jsPDF to convert HTML content to PDF. I know there are already three existing jsPDF plugins for HTML: addHTML, fromHTML, and html2pdf (same name). I don't want to step on any toes - from what I can tell:

New html2pdf package

My html2pdf package takes the same approach as addHTML - I convert to a canvas with html2canvas, split that image up into pages, and attach each image onto its own PDF page. I believe it has some advantages over the jsPDF plugins:

Open issues

That said, I've found 48 open issues on jsPDF that I think html2pdf could resolve, but I don't want to start pushing html2pdf if it conflicts with jsPDF's internal implementations. @MrRio @Flamenco I'd appreciate your feedback!

Flamenco commented 7 years ago

My first reaction would be to:

  1. List the end user's usage needs and issues.
  2. Write an API.
  3. Let each vendor implement the API.
  4. Let the user choose the implementation they want.

This way each implementation will have a standardized usage and documentation.

The main issues I have seen are related to Pagination, Image Quality, CORS, Scaling, Tables, SVG integration, and Font handling.

eKoopmans commented 7 years ago

Thanks @Flamenco, I appreciate the feedback. I think I could rework my package to have an API that could be used within jsPDF, in the same manner as addHTML.

And thanks for the list of common issues! Here's my status on those:

Flamenco commented 7 years ago

The CORS issue could be solved with a proxy handler that gets the image from a server-side request. I'm sure countless hours have been wasted in users fighting that issue...

There are DATA URI issues that also affect image quality. Also, the user might want to scale image resolution, or suppress them.

For pagination, I am thinking more about the API, e.g. how does the user declare IF and WHERE they want breaks, rather than HOW the implementation chooses to do so. Many issues arise because the user scrapes a site that does not have elements with CSS indicators, or has long tables that will not fit.

Flamenco commented 7 years ago

Once a nice API is ready, it will be trivial to wrap the existing implementations in it. They could reuse abstract pagination and image logic as well.

eKoopmans commented 7 years ago

Hah I may have just the thing for the CORS issue, I'll have to tinker around with it when I have the chance. I'll think about your suggestions for pagination and image quality. Thanks again!

Flamenco commented 7 years ago

A simple image callback that is registered will suffice, but it means the render call will need to be async.
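
A rough sketch of what such a registered image callback could look like, assuming the renderer collects image URLs first and awaits the callback before drawing. All names here (resolveImages, loader) are hypothetical and do not exist in jsPDF or html2canvas:

```javascript
// Sketch only: resolveImages and loader are hypothetical names,
// not an existing jsPDF or html2canvas API.
// `loader` is the user-registered callback. It receives an image URL and
// returns (or resolves to) the image data to embed, or null to skip the image.
async function resolveImages(urls, loader) {
  // Run all loader calls concurrently and wait for every one to finish.
  const results = await Promise.all(urls.map((url) => loader(url)));
  const images = new Map();
  urls.forEach((url, i) => {
    // Keep only the images the callback did not suppress.
    if (results[i] != null) {
      images.set(url, results[i]);
    }
  });
  return images;
}
```

A loader could fetch through a same-origin proxy to sidestep CORS, rescale the image, or return null to suppress it, which is why the render call has to await the result.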

raghbendra2015 commented 7 years ago

Is this fixed now, with all features available in jsPDF, or are there still some issues?

Because I am facing these three still:

Pagination: pages break automatically, with margins, and you can add custom page-breaks with the html2pdf__page-break class

Image Quality: I have an open pull request on html2canvas to add custom resolution, which is accessible through html2pdf

Tables, SVG, Fonts: relies on how well html2canvas does each, which is mostly "not bad"

I am using the code below:

$(function() {
    $('#download_as_pdf').click(function() {
        var pdf = new jsPDF('landscape');
        var options = { pagesplit: true,'background': '#fff' };
        pdf.addHTML($('#customer_report_section'), options, function() {
            pdf.save("<?php echo $username . ".pdf"; ?>");
        });
    });
});

eKoopmans commented 7 years ago

Hi @raghbendra2015, I haven't worked any of these changes into jsPDF yet, but I will soon! I was describing fixes I've made in my separate package, html2pdf, but @Flamenco suggested changing that package so it could be incorporated directly into jsPDF. And I'm almost ready to do that!

raghbendra2015 commented 7 years ago

@eKoopmans, thanks for your reply. I am now using the updated code below:

    $('#download_as_pdf').click(function() {
        var pdf = new jsPDF('p', 'pt', 'a4'); // basic create pdf
        pdf.internal.scaleFactor = 4; // play with this value

        pdf.addHTML(document.getElementById('customer_report_section'), {pagesplit: true, retina: true, background: '#fff'}, function () { // addHtml with automatic pageSplit
            //var out = pdf.save('dataurlnewwindow');
            // output format of your pdf -> there are a lot blob, base64....
            pdf.save("<?php echo $username . ".pdf"; ?>");
        });
    });

But I am facing the same formatting issue. Is there any other way to handle this, or should I go for other options? Please suggest a fix if there is one. AB Mauri (3).pdf

eKoopmans commented 7 years ago

For now I would recommend trying html2pdf, and make sure to use the versions of the dependencies included in the vendor directory there. You can create a PDF from your component like so:

html2pdf(document.getElementById('customer_report_section'));

rahulbussa commented 7 years ago

@eKoopmans Thanks for the great plugin, but I am not able to generate a PDF for large HTML data. Any help?

eKoopmans commented 7 years ago

Hi @rahulbussa, html2pdf currently relies on html2canvas to generate an image from the HTML. Take a look at the issues there, I know there have been other concerns with slow performance/etc. Good luck!

marc-y-marc commented 7 years ago

@eKoopmans

Dear sir, it is proven once again that not all heroes wear capes these days (or do you? ;-).

My god, you have done an amazing job of simplifying this whole struggle to generate PDFs from HTML via JavaScript. I had so many issues until I came across this post.

Thank you so much mister!

eKoopmans commented 7 years ago

You're welcome, citizen! Hah no capes, but I'm always happy to help. Take it easy.

gpatki commented 7 years ago

Hello!

I have a query regarding stylesheet support. I have CSS linked in the HTML that I pass in for conversion. Will the generated PDF pick it up? Please point me to a link with an example, if there is one. Currently my PDF is generated without the styles.

Thanks in advance!

keyurshubham2014 commented 6 years ago

Hello. I want to use this in React. Can you suggest some examples, and explain how to import html2pdf in React?

pavitrakumar78 commented 6 years ago

@eKoopmans hey! Nice work! Both your implementation and the original html2pdf are basically:
DOM -> html2canvas -> canvas -> add as an image in jsPDF, right? So these are not zoomable PDFs, correct?

eKoopmans commented 6 years ago

@pavitrakumar78, yes it converts to an image first. In a perfect world we'd be able to go straight from HTML to PDF (which apparently is/was in the works in jsPDF), but I think HTML's just too complex to do that realistically.

I did make a modification to html2canvas to add a 'DPI' feature, so you can increase the resolution of the images at least.

eKoopmans commented 6 years ago

@gpatki really sorry I never got back to you! Yes the styles should work, can you send me an example of it not working? It would be best if you opened an issue on the html2pdf page - it could be a bug/problem with the package!

@keyurshubham2014 also sorry I never responded! html2pdf should play nicely with React - just in your HTML you need to include the <script> tags mentioned in the readme, and on whatever event you want to create the PDF (e.g. a button click), you call html2pdf() giving whatever DOM element you want to print.

moises-morales commented 6 years ago

@eKoopmans Hey!

I have been using your JS package and it's very useful, but I have a question: how can I add a header and footer? I've seen that I could add them in jsPDF with the fromHTML method, as in the following example:

https://plnkr.co/edit/trrLAn6I9o2OwSkRFBZK?p=preview

I hope I have explained myself clearly.

Thanks.

Uzlopak commented 6 years ago

Hi @eKoopmans

I completely refactored the context2d plugin. It is not "perfect", but it now behaves mostly like a canvas element. So maybe we can modify your plugin to use context2d?

See #1931

(I will merge it ASAP once I know why I get an IE11 error even though all tests run perfectly.)

Uzlopak commented 6 years ago

TBH we should deprecate addHTML and fromHTML and focus on your plugin. I really don't think that we will ever "support" addHTML and fromHTML.

Uzlopak commented 6 years ago

@eKoopmans

Can we call the method maybe html?

Uzlopak commented 6 years ago

I created a html-plugin based on your html2pdf code

https://github.com/MrRio/jsPDF/blob/master/plugins/html.js

Flamenco commented 6 years ago

@arasabbasi Is the context2d pageWrap code working? I can't remember if it was experimental or not.

Uzlopak commented 6 years ago

Yes. I had to figure out what the problem actually was. First of all, html2canvas only captures the visible area of the HTML, so you have to pass the correct parameters for it to convert all HTML elements without clipping them. Second, in canvas.js the setters for width and height were affecting the pageWrapY and pageWrapX properties of context2d. html2pdf seems to call the canvas width and height setters at some point, which messed up pageWrapY and pageWrapX and resulted in no page break at all. See:

image

The next step would be to find a solution for "overflowing" elements, because pageWrap works, but not perfectly. I thought about possible solutions today. A lot of them were hacky and involved cutting elements into parts and so on. Then I realized it is much simpler to solve:

For example, say we have a rectangle which overflows across two pages on the Y axis. We don't need to make hefty calculations. We just put the rectangle at the correct position on the first page and put a duplicate of the rectangle on the second page at y - heightOfVisibleAreaOnPage1. We could do this with all elements, even more complex ones.
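
A minimal sketch of the duplicate-and-offset idea above, assuming heightOfVisibleAreaOnPage1 means the drawable height of a page, that every page has the same height, and that y is not negative. The function name placeAcrossPages and its parameters are made up for illustration and are not part of jsPDF or context2d:

```javascript
// Sketch only: placeAcrossPages is a hypothetical helper, not jsPDF code.
// y and height describe the rectangle on one tall virtual canvas.
// The rectangle is drawn whole on every page it touches; the page bounds
// hide whatever falls outside, so no element has to be cut into parts.
function placeAcrossPages(y, height, pageHeight) {
  if (pageHeight <= 0) {
    throw new RangeError('pageHeight must be positive');
  }
  const firstPage = Math.floor(y / pageHeight);
  // A rectangle ending exactly on a page boundary does not spill over.
  const lastPage = Math.max(firstPage, Math.ceil((y + height) / pageHeight) - 1);
  const placements = [];
  for (let page = firstPage; page <= lastPage; page++) {
    // Shift the duplicate up by the combined height of the preceding pages.
    placements.push({ page: page, y: y - page * pageHeight });
  }
  return placements;
}
```

With a page height of 841, a rectangle at y = 700 with height 300 is placed at y = 700 on the first page and at y = -141 on the second, so the second page shows the lower 159 units of it.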

I didn't delete addhtml.js and fromhtml.js. Maybe fromhtml is still useful as a lightweight HTML-to-PDF parser. I don't know yet.

Another problem is that IE support is horrible, as usual. And to be honest, I hate optimizing for IE. The cherry on top is that you cannot simply open the example files, because IE cannot show data-URL-based PDFs. Pain in the ass.

Flamenco commented 6 years ago

We made a decision 5 years ago to not support IE at all. We told all our clients to install Firefox or Chrome. The few that complained actually thanked us in the end for weaning them off IE. Obviously no complaints from the developers either.

Hey, your refactoring looks great. I have an unpublished fork of context2d that is now going to be a PITA to merge in though...

BTW I finally got around to publishing one of the sites that context2d was written for: https://www.quickchords.org/collections/. Rendering to c2d/canvas instead of PDF for preview is at least 10x faster. Fast enough for near real time rendering.

eKoopmans commented 5 years ago

Hi @arasabbasi, thanks for your updates! A few questions:

  1. jsPDF's context2d is news to me! Is the idea that it should behave exactly like a canvas' context?
  2. Does it draw vectors into the PDF, or images? E.g. if I do ctx.fillText('Hello', 0, 0), will the text be selectable in the PDF?
  3. If it's vectors, awesome!

In terms of porting html2pdf into a jsPDF plugin, that was always the goal, so thank you for taking the initiative! It wasn't really possible until my big API upgrade with v0.9.0.

That said I think it may be a simpler process of including html2pdf as a dependency in package.json, and writing a simple wrapper script that maps some of the API onto jsPDF, but I'm not too sure of the standard procedure for plugins here.

Thanks again!

Uzlopak commented 5 years ago

Hi @eKoopmans,

  1. Originally @Flamenco wrote it. It is mimicking context2d interface. So yes. It makes the jsPDF behave like a canvas context.
  2. Vectors and text. Your ctx.fillText will result in a call to context2d.fillText, which then calls jsPDF.API.text.

I put html.js as a plugin but html2canvas as dependency into package.json ;).

Uzlopak commented 5 years ago

@eKoopmans @Flamenco

I refactored context2d (again, lol). Now the context2d has a property "autoPaging". If set to true, the pdf becomes so to speak a huge canvas.

Check it out.

Uzlopak commented 5 years ago

Check the examples in /examples/html2pdf.

Uzlopak commented 5 years ago

Hijacking this topic and making it the discussion topic for the new .html method

Uzlopak commented 5 years ago

ToDo-List:

Uzlopak commented 5 years ago

Text can now be processed in autoPaging mode as of commit b8be588.

Et3rnal commented 5 years ago

Thanks for the great library. I'm about to use the plugin, but now I'm confused about what I should use to convert HTML to PDF. Is the new method ready? If not, what is the best option I have, at least in terms of migrating to the new method once it's ready?

Uzlopak commented 5 years ago

The new .html() method is working, but this is the first release that includes it, so it may be buggy. The benefit of the new method is that we/I can debug and fix it (or blame it on html2canvas). This is because I refactored the context2d module of jsPDF and made its behavior nearly identical to a real canvas 2D context (CanvasRenderingContext2D). All the HTML conversion is done via html2canvas.

With the old methods, you will face limitations. With addHTML or with @eKoopmans' html2pdf tool you will use html2canvas too, but you will get rendered images of the HTML instead of native PDF data. With fromHTML you will get a limited HTML parser. Six months ago we had about 500 issues here just regarding addHTML and fromHTML, and nobody was providing any bug fixes for them. That's why we deprecated addHTML and fromHTML, so using them makes no sense.

The benefit of eKoopmans' html2pdf tool is that you will probably get better results if you depend on link URLs or page breaks, as those are not implemented yet in jsPDF. They are planned for the future, so the gap will shrink.

Uzlopak commented 5 years ago

Or just test it through :P

Et3rnal commented 5 years ago

Thanks for the detailed answer. I was going with html() until you said page breaks are not there yet :\ I think I'll test through to find what works best for me :D Thank you and happy new year

Peppe87 commented 5 years ago

Hi, I'm trying to use the library to print a page for a project, but I'm having major problems with the new .html() method: the result is cut off on the right by a wide margin. The element I'm trying to print (only part of the total markup, but I've stopped here for now) is a table with a max-width of 700px. As you can see from the screenshots, the result of html() is an image which looks bigger and stretched compared with the original content (e.g. the text is bigger) and is cut off on the right. If I print this HTML via html2canvas I get the correct result. Moreover, if I try with a full-width table, it's very evident that html() misses a big piece.

The html markup is:

<table id="element-to-print" style="max-width: 700px;" class="not-strip-table">
    <tbody>
    <tr>
        <td>
            <address>
                <h3>
                    Test workspace
                    <small>Organization holder</small>
                </h3>
                <div><p>Org address</p>
                </div>
                <div>
                    VAT: 1111111111
                </div>
                <div>
                    Fiscal code: 11111111
                </div>
            </address>
        </td>
        <td>
            <h2>Quote</h2>
            <div class="text-muted">QUO-2018-11</div>
            <!-- todo: handle revisions -->
            <!-- <p class="text-muted">Rev. 2</p>-->
            <div class="field field-name-field-quote-status field-type-workflow field-label-hidden">
                <div class="field-items">
                    <div class="field-item even">draft</div>
                </div>
            </div>
        </td>
    </tr>
    <tr>
        <td colspan="2">
            <address>
                <div class="field field-name-field-company field-type-entityreference field-label-hidden">
                    <div class="field-items">
                        <div class="field-item even">
                            <div id="node-7655"
                                 class="node node-company contextual-links-region view-mode-company_details clearfix">

                                <h3>Test</h3>
                                <div class="company-details-content">
                                    C-2019/1
                                    <div class="field field-name-field-address field-type-addressfield field-label-hidden">
                                        <div class="field-items">
                                            <div class="field-item even">
                                                <div class="street-block">
                                                    <div class="thoroughfare">Address 1</div>
                                                    <div class="premise">Address 2</div>
                                                </div>
                                                <div class="addressfield-container-inline locality-block country-IT">
                                                    <span class="postal-code">0000</span> <span
                                                        class="locality">City</span> <span class="state">MI</span></div>
                                                <span class="country">Italy</span></div>
                                        </div>
                                    </div>
                                </div>

                            </div>
                        </div>
                    </div>
                </div>
                <div class="bold-text">
                    C.a.
                </div>
            </address>
        </td>
    </tr>
    <tr>
        <td colspan="2">
            <div class="quote-heading">
                QUO-2018-11
                <div class="field field-name-field-quote-customer-project-id field-type-text field-label-above">
                    <div class="field-label">Customer Project ID:&nbsp;</div>
                    <div class="field-items">
                        <div class="field-item even">1</div>
                    </div>
                </div>
                <div class="field field-name-field-quote-date field-type-datestamp field-label-inline clearfix">
                    <div class="field-label">Quote Date:&nbsp;</div>
                    <div class="field-items">
                        <div class="field-item even"><span class="date-display-single">30/08/2018</span></div>
                    </div>
                </div>
                <div class="field field-name-field-quote-expiration field-type-datestamp field-label-inline clearfix">
                    <div class="field-label">Quote Expiration:&nbsp;</div>
                    <div class="field-items">
                        <div class="field-item even"><span class="date-display-single">29/09/2018</span></div>
                    </div>
                </div>
                <div class="field field-name-field-quote-vendor-ref field-type-text field-label-inline clearfix">
                    <div class="field-label">Vendor Ref:&nbsp;</div>
                    <div class="field-items">
                        <div class="field-item even">Vendor Ref</div>
                    </div>
                </div>
            </div>
        </td>
    </tr>
    </tbody>
</table>

The code I'm using to print via html() is:

            $("#print-btn").click(function () {
                var element = document.getElementById('element-to-print');
                var pdf = new jsPDF('p', 'pt', 'a4');
                pdf.html(element, {
                    callback: function (pdf) {
                        pdf.save('Test.pdf');
                    }
                });
            });

The code I've used to print via html2canvas (just a quick test) is:

            $("#print-btn").click(function () {
                var element = document.getElementById('element-to-print');
                html2canvas(element).then(function(canvas) {
                    document.body.appendChild(canvas);
                });
            });

Could you tell me if I'm doing something wrong, if there's a known bug, or anything else? Thank you.

Screenshot of table rendered by browser: screenshot browser
Screenshot of table rendered by html2canvas: screenshot html2canvas
Screenshot of render of html(), 700px table: screenshot html

PDF of 700px table via html(): Test.pdf
PDF of full-width table via html(): Test (full-width element).pdf

Uzlopak commented 5 years ago

html2canvas and the HTML canvas element work internally with px, whereas .html() works with whatever unit the jsPDF instance was created with. You set it to 'pt', so everything specified in px is processed as pt. If you instantiate the jsPDF instance with px, the output should look less stretched. Secondly, an A4 page is 595 pt x 841 pt. If you print a 700 pt wide table onto a 595 pt wide page, something gets cut off. Make the page landscape so that the 700 pt wide table fits into the 841 pt width.
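
To make the arithmetic above concrete, here is a small sketch that picks an orientation for a given content width. It assumes the width is expressed in the same unit as the document (pt here) and uses the rounded A4 dimensions quoted above. pickOrientation and pageWidthPt are hypothetical helper names, not jsPDF functions:

```javascript
// A4 size in points, rounded as quoted in the comment above (595 x 841).
const A4_PT = { width: 595, height: 841 };

// Page width for an orientation: 'p' is portrait, 'l' is landscape.
function pageWidthPt(orientation) {
  return orientation === 'l' ? A4_PT.height : A4_PT.width;
}

// Sketch only: return the first orientation the content fits into,
// or null when it is too wide for A4 either way.
function pickOrientation(contentWidthPt) {
  if (contentWidthPt <= pageWidthPt('p')) {
    return 'p';
  }
  if (contentWidthPt <= pageWidthPt('l')) {
    return 'l';
  }
  return null;
}
```

For the 700 pt table this returns 'l', which can be passed as the first argument to the jsPDF constructor in place of 'p'. A null result means the content would need to be scaled down or narrowed instead.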

Peppe87 commented 5 years ago

Thank you for your explanation. My JS is now:

            $("#print-btn").click(function () {
                var element = document.getElementById('element-to-print');
                var pdf = new jsPDF('p', 'px', 'a4');
                pdf.html(element, {
                    callback: function (pdf) {
                        pdf.save('Test.pdf');
                    }
                });
            });

And I set #element-to-print's width to 595pt (I've also tried to set it to 595px because I was not sure, but results didn't change a lot).

However, the result is still very different from the original HTML: the letters are widely spaced, and the element on the right isn't printed.

new test.pdf

Could you kindly give me some more details, if possible? Thank you.

sandroden commented 5 years ago

I wanted to give jsPDF.html a try, and I'm using this code:

    savePdf () {
      var doc = new jsPDF({unit: 'mm', format: 'a4', orientation: 'portrait' })
      doc.html(document.getElementById('printable-cv'), {
        callback: function (pdf) {
          pdf.save('cv-a4.pdf')
        }
      })
    }

but I get the error "html2canvas not loaded". Is there something I forgot? I do have html2canvas installed ("html2canvas": "^1.0.0-alpha.12"). I'm using Vue.js with webpack.

I'm currently using html2pdf with the following code:

   savePdf0 () {
      let opt = {
        filename: 'cv.pdf',
        enableLinks: true,
        image: { type: 'jpeg', quality: 0.98 },
        html2canvas: {
          scale: 8,
          useCORS: true,
          width: 310,
          letterRendering: true,
        },
        jsPDF: { unit: 'mm', format: 'a4', orientation: 'portrait' },
      }
      html2pdf().set(opt).from(document.getElementById('printable-cv')).save()
    },

that correctly finds html2canvas.

Shylmysten commented 5 years ago

The new plugin works great; however, one thing is missing: image compression. My PDF files are around 6.6M, whereas before I could get them down to 158kb using the combination of:

                var imgData = canvas.toDataURL("image/png", 1.0);
                pdf = new jsPDF('p', 'pt', [PDF_Width, PDF_Height], true);
                pdf.addImage(imgData, 'JPG', top_left_margin, top_margin, 590, 775, '', 'FAST');

The addImage had an extra compression that really crunched the images down before adding them to the pdf...

My old code from above that produced a 158kb PDF from the same HTML was as follows:

var nodes = Array.from($('div[id ^= "studentCard_"]'));
var nodeLen = nodes.length;
nodes.map(cur => {
      html2canvas(cur, {allowTaint: true}).then(function(canvas) {
      calculatePDF_height_width(cur,0)
      var imgData = canvas.toDataURL("image/png", 1.0);
      pdf = new jsPDF('p', 'pt', [PDF_Width, PDF_Height], true); // compress PDF TRUE
      pdf.addImage(imgData, 'JPG', top_left_margin, top_margin, 590, 775, '', 'FAST'); // Compress Image 'FAST'
      pdf.output('save', 'studentCard_'+student+'.pdf');

Whereas my new code creates 6.6M PDFs from the same HTML elements, with no way to compress them further to send out via email. 6.6M is far too large for an email, and it balloons to nearly 60M if I increase quality to 3, again with no way to compress that image after conversion:

var opt = {
  margin:       1,
  filename:     'myfile.pdf',
  pagebreak:    'avoid-all',
  image:        { type: 'jpg', quality: 0.98},
  html2canvas:  { scale: 1, width: elWidth, height: elHeight },
  jsPDF:        { unit: 'pt', format: 'letter', orientation: 'portrait', compressPDF: true }
};
  html2pdf().set(opt).from(element2).save();

});

Any plans in the future to add back in the ability to compress those images along with the ability to compress the pdf itself?

sandroden commented 5 years ago

I really need/would like to test this method, but I'm still blocked by the error "html2canvas not loaded". I guess it's a trivial error, so I prepared a simple codesandbox in case someone can tell me what the problem is. The sandbox is a Vue.js app and the code is in components/HellowWorld.vue. One button works and uses html2canvas; the other raises the error.

dslas4ever commented 5 years ago

@eKoopmans Thanks for this great plugin mate, keep up the good work! Shout out to @arasabbasi as well for the jsPDF plugin and html2canvas plugin! Thank you guys, you saved my day! Cheers!

try-it-atleast-onces commented 5 years ago

@arasabbasi I have used the .html() method to print the page to PDF. As you mentioned above, I set the page width to 595px and set the font size for the content correspondingly. It works fine on desktop, but when I try the same on mobile, the PDF created is very large. If I reduce the font size and page width, it works fine. Is there some other way to make the content work on mobile devices too?

Have attached the pdf for your reference, created using the jsPDF-master/examples/html2pdf/pdf2.html

desktop.pdf mobile.pdf

Can you please help??

amitmerin commented 5 years ago

@sandroden Try this: It should fix the error: "html2canvas not loaded"

saveAsPdf() {
    // Expose the imported html2canvas module as a global so doc.html() can find it
    window.html2canvas = html2canvas;
    var doc = new jsPDF('p', 'pt', 'a4');
    doc.html(document.querySelector("body"), {
        callback: function (pdf) {
            pdf.save("cv-a4.pdf");
        }
    });
}

You can find a forked running example of your code here

eKoopmans commented 5 years ago

Hey @arasabbasi, thanks again for your work on the .html() plugin! I'm in the process of merging the jsPDF canvas functionality directly into html2pdf.js, but I'm not getting the results I'd expect. Could you have a look at this fiddle: https://jsfiddle.net/eKoopmans/egm94jqh/

Notice that the PDF has unusual spacing on the text - it seems like there's some scaling gone wrong. This behaviour happens using the .html() plugin as well as using html2canvas directly with the jsPDF canvas.

I'm excited to get this working!

try-it-atleast-onces commented 5 years ago

@arasabbasi I also have an issue with text, but for me the text is merging together. For example, if I have 10 words in a line, the first and last words merge in a few places.

I am using the .html() method to print the PDF. Have you faced issues like this? Your help is much appreciated.

bekab95 commented 5 years ago

@eKoopmans thanks very much!! I spent days trying to get a correct PDF without your plugin.