vandeseer / easytable

Small table drawing library built upon Apache PDFBox
MIT License

Takes too long to draw and create pdf. #43

Closed plpanda closed 5 years ago

plpanda commented 5 years ago

Drawing the PDF takes too long. For example, generating a 300-page PDF from dynamic data, with 300 tables (one table per page), took me 3 minutes. Is there any way to get faster performance?

vandeseer commented 5 years ago

Hi @plpanda,

First of all: performance was never a design goal of the library. It was not primarily intended to create such large documents in a very time efficient way.

Having said that, it would of course still be nice to have an easy-to-use lib that actually draws quickly.

Can you share your code? And did you also try to create a huge document with 300 pages without any tables? Did it still take a long time? How are you calculating the data to be drawn?

Best, Stefan

plpanda commented 5 years ago

For each table, I used

RepeatedHeaderTableDrawer drawer = RepeatedHeaderTableDrawer.builder()
        .table(create())
        .startX(30)
        .startY(775F)
        .endY(25F) // note: if not set, table is drawn over the end of the page
        .build();

do {
    PDPage page = new PDPage(new PDRectangle(1150, 800));
    document.addPage(page);
    try (PDPageContentStream contentStream = new PDPageContentStream(document, page)) {
        drawer.contentStream(contentStream).draw();
    }
    // on each following page, start drawing 50 units below the top edge
    drawer.startY(page.getMediaBox().getHeight() - 50);
} while (!drawer.isFinished());

create() is called each time a table is drawn. The table is built dynamically as follows:

public static Table create() throws JSONException{
                final Table.TableBuilder tableBuilder = Table.builder().addColumnsOfWidth(200,200,690);
        tableBuilder.addRow(Row.builder()
                .add(CellText.builder().text("Some Heading")
                           .font(PDType1Font.HELVETICA_BOLD)
                          .textColor(Color.BLACK)
                          .borderWidth(1.5F)
                          .horizontalAlignment(HorizontalAlignment.LEFT)
                          .verticalAlignment(VerticalAlignment.MIDDLE)
                          .build())
                .add(CellText.builder().text("Something")
                          .textColor(Color.BLACK)
                          .borderWidth(1.5F)
                          .horizontalAlignment(HorizontalAlignment.LEFT)
                          .verticalAlignment(VerticalAlignment.MIDDLE)
                          .build())
                .add(CellText.builder().text("Some title")
                          .textColor(Color.BLACK)
                          .font(PDType1Font.HELVETICA_BOLD)
                          .borderWidth(1.5F)
                          .horizontalAlignment(HorizontalAlignment.LEFT)
                          .verticalAlignment(VerticalAlignment.MIDDLE)
                          .build())
                .build());
        tableBuilder.addRow(Row.builder()
                .add(CellText.builder().text(getData("RiskLevel"))
                          .textColor(Color.BLACK)
                          .font(PDType1Font.HELVETICA_BOLD)
                          .backgroundColor(getColor("Medium : 2"))
                          .borderWidth(1.5F)
                          .horizontalAlignment(HorizontalAlignment.LEFT)
                          .verticalAlignment(VerticalAlignment.MIDDLE)
                          .build())
                .add(CellText.builder().text("")
                          .textColor(Color.BLACK)
                          .borderWidth(1.5F)
                          .horizontalAlignment(HorizontalAlignment.LEFT)
                          .verticalAlignment(VerticalAlignment.MIDDLE)
                          .build())
                .add(CellText.builder().text(getData("Issue Name"))
                          .textColor(Color.BLACK)
                          .font(PDType1Font.HELVETICA_BOLD)
                          .backgroundColor(getColor("issueName"))
                          .borderWidth(1.5F)
                          .horizontalAlignment(HorizontalAlignment.LEFT)
                          .verticalAlignment(VerticalAlignment.MIDDLE)
                          .build())
                .build());
.
.
...continued;

To measure the time, I used the following code:

long start = System.currentTimeMillis();
        try {
            String data =req.getParameter("data");
            document = new PDDocument();
            try {
                JSONArray arr = new JSONArray(data);
                for (int i = 0; i < arr.length(); i++) {                    
                    IssueRisk.createTable(document,arr.getJSONObject(i));                   
                }
            } catch (Exception e) {
                System.out.println("Unable to transform data");
                e.printStackTrace();
            }
            document.save("pdfDownload.pdf");
            System.out.println("PDF Prepared for Download ");
            long end = System.currentTimeMillis();
            System.out.println("====================>"+(end - start) + " ms");

Is there any way to optimize it?

Nonononoki commented 5 years ago

I'm having a similar issue when creating a few pages with a large dataset in DIN A1 size.

vandeseer commented 5 years ago

Can you share your code @Nonononoki? It would be nice to have a minimal working example that illustrates the problem.

If I find a bit of time in the next days (rather weeks), I will try to have a look.

Cheers, Stefan

Nonononoki commented 5 years ago

I mostly use the code from the examples. This code uses A2 instead of A1 but it's just as slow.

//Some colors
            final Color HEADER_COLOR = new Color(76, 129, 190);
            //final Color ROW_COLOR = new Color(250, 250, 250);
            final Color ROW_COLOR = new Color(255, 255, 255);
            final Color BORDER_COLOR = Color.GRAY;

            //final int COL_WIDTH = 100;
            final int BORDER_WIDTH = 1;
            final int FONT_SIZE = 9;

            //fill width
            if(table.getColumns().size() < max_rows) {
                max_rows = table.getColumns().size();
            }

            float pageWidth = PDRectangle.A2.getHeight();
            int columnWidth = Math.round((pageWidth - (offsetX*2)) / table.getColumns().size());

            // Define the table structure first
            TableBuilder tableBuilder = Table.builder();
            tableBuilder.fontSize(FONT_SIZE);
            //tableBuilder.font(PDType1Font.HELVETICA);
            tableBuilder.font(fontRegular);
            tableBuilder.borderColor(BORDER_COLOR);

            if(table.getColumns().size() == 0) {
                throw new CustomException("No columns found");
            }

            int rowNo = 0;

            // Add the header row ...
            RowBuilder headerRowBuilder = org.vandeseer.easytable.structure.Row.builder();
            headerRowBuilder.backgroundColor(HEADER_COLOR);
            headerRowBuilder.textColor(Color.WHITE);
            //headerRowBuilder.font(PDType1Font.HELVETICA_BOLD).fontSize(FONT_SIZE);
            headerRowBuilder.font(fontBold).fontSize(FONT_SIZE);
            headerRowBuilder.horizontalAlignment(CENTER);   
            headerRowBuilder.verticalAlignment(MIDDLE);

            for(String s : table.getColumns()) {
                 tableBuilder.addColumnOfWidth(columnWidth);
                 headerRowBuilder.add(CellText.builder().text(s).borderWidth(BORDER_WIDTH).build());
            }

            org.vandeseer.easytable.structure.Row headerRow = headerRowBuilder.build();
            tableBuilder.addRow(headerRow);

            // ... and some data rows
            for (int i = rowNo; i < table.getRows().size(); i++) {
                final List<String> dataRow = table.getRows().get(i);

                RowBuilder rowBuilder = org.vandeseer.easytable.structure.Row.builder();
                rowBuilder.backgroundColor(ROW_COLOR);
                rowBuilder.horizontalAlignment(CENTER);
                rowBuilder.verticalAlignment(MIDDLE);
                rowBuilder.wordBreak(true);

                for(String s : dataRow) {   
                    String value = s;
                    CellTextBuilder<?, ?> ctb = CellText.builder();

                    ctb.borderWidth(BORDER_WIDTH);
                    ctb.text(value);
                    CellText c = ctb.build();
                    //CellText.builder().text(value).borderWidth(1).build()
                    rowBuilder.add(c);
                }

                org.vandeseer.easytable.structure.Row row = rowBuilder.build();
                tableBuilder.addRow(row);
            }      

            pdfTable = tableBuilder.build();
        }

        //for(int i = 0; i < tables.size(); i++) {
        for(Table pdfTable : tables) {

            //Table pdfTable = tables.get(i);
            TableDrawer drawer = TableDrawer.builder()
                .table(pdfTable)
                .startX(offsetX)
                .endY(offsetY)
                .build();

            do {    
                PDPage page = new PDPage(new PDRectangle(PDRectangle.A2.getHeight(), PDRectangle.A2.getWidth()));
                document.addPage(page);
                drawer.startY(page.getMediaBox().getHeight() - offsetY);
                try (PDPageContentStream contentStream = new PDPageContentStream(document, page)) {
                    drawer.contentStream(contentStream);
                    drawer.draw();
                    addPdfHeader(project, document, contentStream);
                }               
            } while (!drawer.isFinished());
    }

vandeseer commented 5 years ago

Thanks for sharing. As soon as I find some time I will have a look.

vandeseer commented 5 years ago

I finally found some time to carry out a few tests with the code you provided, @plpanda (to be precise, this code: https://gist.github.com/vandeseer/deafdad48688376ef71bf4aaec9cf23e ). I created one document with a huge table of several thousand rows.

I ran the test on my 7 year old laptop, i.e. I expect the code to be way quicker when run on more powerful metal:

| Number of Rows | Resulting Number of Pages | Avg. Time |
|---|---|---|
| 5 000 | 119 | 10s |
| 10 000 | 239 | 15s |
| 15 000 | 358 | 25s |
Furthermore I created a second test where only one table (with 20 rows each) was drawn per page, not one big table over several pages. This was even faster:

| Number of Pages | Avg. Time |
|---|---|
| 100 | 5s |
| 200 | 7s |
| 300 | 10s |
| 400 | 12s |

From what I see in your code, @plpanda, your measurement also includes JSON deserialization. This may eat up additional time, as do the requests themselves (which I guess you excluded when taking times).
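To see how much of the total time each step accounts for, the phases can be timed separately. The following is only a minimal sketch using the JDK: `PhaseTimer` and `timed` are hypothetical names, and the two lambdas are stand-ins for the real JSON parsing and table drawing.

```java
import java.util.function.Supplier;

// Hypothetical helper (not part of easytable or PDFBox) that times one phase
// at a time, so that parsing and drawing can be measured independently.
class PhaseTimer {

    // Runs the given phase, prints its elapsed wall-clock time and returns its result.
    static <T> T timed(String label, Supplier<T> phase) {
        long start = System.nanoTime();
        T result = phase.get();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(label + ": " + elapsedMs + " ms");
        return result;
    }

    public static void main(String[] args) {
        // Stand-in for deserializing the request data.
        String[] parsed = timed("parse input", () -> "a,b,c".split(","));
        // Stand-in for drawing one table per parsed item.
        Integer drawn = timed("draw tables", () -> parsed.length);
        System.out.println("items drawn: " + drawn);
    }
}
```

With the phases separated like this, a slow total can be attributed to parsing, drawing, or saving rather than to the library as a whole.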

vandeseer commented 5 years ago

@Nonononoki can you provide me with a minimal full working example? I mean a Java class that I can just run? Thanks!

Nonononoki commented 5 years ago

@vandeseer I mostly used the code from the examples, so you can just use those

vandeseer commented 5 years ago

I ran another test using this code: https://gist.github.com/vandeseer/b1455419486b16b757494ae8aa408502

This is basically the example from the README but with an increased number of rows. Still, the results are more or less what I would expect in terms of runtime (again on my old laptop):

| Number of Rows | Resulting Number of Pages | Time |
|---|---|---|
| 100 | 8 | 2.8s |
| 500 | 40 | 5.7s |
| 1000 | 79 | 9.5s |
| 5000 | 393 | 39s |
| 10000 | 785 | 100s |
| 12500 | 981 | 120s |

So what I can say from those tests: Yes, it takes a bit to produce several dozens or several hundreds of pages (as I would expect), but I cannot see a major performance issue.

planbuildrun commented 5 years ago

It seems to me that neither the number of lines nor the resulting number of pages is the problem, but the size of the cells. In our application we may have rather large cells with a few hundred or even more than a thousand characters, and this slows down the PDF generation extremely.

I created a small test class to prove my point:

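// Assumed imports: commons-lang3 for StringUtils and an easytable version that provides TextCell.
import java.io.IOException;

import org.apache.commons.lang3.StringUtils;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.font.PDType1Font;
import org.vandeseer.easytable.TableDrawer;
import org.vandeseer.easytable.structure.Row;
import org.vandeseer.easytable.structure.Table;
import org.vandeseer.easytable.structure.cell.TextCell;

public class EasytableLargeCell {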

    private static final PDRectangle PAGE = PDRectangle.A4;
    private final static int MARGIN = 50;
    private final static float START_Y = PAGE.getHeight() - MARGIN;
    private final static float WIDTH = PAGE.getWidth() - 2 * MARGIN;

    public static void main(String[] args) throws IOException {
        EasytableLargeCell elc = new EasytableLargeCell();

        for (int numLines = 100; numLines <= 1000; numLines += 100) {
            for (int numCharsPerCell = 0; numCharsPerCell <= 300; numCharsPerCell += 50) {
                elc.create(numLines, numCharsPerCell);
            }
        }

    }

    public void create(int numLines, int numCharsPerCell) throws IOException {
        PDDocument document = new PDDocument();
        Table.TableBuilder tableBuilder = Table.builder()
                .addColumnsOfWidth(WIDTH)
                .font(PDType1Font.HELVETICA)
                .fontSize(10);

        long start = System.currentTimeMillis();

        for (int i = 0; i < numLines; i++) {
            tableBuilder.addRow(
                    Row.builder().add(
                            TextCell.builder()
                                    .text(StringUtils.repeat('x', numCharsPerCell))
                                    .borderWidth(1)
                                    .build())
                            .build());
        }

        TableDrawer.builder()
                .table(tableBuilder.build())
                .startX(MARGIN)
                .startY(START_Y)
                .endY(MARGIN)
                .build()
                .draw(() -> document, () -> new PDPage(PAGE), MARGIN);

        int numberOfPages = document.getNumberOfPages();
        document.save(String.format("D:/temp/LargeCell_%d_%d.pdf", numLines, numCharsPerCell));
        document.close();

        long duration = System.currentTimeMillis() - start;
        System.out.println(String.format("Generating %d lines with %d characters each results in %d pages and took %d ms", numLines, numCharsPerCell, numberOfPages, duration));
    }
}

Results:

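| Lines | Characters per Cell | Resulting pages | Duration [ms] |
|---|---|---|---|
| 100 | 0 | 3 | 238 |
| 100 | 50 | 3 | 263 |
| 100 | 100 | 4 | 849 |
| 100 | 150 | 4 | 5103 |
| 100 | 200 | 7 | 10396 |
| 100 | 250 | 7 | 23159 |
| 100 | 300 | 9 | 37231 |
| 200 | 0 | 5 | 32 |
| 200 | 50 | 5 | 169 |
| 200 | 100 | 8 | 949 |
| 200 | 150 | 8 | 9023 |
| 200 | 200 | 13 | 21869 |
| 200 | 250 | 13 | 46771 |
| 200 | 300 | 17 | 76560 |
| 300 | 0 | 7 | 39 |
| 300 | 50 | 7 | 269 |
| 300 | 100 | 12 | 2292 |
| 300 | 150 | 12 | 14462 |
| 300 | 200 | 19 | 33459 |
| 300 | 250 | 19 | 64939 |
| 300 | 300 | 25 | 109864 |
| 400 | 0 | 9 | 53 |
| 400 | 50 | 9 | 285 |
| 400 | 100 | 16 | 1926 |
| 400 | 150 | 16 | 18636 |
| 400 | 200 | 25 | 44003 |
| 400 | 250 | 25 | 81468 |
| 400 | 300 | 34 | 139518 |
| 500 | 0 | 11 | 69 |
| 500 | 50 | 11 | 363 |
| 500 | 100 | 20 | 2411 |
| 500 | 150 | 20 | 21335 |
| 500 | 200 | 32 | 50220 |
| 500 | 250 | 32 | 107239 |
| 500 | 300 | 42 | 190655 |
| 600 | 0 | 13 | 100 |
| 600 | 50 | 13 | 439 |
| 600 | 100 | 24 | 2928 |
| 600 | 150 | 24 | 33114 |
| 600 | 200 | 38 | 74056 |
| 600 | 250 | 38 | 149945 |
| 600 | 300 | 50 | 220684 |
| 700 | 0 | 15 | 132 |
| 700 | 50 | 15 | 570 |
| 700 | 100 | 28 | 3710 |
| 700 | 150 | 28 | 30914 |
| 700 | 200 | 44 | 76075 |
| 700 | 250 | 44 | 146628 |
| 700 | 300 | 59 | 249704 |
| 800 | 0 | 17 | 116 |
| 800 | 50 | 17 | 618 |
| 800 | 100 | 32 | 3957 |
| 800 | 150 | 32 | 39456 |
| 800 | 200 | 50 | 90325 |
| 800 | 250 | 50 | 220060 |
| 800 | 300 | 67 | 294699 |
| 900 | 0 | 19 | 131 |
| 900 | 50 | 19 | 717 |
| 900 | 100 | 36 | 4164 |
| 900 | 150 | 36 | 40980 |
| 900 | 200 | 57 | 94815 |
| 900 | 250 | 57 | 205766 |
| 900 | 300 | 75 | 318456 |
| 1000 | 0 | 21 | 147 |
| 1000 | 50 | 21 | 755 |
| 1000 | 100 | 40 | 4828 |
| 1000 | 150 | 40 | 45814 |
| 1000 | 200 | 63 | 103168 |
| 1000 | 250 | 63 | 212258 |
| 1000 | 300 | 84 | 343600 |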

vandeseer commented 5 years ago

Thanks a lot for your analysis @planbuildrun! 👍 I will have a look again (as soon as I find time for it)! Maybe it's related to line breaking ...

planbuildrun commented 5 years ago

I haven't checked the code in detail yet, but from the test results I'd say it's not the line breaking per se. E.g. there's a huge gap between 100 and 150 chars (a factor of ~10), although both end up with one line break each, and therefore the number of pages is identical. Anyway, I'd be happy if you could have a closer look. :)
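The jump can be made explicit by dividing each duration by the previous one. This small sketch uses the 1000-line rows from the results table; the class and method names are only illustrative.

```java
// Growth factor of the measured duration between consecutive cell sizes,
// using the 1000-line rows of the results table in this thread.
class GrowthFactors {

    // Returns durationsMs[i + 1] / durationsMs[i] for each consecutive pair.
    static double[] ratios(long[] durationsMs) {
        double[] result = new double[durationsMs.length - 1];
        for (int i = 0; i < result.length; i++) {
            result[i] = (double) durationsMs[i + 1] / durationsMs[i];
        }
        return result;
    }

    public static void main(String[] args) {
        int[] chars = {50, 100, 150, 200, 250, 300};
        long[] durationsMs = {755, 4828, 45814, 103168, 212258, 343600};
        double[] factors = ratios(durationsMs);
        for (int i = 0; i < factors.length; i++) {
            System.out.printf("%d -> %d chars: x%.1f%n", chars[i], chars[i + 1], factors[i]);
        }
    }
}
```

The largest factor appears at the step from 100 to 150 characters, where the page count stays at 40.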

vandeseer commented 5 years ago

As I already said, I don't have much time right now, so it will take a bit longer until I can have a closer look and come up with a solution. Nevertheless, I did a quick and very rough look-over and realized that there are quite a few places in the code where things are calculated again and again instead of being cached. This is mostly due to the fact that this library was never intended or designed to create huge tables; performance was not a concern for me. Nevertheless, it would be nice to have a fast library as well 😄

So I think that creating some caches for expensive calculations that are currently done repeatedly should speed things up quite a bit. But this requires a bit more time, which I will have at the end of the month at the earliest ...
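As a generic illustration of that idea (not easytable's actual code), an expensive measurement can be memoized so that repeated requests for the same input are computed only once. `WidthCache` and the width function below are hypothetical stand-ins for a costly font-metrics calculation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.ToDoubleFunction;

// Hypothetical sketch of the caching pattern; this is NOT the library's implementation.
class WidthCache {
    private final Map<String, Double> cache = new HashMap<>();
    private final ToDoubleFunction<String> expensiveWidth;
    private int computations = 0;

    WidthCache(ToDoubleFunction<String> expensiveWidth) {
        this.expensiveWidth = expensiveWidth;
    }

    // Returns the width for the text, computing it only on the first request.
    double widthOf(String text) {
        return cache.computeIfAbsent(text, t -> {
            computations++;
            return expensiveWidth.applyAsDouble(t);
        });
    }

    // Number of times the expensive function was actually invoked.
    int computations() {
        return computations;
    }

    public static void main(String[] args) {
        // Stand-in for a costly width calculation: 5 units per character.
        WidthCache cache = new WidthCache(t -> t.length() * 5.0);
        for (int i = 0; i < 1000; i++) {
            cache.widthOf("xxxxxxxxxx");
        }
        System.out.println("lookups: 1000, computations: " + cache.computations());
    }
}
```

The trade-off is memory for time: the cache grows with the number of distinct inputs, so it pays off when the same values are requested many times.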

planbuildrun commented 5 years ago

No worries, I can wait.

I'm just happy to avoid using FOP... ;-)

vandeseer commented 5 years ago

After having had a closer look, I can confirm that the basic issue is that things are calculated too many times instead of being cached. I implemented a solution, but I still need to clean up the code and ensure that there's no regression (there shouldn't be, but you never know). Hopefully I can release a new version within the next week. Will keep you posted.

vandeseer commented 5 years ago

The issue should now be solved with release 0.5.2. Here are some numbers from the performance test:

| lines | chars | pages | time |
|---|---|---|---|
| 1000 | 0 | 21 | 21 ms |
| 1000 | 50 | 21 | 71 ms |
| 1000 | 100 | 40 | 527 ms |
| 1000 | 150 | 40 | 3290 ms |
| 1000 | 200 | 63 | 7555 ms |
| 1000 | 250 | 63 | 14940 ms |
| 1000 | 300 | 84 | 25032 ms |

Looks way better ... 👍 😃
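For a rough sense of the improvement, the 0.5.2 timings can be set against the 1000-line timings reported earlier in this thread. This is only a sketch: the earlier numbers were not necessarily measured on the same machine, so the factors are indicative at best. The class name is illustrative.

```java
// Rough speedup of release 0.5.2 relative to the earlier 1000-line measurements
// posted in this thread. The two series may come from different machines.
class ReleaseSpeedup {

    // Ratio of the old duration to the new duration.
    static double speedup(long beforeMs, long afterMs) {
        return (double) beforeMs / afterMs;
    }

    public static void main(String[] args) {
        int[] chars = {0, 50, 100, 150, 200, 250, 300};
        long[] beforeMs = {147, 755, 4828, 45814, 103168, 212258, 343600};
        long[] afterMs = {21, 71, 527, 3290, 7555, 14940, 25032};
        for (int i = 0; i < chars.length; i++) {
            System.out.printf("%3d chars: %6d ms -> %5d ms (x%.1f)%n",
                    chars[i], beforeMs[i], afterMs[i], speedup(beforeMs[i], afterMs[i]));
        }
    }
}
```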

planbuildrun commented 4 years ago

Confirmed. Works like a charm!