Closed plpanda closed 5 years ago
Hi @plpanda,
First of all: performance was never a design goal of the library. It was not primarily intended to create such large documents in a very time-efficient way.
Having said that, it would of course still be nice to have an easy-to-use library that also draws quickly.
Can you share your code? And did you also try to create a huge document with 300 pages without any tables? Did it still take a long time? How are you calculating the data to be drawn?
Best, Stefan
For each table, I used
    RepeatedHeaderTableDrawer drawer = RepeatedHeaderTableDrawer.builder()
            .table(create())
            .startX(30)
            .startY(775F)
            .endY(25F) // note: if not set, table is drawn over the end of the page
            .build();
    do {
        PDPage page = new PDPage(new PDRectangle(1150, 800));
        document.addPage(page);
        try (PDPageContentStream contentStream = new PDPageContentStream(document, page)) {
            drawer.contentStream(contentStream).draw();
        }
        // continue 50 units below the top edge on each following page
        drawer.startY(page.getMediaBox().getHeight() - 50);
    } while (!drawer.isFinished());
and create() is called each time a table is drawn. The table is built dynamically like this:
    public static Table create() throws JSONException {
        final Table.TableBuilder tableBuilder = Table.builder().addColumnsOfWidth(200, 200, 690);
        tableBuilder.addRow(Row.builder()
                .add(CellText.builder().text("Some Heading")
                        .font(PDType1Font.HELVETICA_BOLD)
                        .textColor(Color.BLACK)
                        .borderWidth(1.5F)
                        .horizontalAlignment(HorizontalAlignment.LEFT)
                        .verticalAlignment(VerticalAlignment.MIDDLE)
                        .build())
                .add(CellText.builder().text("Something")
                        .textColor(Color.BLACK)
                        .borderWidth(1.5F)
                        .horizontalAlignment(HorizontalAlignment.LEFT)
                        .verticalAlignment(VerticalAlignment.MIDDLE)
                        .build())
                .add(CellText.builder().text("Some title")
                        .textColor(Color.BLACK)
                        .font(PDType1Font.HELVETICA_BOLD)
                        .borderWidth(1.5F)
                        .horizontalAlignment(HorizontalAlignment.LEFT)
                        .verticalAlignment(VerticalAlignment.MIDDLE)
                        .build())
                .build());
        tableBuilder.addRow(Row.builder()
                .add(CellText.builder().text(getData("RiskLevel"))
                        .textColor(Color.BLACK)
                        .font(PDType1Font.HELVETICA_BOLD)
                        .backgroundColor(getColor("Medium : 2"))
                        .borderWidth(1.5F)
                        .horizontalAlignment(HorizontalAlignment.LEFT)
                        .verticalAlignment(VerticalAlignment.MIDDLE)
                        .build())
                .add(CellText.builder().text("")
                        .textColor(Color.BLACK)
                        .borderWidth(1.5F)
                        .horizontalAlignment(HorizontalAlignment.LEFT)
                        .verticalAlignment(VerticalAlignment.MIDDLE)
                        .build())
                .add(CellText.builder().text(getData("Issue Name"))
                        .textColor(Color.BLACK)
                        .font(PDType1Font.HELVETICA_BOLD)
                        .backgroundColor(getColor("issueName"))
                        .borderWidth(1.5F)
                        .horizontalAlignment(HorizontalAlignment.LEFT)
                        .verticalAlignment(VerticalAlignment.MIDDLE)
                        .build())
                .build());
.
.
...continued;
and this is the code I used to measure the time:
    long start = System.currentTimeMillis();
    try {
        String data = req.getParameter("data");
        document = new PDDocument();
        try {
            JSONArray arr = new JSONArray(data);
            for (int i = 0; i < arr.length(); i++) {
                IssueRisk.createTable(document, arr.getJSONObject(i));
            }
        } catch (Exception e) {
            System.out.println("Unable to transform data");
            e.printStackTrace();
        }
        document.save("pdfDownload.pdf");
        System.out.println("PDF Prepared for Download ");
        long end = System.currentTimeMillis();
        System.out.println("====================>" + (end - start) + " ms");
Is there any way to optimize it?
I'm having a similar issue when creating a few pages with a large dataset in DIN A1 size.
Can you share your code @Nonononoki? It would be nice to have a minimal working example that illustrates the problem.
If I find a bit of time in the next days (rather weeks), I will try to have a look.
Cheers, Stefan
I mostly use the code from the examples. This code uses A2 instead of A1 but it's just as slow.
    //Some colors
    final Color HEADER_COLOR = new Color(76, 129, 190);
    //final Color ROW_COLOR = new Color(250, 250, 250);
    final Color ROW_COLOR = new Color(255, 255, 255);
    final Color BORDER_COLOR = Color.GRAY;
    //final int COL_WIDTH = 100;
    final int BORDER_WIDTH = 1;
    final int FONT_SIZE = 9;
    //fill width
    if(table.getColumns().size() < max_rows) {
        max_rows = table.getColumns().size();
    }
    float pageWidth = PDRectangle.A2.getHeight();
    int columnWidth = Math.round((pageWidth - (offsetX*2)) / table.getColumns().size());
    // Define the table structure first
    TableBuilder tableBuilder = Table.builder();
    tableBuilder.fontSize(FONT_SIZE);
    //tableBuilder.font(PDType1Font.HELVETICA);
    tableBuilder.font(fontRegular);
    tableBuilder.borderColor(BORDER_COLOR);
    if(table.getColumns().size() == 0) {
        throw new CustomException("No columns found");
    }
    int rowNo = 0;
    // Add the header row ...
    RowBuilder headerRowBuilder = org.vandeseer.easytable.structure.Row.builder();
    headerRowBuilder.backgroundColor(HEADER_COLOR);
    headerRowBuilder.textColor(Color.WHITE);
    //headerRowBuilder.font(PDType1Font.HELVETICA_BOLD).fontSize(FONT_SIZE);
    headerRowBuilder.font(fontBold).fontSize(FONT_SIZE);
    headerRowBuilder.horizontalAlignment(CENTER);
    headerRowBuilder.verticalAlignment(MIDDLE);
    for(String s : table.getColumns()) {
        tableBuilder.addColumnOfWidth(columnWidth);
        headerRowBuilder.add(CellText.builder().text(s).borderWidth(BORDER_WIDTH).build());
    }
    org.vandeseer.easytable.structure.Row headerRow = headerRowBuilder.build();
    tableBuilder.addRow(headerRow);
    // ... and some data rows
    for (int i = rowNo; i < table.getRows().size(); i++) {
        final List<String> dataRow = table.getRows().get(i);
        RowBuilder rowBuilder = org.vandeseer.easytable.structure.Row.builder();
        rowBuilder.backgroundColor(ROW_COLOR);
        rowBuilder.horizontalAlignment(CENTER);
        rowBuilder.verticalAlignment(MIDDLE);
        rowBuilder.wordBreak(true);
        for(String s : dataRow) {
            String value = s;
            CellTextBuilder<?, ?> ctb = CellText.builder();
            ctb.borderWidth(BORDER_WIDTH);
            ctb.text(value);
            CellText c = ctb.build();
            //CellText.builder().text(value).borderWidth(1).build()
            rowBuilder.add(c);
        }
        org.vandeseer.easytable.structure.Row row = rowBuilder.build();
        tableBuilder.addRow(row);
    }
    pdfTable = tableBuilder.build();
    }
    //for(int i = 0; i < tables.size(); i++) {
    for(Table pdfTable : tables) {
        //Table pdfTable = tables.get(i);
        TableDrawer drawer = TableDrawer.builder()
                .table(pdfTable)
                .startX(offsetX)
                .endY(offsetY)
                .build();
        do {
            PDPage page = new PDPage(new PDRectangle(PDRectangle.A2.getHeight(), PDRectangle.A2.getWidth()));
            document.addPage(page);
            drawer.startY(page.getMediaBox().getHeight() - offsetY);
            try (PDPageContentStream contentStream = new PDPageContentStream(document, page)) {
                drawer.contentStream(contentStream);
                drawer.draw();
                addPdfHeader(project, document, contentStream);
            }
        } while (!drawer.isFinished());
    }
Thanks for sharing. As soon as I find some time I will have a look.
I finally found some time to carry out a few tests with the code you provided, @plpanda. To be precise, this was the code: https://gist.github.com/vandeseer/deafdad48688376ef71bf4aaec9cf23e
So I created one document with a huge table of several thousand rows.
I ran the test on my 7-year-old laptop, i.e. I expect the code to be way quicker when run on more powerful hardware:
Number of Rows | Resulting Number of Pages | Avg. Time |
---|---|---|
5 000 | 119 | 10s |
10 000 | 239 | 15s |
15 000 | 358 | 25s |
Furthermore I created a second test where only one table (with 20 rows each) was drawn per page, not one big table over several pages. This was even faster:
Number of Pages | Avg. Time |
---|---|
100 | 5s |
200 | 7s |
300 | 10s |
400 | 12s |
From what I see in your code @plpanda, you are also including the JSON deserialization in your measurement. This may eat up additional time, as do the requests themselves (which I guess you excluded when taking times).
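To see how much of the measured time is actually spent on drawing, the phases could be timed separately. The following is only a minimal sketch: `PhaseTimer` and the phase names are made up for illustration and are not part of easytable or PDFBox, and the lambdas are stand-ins for the real parsing and drawing code.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical helper (not part of easytable): records how long each named phase takes. */
class PhaseTimer {
    private final Map<String, Long> nanosByPhase = new LinkedHashMap<>();

    /** Runs the work, adds its wall-clock time to the phase's total and returns its result. */
    <T> T time(String phase, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            nanosByPhase.merge(phase, System.nanoTime() - start, Long::sum);
        }
    }

    /** Total time recorded for the phase in milliseconds, 0 if the phase never ran. */
    long millis(String phase) {
        return nanosByPhase.getOrDefault(phase, 0L) / 1_000_000;
    }

    Map<String, Long> nanosByPhase() {
        return nanosByPhase;
    }

    public static void main(String[] args) {
        PhaseTimer timer = new PhaseTimer();
        // Stand-ins for the real steps: deserializing the request data and drawing the tables.
        String parsed = timer.time("parse", () -> "data".repeat(1000));
        Integer drawn = timer.time("draw", () -> parsed.length());
        System.out.println("parse: " + timer.millis("parse") + " ms");
        System.out.println("draw:  " + timer.millis("draw") + " ms (result " + drawn + ")");
    }
}
```

In the servlet code above, the `new JSONArray(data)` call and the `createTable` loop would go into separate phases. Since those calls throw checked exceptions, the lambdas would need a try/catch or a custom functional interface instead of `Supplier`.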
@Nonononoki can you provide me with a minimal full working example? I mean a Java class that I can just run? Thanks!
@vandeseer I mostly used the code from the examples, so you can just use those
I ran another test using this code: https://gist.github.com/vandeseer/b1455419486b16b757494ae8aa408502
This is basically the example from the README but with an increased number of rows. Still, the results are more or less what I would expect in terms of runtime (again on my old laptop):
Number of Rows | Resulting Number of Pages | Time |
---|---|---|
100 | 8 | 2.8s |
500 | 40 | 5.7s |
1000 | 79 | 9.5s |
5000 | 393 | 39s |
10000 | 785 | 100s |
12500 | 981 | 120s |
So what I can say from those tests: Yes, it takes a bit to produce several dozens or several hundreds of pages (as I would expect), but I cannot see a major performance issue.
It seems to me that neither the number of lines, nor the resulting number of pages is a problem, but the size of the cells. In our application we may have rather large cells with a few hundred or even more than a thousand characters, and this extremely slows down the PDF generation.
I created a small test class to prove my point:
    public class EasytableLargeCell {

        private static final PDRectangle PAGE = PDRectangle.A4;
        private final static int MARGIN = 50;
        private final static float START_Y = PAGE.getHeight() - MARGIN;
        private final static float WIDTH = PAGE.getWidth() - 2 * MARGIN;

        public static void main(String[] args) throws IOException {
            EasytableLargeCell elc = new EasytableLargeCell();
            for (int numLines = 100; numLines <= 1000; numLines += 100) {
                for (int numCharsPerCell = 0; numCharsPerCell <= 300; numCharsPerCell += 50) {
                    elc.create(numLines, numCharsPerCell);
                }
            }
        }

        public void create(int numLines, int numCharsPerCell) throws IOException {
            PDDocument document = new PDDocument();
            Table.TableBuilder tableBuilder = Table.builder()
                    .addColumnsOfWidth(WIDTH)
                    .font(PDType1Font.HELVETICA)
                    .fontSize(10);
            long start = System.currentTimeMillis();
            for (int i = 0; i < numLines; i++) {
                tableBuilder.addRow(
                        Row.builder().add(
                                TextCell.builder()
                                        .text(StringUtils.repeat('x', numCharsPerCell))
                                        .borderWidth(1)
                                        .build())
                                .build());
            }
            TableDrawer.builder()
                    .table(tableBuilder.build())
                    .startX(MARGIN)
                    .startY(START_Y)
                    .endY(MARGIN)
                    .build()
                    .draw(() -> document, () -> new PDPage(PAGE), MARGIN);
            int numberOfPages = document.getNumberOfPages();
            document.save(String.format("D:/temp/LargeCell_%d_%d.pdf", numLines, numCharsPerCell));
            document.close();
            long duration = System.currentTimeMillis() - start;
            System.out.println(String.format("Generating %d lines with %d characters each results in %d pages and took %d ms", numLines, numCharsPerCell, numberOfPages, duration));
        }
    }
Results:
Lines | Characters per Cell | Resulting pages | Duration [ms] |
---|---|---|---|
100 | 0 | 3 | 238 |
100 | 50 | 3 | 263 |
100 | 100 | 4 | 849 |
100 | 150 | 4 | 5103 |
100 | 200 | 7 | 10396 |
100 | 250 | 7 | 23159 |
100 | 300 | 9 | 37231 |
200 | 0 | 5 | 32 |
200 | 50 | 5 | 169 |
200 | 100 | 8 | 949 |
200 | 150 | 8 | 9023 |
200 | 200 | 13 | 21869 |
200 | 250 | 13 | 46771 |
200 | 300 | 17 | 76560 |
300 | 0 | 7 | 39 |
300 | 50 | 7 | 269 |
300 | 100 | 12 | 2292 |
300 | 150 | 12 | 14462 |
300 | 200 | 19 | 33459 |
300 | 250 | 19 | 64939 |
300 | 300 | 25 | 109864 |
400 | 0 | 9 | 53 |
400 | 50 | 9 | 285 |
400 | 100 | 16 | 1926 |
400 | 150 | 16 | 18636 |
400 | 200 | 25 | 44003 |
400 | 250 | 25 | 81468 |
400 | 300 | 34 | 139518 |
500 | 0 | 11 | 69 |
500 | 50 | 11 | 363 |
500 | 100 | 20 | 2411 |
500 | 150 | 20 | 21335 |
500 | 200 | 32 | 50220 |
500 | 250 | 32 | 107239 |
500 | 300 | 42 | 190655 |
600 | 0 | 13 | 100 |
600 | 50 | 13 | 439 |
600 | 100 | 24 | 2928 |
600 | 150 | 24 | 33114 |
600 | 200 | 38 | 74056 |
600 | 250 | 38 | 149945 |
600 | 300 | 50 | 220684 |
700 | 0 | 15 | 132 |
700 | 50 | 15 | 570 |
700 | 100 | 28 | 3710 |
700 | 150 | 28 | 30914 |
700 | 200 | 44 | 76075 |
700 | 250 | 44 | 146628 |
700 | 300 | 59 | 249704 |
800 | 0 | 17 | 116 |
800 | 50 | 17 | 618 |
800 | 100 | 32 | 3957 |
800 | 150 | 32 | 39456 |
800 | 200 | 50 | 90325 |
800 | 250 | 50 | 220060 |
800 | 300 | 67 | 294699 |
900 | 0 | 19 | 131 |
900 | 50 | 19 | 717 |
900 | 100 | 36 | 4164 |
900 | 150 | 36 | 40980 |
900 | 200 | 57 | 94815 |
900 | 250 | 57 | 205766 |
900 | 300 | 75 | 318456 |
1000 | 0 | 21 | 147 |
1000 | 50 | 21 | 755 |
1000 | 100 | 40 | 4828 |
1000 | 150 | 40 | 45814 |
1000 | 200 | 63 | 103168 |
1000 | 250 | 63 | 212258 |
1000 | 300 | 84 | 343600 |
Thanks a lot for your analysis @planbuildrun! 👍 I will have a look again (as soon as I find time for it)! Maybe it's related to line breaking ...
I haven't checked the code in detail yet, but from the test results, I'd say it's not the line breaking per se. E.g. there's a huge gap between 100 and 150 chars (~ factor 10), although both end up with one line break each, and therefore the number of pages is identical. Anyway, I'd be happy if you can have a closer look. :)
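For reference, the size of that gap can be computed directly from the results table. This small sketch only copies the durations for 100 and 150 characters per cell (for 100 to 1000 lines) and divides them; the class and method names are made up for illustration.

```java
/** Illustrative only: ratio of the measured durations for 150 vs. 100 characters per cell. */
class JumpRatio {
    // Durations in ms copied from the results table, for 100, 200, ..., 1000 lines.
    static final long[] MS_100_CHARS = {849, 949, 2292, 1926, 2411, 2928, 3710, 3957, 4164, 4828};
    static final long[] MS_150_CHARS = {5103, 9023, 14462, 18636, 21335, 33114, 30914, 39456, 40980, 45814};

    /** Returns, per line count, how many times slower 150 chars were than 100 chars. */
    static double[] ratios() {
        double[] result = new double[MS_100_CHARS.length];
        for (int i = 0; i < result.length; i++) {
            result[i] = (double) MS_150_CHARS[i] / MS_100_CHARS[i];
        }
        return result;
    }

    public static void main(String[] args) {
        double[] ratios = ratios();
        for (int i = 0; i < ratios.length; i++) {
            System.out.printf("%4d lines: x%.1f%n", (i + 1) * 100, ratios[i]);
        }
    }
}
```

The ratios come out between roughly 6 and 11, while the page count stays the same within each pair, so the extra time does not seem to be explained by additional pages or line breaks alone.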
As I already said, I don't have much time right now, so it will take a bit longer until I can have a closer look and come up with a solution. Nevertheless, I had a quick and very rough look over the code and realized that there are quite a few places where values are calculated again and again instead of being cached. This is mostly due to the fact that this library was never intended nor designed to create huge tables. Performance was not a concern for me. Nevertheless it would be nice to have a fast library as well :smile:
So I think that creating some caches for expensive calculations that are currently done repeatedly should speed things up quite a bit. But this requires a bit more time, which I will have at the end of the month at the earliest ...
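To illustrate the general idea of such a cache (this is not the library's actual code, just a minimal sketch): an expensive, deterministic calculation is wrapped so that each distinct input is only computed once. `WidthCache` and the fake width function are assumptions made up for this example; a real cache for text widths would also have to include the font and font size in its key.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.ToDoubleFunction;

/** Hypothetical cache in front of an expensive, deterministic text measurement. */
class WidthCache {
    private final ToDoubleFunction<String> measure;
    private final Map<String, Double> cache = new HashMap<>();
    private int misses = 0;

    WidthCache(ToDoubleFunction<String> measure) {
        this.measure = measure;
    }

    /** Returns the cached width, measuring the text only the first time it is seen. */
    double widthOf(String text) {
        Double cached = cache.get(text);
        if (cached == null) {
            misses++;
            cached = measure.applyAsDouble(text);
            cache.put(text, cached);
        }
        return cached;
    }

    /** Number of times the underlying measurement was actually executed. */
    int misses() {
        return misses;
    }

    public static void main(String[] args) {
        // Stand-in for a real font metric lookup: 5 units per character.
        WidthCache widths = new WidthCache(text -> text.length() * 5.0);
        for (int i = 0; i < 1000; i++) {
            widths.widthOf("xxxxxxxxxx");
        }
        System.out.println("measurements performed: " + widths.misses());
    }
}
```

In this toy run the same text is requested 1000 times but measured only once. Whether a cache at this level helps in practice depends on where the repeated calculations really happen.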
No worries, I can wait.
I'm just happy to avoid using FOP... ;-)
After having had a closer look I can confirm that the basic issue is that values are calculated too many times instead of being cached. I implemented a solution, but I will still need to clean up the code and ensure that there's no regression (shouldn't be, but you never know). Hopefully I can release a new version within the next week. Will keep you posted.
The issue should now be solved with release 0.5.2. Those are some numbers from the performance test:
lines | chars | pages | time |
---|---|---|---|
1000 | 0 | 21 | 21 ms |
1000 | 50 | 21 | 71 ms |
1000 | 100 | 40 | 527 ms |
1000 | 150 | 40 | 3290 ms |
1000 | 200 | 63 | 7555 ms |
1000 | 250 | 63 | 14940 ms |
1000 | 300 | 84 | 25032 ms |
Looks way better ... :+1: :smiley:
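As a rough comparison, the 1000-line rows of the earlier results table can be divided by the numbers above. This is only a sketch with made-up class and method names, and the two sets of measurements may well come from different machines, so the ratios are indicative at best.

```java
/** Illustrative only: ratio of the earlier 1000-line timings to the timings with 0.5.2. */
class SpeedupTable {
    static final int[] CHARS = {0, 50, 100, 150, 200, 250, 300};
    // Durations in ms for 1000 lines, copied from the two tables in this thread.
    static final long[] BEFORE_MS = {147, 755, 4828, 45814, 103168, 212258, 343600};
    static final long[] AFTER_MS = {21, 71, 527, 3290, 7555, 14940, 25032};

    /** How many times faster the "after" measurement is than the "before" measurement. */
    static double speedup(long beforeMs, long afterMs) {
        return (double) beforeMs / afterMs;
    }

    public static void main(String[] args) {
        for (int i = 0; i < CHARS.length; i++) {
            System.out.printf("%3d chars: x%.1f%n", CHARS[i], speedup(BEFORE_MS[i], AFTER_MS[i]));
        }
    }
}
```

With these inputs the ratios range from 7 for empty cells to about 14 for the larger cells.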
Confirmed. Works like a charm!
The PDF drawing takes too long. For example, generating a 300-page PDF from dynamic data with 300 tables, i.e. one table on each page, took me 3 minutes. Is there any way to get faster performance?