Improved PDF rendering of XHTML and also HTML pages...

GoogleCodeExporter commented 9 years ago

*** This issue was imported from http://java.net/jira/browse/XHTMLRENDERER-263

It was reported by fassev on 02.03.2009 14:19:32 +0100 and last updated in the 
previous bug tracker on 09.12.2009 16:01:02 +0100

Found in
Operating System: All
Platform: All

The priority for this issue at migration was Major.
The original issue had attachments to it; see comments below.

Original description: 
First I want to thank you for the nice and useful library! I needed it to render 
HTML pages as PDF in the same way as they are displayed in the browser. I was 
so happy to find a library that was capable of rendering CSS style sheets. 
Unfortunately, the current version has some limitations/errors (for instance 
with XHTML) and shows some small but significant differences from the browsers 
when used with large and complex HTML sites.

In the last two weeks I spent some time improving and correcting the library so 
that I am now able to create PDF reports for most of the HTML pages in my 
project. The results are nearly identical to the pages displayed in the browser 
(and some of them are fairly complex). Now I would like to give some feedback to 
the community and to thank everybody for the really huge work on this library. 
I am not sure whether the changes comply with the development policies and 
intentions, so feel free to incorporate them into the library or to reject 
them. I just found them useful when generating PDF reports and will be glad if 
I can help with at least some of them. Please note that some of the features 
target real PDF export, not only HTML rendering.

The first big change was to use JTidy as the DOM parser, skipping the DOM 
parsing done by the xhtmlrenderer library. Now I am able to parse pure HTML 
files and not only XHTML. JTidy also cleans up the DOM model, so I pass the DOM 
model directly from JTidy to the xhtmlrenderer renderer. Note: this is not 
included in the library itself; JTidy has to be downloaded separately.

What I have been able to improve/correct in the library itself:
- Correct tag and attribute handling by comparing all strings in a 
case-insensitive way (as CSS style sheets and the matcher should be 
case-insensitive)
- Enable links over images
- Automatic image scaling when printing big images that do not fit on the 
page
- Automatically extend the PDF Page width, when the content does not fit in the 
provided page size (don't clip the page content).
- Page margin boxes: text-align not handled correctly.
- Collapsed border calculation fix
- Text-align incorrectly inherited in nested TABLEs, DIVs, etc.
- Extended text breaking for PDF printing, used for instance to compact long 
tables with many columns.
- Text-decoration fix: decorate only text, not images or blocks
- Page breaks inside rows.
- Custom resource and link handlers, instead of subclassing the whole 
UserAgent.
- Some minor bug fixes and improvements, like input/output stream handling, 
logging, etc.
- I was not able to completely resolve a "height" problem, for instance when a 
table includes a div or nested table with a relative height (something like 
100%). In such a case the nested block is not stretched to 100% of the 
parent's height when the parent's height is increased. There are also still 
some minor pagination issues when printing.
- Bes
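
Two items in the list above, automatic image scaling and page widening, come down to simple arithmetic. A minimal sketch, assuming all lengths are in PDF points and that images are scaled uniformly (aspect ratio preserved); the class and method names are hypothetical and are not taken from the attached sources or the Flying Saucer API:

```java
/**
 * Hypothetical helpers (not Flying Saucer API) illustrating "scale big images to fit
 * the page" and "extend the page width instead of clipping". Lengths are assumed to be
 * in one unit, e.g. PDF points.
 */
final class PageFitSketch {

    /** Scales an image uniformly so it fits the content area; never enlarges it. */
    static double[] fitImage(double imageWidth, double imageHeight,
                             double areaWidth, double areaHeight) {
        // The smaller ratio is the one that makes both dimensions fit; cap at 1.0.
        double scale = Math.min(1.0, Math.min(areaWidth / imageWidth, areaHeight / imageHeight));
        return new double[] { imageWidth * scale, imageHeight * scale };
    }

    /** Returns a page width wide enough for the content plus left and right margins. */
    static double pageWidthFor(double contentWidth, double pageWidth,
                               double marginLeft, double marginRight) {
        double available = pageWidth - marginLeft - marginRight;
        if (contentWidth <= available) {
            return pageWidth; // content fits: keep the configured page size
        }
        return contentWidth + marginLeft + marginRight; // widen the page instead of clipping
    }

    public static void main(String[] args) {
        double[] size = fitImage(1000, 500, 500, 800);
        System.out.println(size[0] + " x " + size[1]);      // 500.0 x 250.0
        System.out.println(pageWidthFor(900, 595, 50, 50)); // 1000.0
    }
}
```

Scaling by the smaller of the two ratios keeps the aspect ratio, and capping the factor at 1.0 means small images are never enlarged.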

Unfortunately, the changes (especially the case-insensitive string handling) are 
spread over the whole library, so many of the sources have changed: 75 files in 
total. I will attach the whole Java project source, in which the changes (except 
the minor ones) are marked with the comment 'CHANGE PF'.
I used the main branch and checked out the latest CVS sources as of 
1 March 2009. You can use them as the base for a diff to find out what has 
changed.
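
Case-insensitive handling can be done either at every comparison site or once, by normalizing names when the DOM is built; the second approach could concentrate the change in fewer places. A minimal sketch of both, with hypothetical names that are not part of Flying Saucer. Note that case-insensitive element and attribute names are an HTML rule; in XHTML, which is XML, names are case-sensitive.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical helpers (not Flying Saucer API) showing two ways to make HTML
 * element and attribute names match regardless of case.
 */
final class CaseInsensitiveMatching {

    /** Option 1: compare at each match site, e.g. a CSS type selector vs. an element name. */
    static boolean matchesTypeSelector(String selectorName, String elementName) {
        return selectorName.equalsIgnoreCase(elementName);
    }

    /** Option 2: normalize once (e.g. while building the DOM), then compare with equals(). */
    static String normalizeName(String name) {
        // Locale.ROOT avoids locale-specific case rules such as the Turkish dotless i.
        return name.toLowerCase(Locale.ROOT);
    }

    /** Attribute map whose lookups ignore case: get("href") finds a key stored as "HREF". */
    static Map<String, String> attributes(String... namesAndValues) {
        Map<String, String> attrs = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        for (int i = 0; i + 1 < namesAndValues.length; i += 2) {
            attrs.put(namesAndValues[i], namesAndValues[i + 1]);
        }
        return attrs;
    }

    public static void main(String[] args) {
        System.out.println(matchesTypeSelector("TD", "td"));             // true
        System.out.println(normalizeName("TITLE"));                      // title
        System.out.println(attributes("HREF", "page.html").get("href")); // page.html
    }
}
```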

I will also attach some generated PDF files as examples.

Feel free to use or not use any of the changes.

Regards
Peter Fassev

Original issue reported on code.google.com by pdoubl...@gmail.com on 16 Feb 2011 at 9:52

GoogleCodeExporter commented 9 years ago

Attachment by fassev on 02.03.2009 14:36:58 +0100:  Example 1.pdf, size 43674 
bytes
Download: http://java.net/jira/secure/attachment/27376/Example 1.pdf

Original comment by pdoubl...@gmail.com on 16 Feb 2011 at 9:52

GoogleCodeExporter commented 9 years ago

Attachment by fassev on 02.03.2009 14:37:29 +0100:  Example 2 - wide list.pdf, 
size 52629 bytes
Download: http://java.net/jira/secure/attachment/27377/Example 2 - wide list.pdf

Original comment by pdoubl...@gmail.com on 16 Feb 2011 at 9:52

GoogleCodeExporter commented 9 years ago

Attachment by fassev on 02.03.2009 14:43:04 +0100:  Example 3 - HTML table with 
a Tree.pdf, size 117481 bytes
Download: http://java.net/jira/secure/attachment/27378/Example 3 - HTML table with a Tree.pdf

Original comment by pdoubl...@gmail.com on 16 Feb 2011 at 9:52

GoogleCodeExporter commented 9 years ago

Attachment by bago on 09.12.2009 16:01:02 +0100:  
note-xhtmlrenderer-bug263.diff, size 197162 bytes
Download: http://java.net/jira/secure/attachment/27379/note-xhtmlrenderer-bug263.diff

Original comment by pdoubl...@gmail.com on 16 Feb 2011 at 9:52

GoogleCodeExporter commented 9 years ago

Attachment by fassev on 02.03.2009 14:20:54 +0100:  xhtmlrenderer.zip, size 
694438 bytes
Download: http://java.net/jira/secure/attachment/27375/xhtmlrenderer.zip

Original comment by pdoubl...@gmail.com on 16 Feb 2011 at 9:52

GoogleCodeExporter commented 9 years ago

fassev wrote on 02.03.2009 14:20:54 +0100:
Created an attachment (id=73)
the changed xhtmlrenderer library

Original comment by pdoubl...@gmail.com on 16 Feb 2011 at 9:52

GoogleCodeExporter commented 9 years ago

fassev wrote on 02.03.2009 14:36:59 +0100:
Created an attachment (id=74)
Panel layout

Original comment by pdoubl...@gmail.com on 16 Feb 2011 at 9:52

GoogleCodeExporter commented 9 years ago

fassev wrote on 02.03.2009 14:37:29 +0100:
Created an attachment (id=75)
wide table

Original comment by pdoubl...@gmail.com on 16 Feb 2011 at 9:52

GoogleCodeExporter commented 9 years ago

fassev wrote on 02.03.2009 14:43:04 +0100:
Created an attachment (id=76)
HTML table with a tree

Original comment by pdoubl...@gmail.com on 16 Feb 2011 at 9:52

GoogleCodeExporter commented 9 years ago

pdoubleya wrote on 09.03.2009 09:23:47 +0100:
Thanks for all this work! Given the amount of changes you're addressing, this
won't make it into R8, though.

We will have to review the changes one-by-one. Some, such as defaulting to
JTidy, we probably won't accept. For pages rendered in XHTML, a standard DOM
parser (such as the one included in the JDK) works fine and saves the extra
"cleanup" work JTidy would attempt to do. However, having nicer integration
with a library like JTidy would be a useful goal.
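
To illustrate the point about XHTML: well-formed XHTML can be parsed with the XML parser built into the JDK, with no extra library. A minimal sketch (the class name is hypothetical); the input deliberately has no DOCTYPE, because a default parser would otherwise try to fetch the referenced DTD over the network. The resulting org.w3c.dom.Document is the kind of object the renderer accepts, e.g. through ITextRenderer.setDocument(Document, String).

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Parses well-formed XHTML into a DOM using only the JDK's built-in XML parser. */
final class XhtmlDomSketch {

    static Document parseXhtml(String xhtml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // keep elements in the XHTML namespace
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xhtml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Unlike JTidy, an XML parser does not repair markup: malformed input fails here.
            throw new IllegalArgumentException("Input is not well-formed XHTML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parseXhtml(
                "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body><p>Hello</p></body></html>");
        System.out.println(doc.getDocumentElement().getLocalName()); // html
    }
}
```

The trade-off is strictness: plain HTML that is not well-formed XML is rejected instead of cleaned up, which is the gap a JTidy integration would fill.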

That's just one issue; once R8 is out the door (end of March?) we can look at
your proposals one-by-one.

Thanks again!

Original comment by pdoubl...@gmail.com on 16 Feb 2011 at 9:52

GoogleCodeExporter commented 9 years ago

fassev wrote on 09.03.2009 10:56:51 +0100:
I understand that the changes are too much, but you might at least have a look 
at the fix for the collapsed border calculation problem. The error is very 
simple. The border calculation in TableCellBox.collapsedLeftBorder() (and 
likewise for Top, Right and Bottom) uses the following check many times:
...
            if (result.hidden()) {
                return result;
            }
...

I think it would be correct to change all these expressions (I counted 26 of 
them) to:
...
            if (result.exists()) {
                return result;
            }
...
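
For reference when reviewing this proposal: CSS 2.1 (section 17.6.2.1, border conflict resolution) says that a 'hidden' border wins over everything, 'none' has the lowest priority, otherwise the wider border wins, then the stronger style (double, solid, dashed, dotted, ridge, outset, groove, inset), and finally the source (cell over row, row group, column, column group, table). In the simplified sketch below, only a 'hidden' border ends the search early; any other border is still compared by width, style and source. The names are hypothetical and do not reflect Flying Saucer's actual TableCellBox or CollapsedBorderValue code; colors are omitted.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Simplified sketch of the CSS 2.1 collapsing-border conflict rules.
 * Hypothetical types, not Flying Saucer's API; border colors are ignored.
 */
final class CollapsedBorderSketch {

    /** Visible styles in ascending priority for equal widths; NONE and HIDDEN are special. */
    enum Style { NONE, INSET, GROOVE, OUTSET, RIDGE, DOTTED, DASHED, SOLID, DOUBLE, HIDDEN }

    static final class Border {
        final Style style;
        final int width;
        final String source; // e.g. "cell", "row", "table"

        Border(Style style, int width, String source) {
            this.style = style;
            this.width = width;
            this.source = source;
        }
    }

    /** Candidates must be ordered by source: cell, row, row group, column, column group, table. */
    static Border resolve(List<Border> candidates) {
        Border best = null;
        for (Border b : candidates) {
            if (b.style == Style.HIDDEN) {
                return b; // 'hidden' suppresses every other border at this edge
            }
            if (b.style == Style.NONE) {
                continue; // 'none' has the lowest priority
            }
            boolean wider = best == null || b.width > best.width;
            boolean strongerStyle = best != null && b.width == best.width
                    && b.style.ordinal() > best.style.ordinal();
            if (wider || strongerStyle) {
                best = b; // on a full tie the earlier (higher-precedence) source is kept
            }
        }
        return best != null ? best : new Border(Style.NONE, 0, "none");
    }

    public static void main(String[] args) {
        Border winner = resolve(Arrays.asList(
                new Border(Style.SOLID, 1, "cell"),
                new Border(Style.DOUBLE, 3, "table")));
        System.out.println(winner.source); // table: the wider table border beats the cell border
    }
}
```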

Regards
Peter

Original comment by pdoubl...@gmail.com on 16 Feb 2011 at 9:52

GoogleCodeExporter commented 9 years ago

bago wrote on 09.12.2009 16:01:02 +0100:
Created an attachment (id=100)
I created a diff against 2009.03.01

Original comment by pdoubl...@gmail.com on 16 Feb 2011 at 9:52

GoogleCodeExporter commented 9 years ago

Hello,
I'm using Flying Saucer to convert HTML pages to PDF. I'm getting a PDF file, 
but the problem is that table alignment is not working properly. You said you 
fixed the issue "Automatically extend the PDF Page width, when the content does 
not fit in the provided page size (don't clip the page content)."
Where can I get the changed library files?

Original comment by divyakri...@gmail.com on 27 Oct 2014 at 12:07

steven0lisa / flying-saucer

Improved PDF rendering of XHTML and also HTML pages... #69