petermr / ami3

Integration of cephis and normami code into a single base. Tests will be slimmed down
Apache License 2.0

migrate PDFRenderer and PageDrawer from `pdfbox` to `ami` #8

Open petermr opened 4 years ago

petermr commented 4 years ago

Migrate PDFRenderer and PageDrawer to ami

The examples in pdfbox are the basis for extracting graphic primitives from the PDFStream, and they demonstrate subclassing. This more-or-less worked with PDFBox 1, but the transition to PDFBox 2 has only partially worked. See issue #7.

That is almost certainly because I rushed the transition.

This is now hopefully a careful, documented re-engineering.

petermr commented 4 years ago

PDFBox routines.

This describes the out-of-the-box routines in PDFBox, which I have copied to ami3 prior to editing. The PDFBox examples were originally in:

/pdfbox-examples/src/main/java/org/apache/pdfbox/examples/rendering/

They have been copied to the directory:

/ami3/src/main/java/org/contentmine/pdf2svg/rendering/

The following are kept as reference. They can be used with ami PDF files to detect any regression.
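One way to detect a regression is to compare a fresh render against a stored reference image. The sketch below is only an assumption about how that check might look: `RenderRegression` and `countDifferingPixels` are hypothetical names, not part of ami3 or PDFBox, and it uses only the JDK's `BufferedImage`.

```java
import java.awt.image.BufferedImage;

/** Hypothetical helper (not in ami3 or PDFBox): compares two rendered pages pixel by pixel. */
final class RenderRegression {
    private RenderRegression() {}

    /**
     * Returns the number of pixels whose ARGB values differ,
     * or -1 if the two images do not have the same dimensions.
     */
    static long countDifferingPixels(BufferedImage reference, BufferedImage candidate) {
        if (reference.getWidth() != candidate.getWidth()
                || reference.getHeight() != candidate.getHeight()) {
            return -1;
        }
        long differing = 0;
        for (int y = 0; y < reference.getHeight(); y++) {
            for (int x = 0; x < reference.getWidth(); x++) {
                if (reference.getRGB(x, y) != candidate.getRGB(x, y)) {
                    differing++;
                }
            }
        }
        return differing;
    }
}
```

A result of 0 would mean the reference example and the edited code produce identical images for that page. An exact match may be too strict if anti-aliasing differs between runs, so a small tolerance might be needed in practice.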

CustomGraphicsStreamEngine.java

This prints the operations (call-backs) to stdout. It is not complete; we need to add more. Maybe we should write this output to a file before converting to SVG. It is probably mainly useful as a diagnostic, but might also be used to populate an intermediate stream.
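The "intermediate stream" idea could be sketched roughly as follows: instead of printing each call-back, record it, and convert the recorded list to SVG later. `PrimitiveLog` is a hypothetical class, not existing ami3 code; its method names only loosely mirror the path call-backs that the example prints, and no PDF-to-SVG coordinate transform (such as flipping the y axis) is applied.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

/** Hypothetical intermediate store for path call-backs; not part of ami3 or PDFBox. */
final class PrimitiveLog {
    private final List<String> operations = new ArrayList<>();

    /** Records the start of a new subpath as an SVG "M" command. */
    void moveTo(float x, float y) {
        operations.add(String.format(Locale.ROOT, "M %.2f %.2f", x, y));
    }

    /** Records a straight line segment as an SVG "L" command. */
    void lineTo(float x, float y) {
        operations.add(String.format(Locale.ROOT, "L %.2f %.2f", x, y));
    }

    /** Records closing of the current subpath as an SVG "Z" command. */
    void closePath() {
        operations.add("Z");
    }

    /** Read-only view of the recorded operations, e.g. for writing to a diagnostic file. */
    List<String> operations() {
        return Collections.unmodifiableList(operations);
    }

    /** Joins the recorded operations into a string usable as an SVG path "d" attribute. */
    String toSvgPathData() {
        return String.join(" ", operations);
    }
}
```

Keeping the recorded operations separate from the SVG conversion means the same list could be dumped to a file for diagnosis or replayed into a converter, without touching the engine's call-backs again.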

CustomPageDrawer.java => MyPageDrawerExample

This feeds the stream to MyPDFRenderer (subclass of PDFRenderer) which ultimately renders the stream as an image:

        // PDFBox 2.x API; pageNumber is a zero-based page index
        try (PDDocument doc = PDDocument.load(file)) {
            PDFRenderer renderer = new MyPDFRenderer(doc);
            BufferedImage image = renderer.renderImage(pageNumber);
        }

CustomPageDrawer is confusingly named: it does NOT subclass PageDrawer, but provides the factory for MyPageDrawer. The original will be kept as the reference example. I will COPY it and rename the copy to MyPageDrawerExample.

MyPageDrawer

This is a subclass of PageDrawer where we can trap all the graphics calls. It is currently a private static class in MyPageDrawerExample, but will be extracted into a separate public class.

MyPDFRenderer

This is a subclass of PDFRenderer. It is currently a private static class in MyPageDrawerExample and acts as a factory that creates MyPageDrawer. I recommend leaving it as is.
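The relationship between the renderer (as factory) and the drawer (as trap) can be illustrated with a minimal, JDK-only sketch. All four classes below are hypothetical stand-ins written for this illustration: `SketchRenderer` and `SketchDrawer` play the roles of PDFRenderer and PageDrawer, and are NOT the PDFBox classes or their signatures.

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-in for a library drawer class; NOT the PDFBox PageDrawer. */
class SketchDrawer {
    void strokePath() { /* a real drawer would paint the stroke here */ }
    void fillPath() { /* a real drawer would paint the fill here */ }
}

/** Stand-in for a library renderer that obtains its drawer from an overridable factory method. */
class SketchRenderer {
    protected SketchDrawer createDrawer() {
        return new SketchDrawer();
    }

    /** Drives the drawer roughly the way a renderer would while processing a page. */
    final SketchDrawer renderPage() {
        SketchDrawer drawer = createDrawer();
        drawer.strokePath();
        drawer.fillPath();
        return drawer;
    }
}

/** Plays the role of MyPageDrawer: traps each graphics call, then delegates to the superclass. */
class TrappingDrawer extends SketchDrawer {
    final List<String> trapped = new ArrayList<>();

    @Override
    void strokePath() {
        trapped.add("strokePath");
        super.strokePath();
    }

    @Override
    void fillPath() {
        trapped.add("fillPath");
        super.fillPath();
    }
}

/** Plays the role of MyPDFRenderer: its only job is to supply the trapping drawer. */
class TrappingRenderer extends SketchRenderer {
    @Override
    protected SketchDrawer createDrawer() {
        return new TrappingDrawer();
    }
}
```

Because the renderer subclass only overrides the factory method, it can stay small and nested inside the example, while the drawer subclass, where the real trapping work happens, is the part worth extracting into its own public class.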