ralfstuckert / pdfbox-layout

MIT License

Provide an option to avoid the removal of leading white spaces when drawing/layouting an element #11

Closed fref closed 7 years ago

fref commented 8 years ago

When outputting code (Java, or formatted markup like YAML), it would be nice to have a simple way to keep the leading whitespace.

I tried something similar to your algorithm

        // Turns each run of leading blanks into a pdfbox-layout indent
        // (a tab counts as 4 columns, a blank as 1).
        private TextFlow indentLeadingSpaces(String text) throws IOException {
            TextFlow flow = new TextFlow();

            int textStart = 0;
            Matcher matcher = PATTERN_LEADING_SPACES.matcher(text);
            while (matcher.find()) {
                // Add the text between the previous indent and this one.
                flow.addText(text.substring(textStart, matcher.start()), FONT_SIZE_TABLE, PDType1Font.COURIER);
                textStart = matcher.end();

                int indent = 0;
                for (char character : matcher.group().toCharArray()) {
                    indent += ('\t' == character) ? 4 : 1;
                }
                flow.addMarkup(String.format("--{%sem}", indent), FONT_SIZE_TABLE, BaseFont.Courier);
            }
            // Add whatever follows the last indent.
            flow.addText(text.substring(textStart), FONT_SIZE_TABLE, getFixedWidth());
            return flow;
        }

This works, in a way, but it is quite cumbersome, and it produces something slightly weird for my samples: some lines are "spaced" (given extra vertical space) when they shouldn't be.

It looks as if newlines are inserted, but there are no newlines in the text. I might be wrong, but this appears to happen because TextSequenceUtil.wordWrap creates new Indent instances (new Indent(indentation).toStyledText()) that use the default font and size. When remote debugging, the indentation uses FontDescriptor [font=PDType1Font Helvetica, size=11.0] instead of the text's FontDescriptor [font=PDType1Font Courier, size=7.0].
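To illustrate why a mis-styled indent could look like an inserted newline, here is a simplified model, not pdfbox-layout's actual code: if a line is as tall as its tallest fragment, one indent fragment in the default Helvetica 11 pt makes a Courier 7 pt line noticeably taller. The `Fragment` record, the `lineHeight` helper and the 1.2 leading factor are hypothetical simplifications; a real layout engine derives heights from font metrics.

```java
import java.util.List;

public class LineHeightSketch {

    // Hypothetical, simplified fragment: only the font size matters here.
    record Fragment(String text, String fontName, float fontSize) {}

    // Assumed leading factor; real layout code uses font metrics instead.
    static final float LEADING = 1.2f;

    // A line is assumed to be as tall as its tallest fragment.
    static float lineHeight(List<Fragment> line) {
        float max = 0f;
        for (Fragment fragment : line) {
            max = Math.max(max, fragment.fontSize() * LEADING);
        }
        return max;
    }

    public static void main(String[] args) {
        Fragment code = new Fragment("operator: EQUALS", "Courier", 7f);
        // Indent styled with the defaults seen in the debugger (Helvetica 11).
        Fragment defaultIndent = new Fragment("      ", "Helvetica", 11f);
        // Indent styled like the surrounding text (Courier 7).
        Fragment matchingIndent = new Fragment("      ", "Courier", 7f);

        System.out.println(lineHeight(List.of(code)));                 // plain code line
        System.out.println(lineHeight(List.of(defaultIndent, code)));  // taller line
        System.out.println(lineHeight(List.of(matchingIndent, code))); // same as plain line
    }
}
```

Under this model, giving the indent the same font descriptor as the text it precedes removes the extra height.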

ralfstuckert commented 8 years ago

Would you please provide your example "code" to format (the original and the misformatted output), so I may debug the problem?

fref commented 8 years ago

I'll send you that next week.

fref commented 8 years ago

Time flies; I've been overwhelmed at work. I'll make some time tomorrow.

fref commented 8 years ago

Here's some sample code that reproduces the various "arrangements" (different indentation attempts) shown in the attached PDF, TestLayoutIssue.pdf.

As you can see, "taller" lines seem to be inserted at random.

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentInformation;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.PDPageContentStream.AppendMode;
import org.apache.pdfbox.pdmodel.PDPageTree;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import rst.pdfbox.layout.text.Alignment;
import rst.pdfbox.layout.text.Position;
import rst.pdfbox.layout.text.TextFlow;
import rst.pdfbox.layout.text.TextFlowUtil;

import java.io.File;
import java.io.IOException;
import java.util.GregorianCalendar;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import static org.apache.pdfbox.pdmodel.font.PDType1Font.COURIER;

@SuppressWarnings({"MagicNumber", "Duplicates"})
public class AAAALayoutTest extends Sample {

    private static final int FONT_SIZE = 7;

    private static final PDRectangle LANDSCAPE_A4 = new PDRectangle(PDRectangle.A4.getHeight(), PDRectangle.A4.getWidth());

    private static final Pattern PATTERN_LEADING_SPACES = Pattern.compile("^([ \t]+)", Pattern.MULTILINE);

    private static final String PREFORMATTED = "---\n" +
                                               "comment: Bla Bla bla\n" +
                                               "flushType: RESCAN\n" +
                                               "mandators:\n" +
                                               "\"BLA,BLA,BLA,BLA,BLA,BLA,BLA,BLA,BLA,BLA,BLA,BLA,BLA,BLA,BLA,BLA,BLA,BLA,BLA,BLA,BLA,BLA," +
                                               "B\n" +
                                               "LA,BLA,BLA,BLA,BLA,BLA,BLA,BLA,BLA,BLA,BLA,BLA,BLA,BLA,BLA,\"\n" +
                                               "messageFilter: |\n" +
                                               "  ---\n" +
                                               "  -\n" +
                                               "    - !com.sample.test.commons.utilities.PropertyFilter\n" +
                                               "      operator: EQUALS\n" +
                                               "      propertyName: brokerMessageId\n" +
                                               "      sourceClass: StuffMessage\n" +
                                               "      value: !string \"00000000000000001-08-2015--13:58:55--2\"\n" +
                                               "reasonCode: 29\n" +
                                               "requesterId: sample2\n" +
                                               "riskType: BLA.BLA.BLA\n" +
                                               "states: \"1\"\n" +
                                               "userProfile: &3\n" +
                                               "  description: Rights granted to edit everything\n" +
                                               "  lastEditTime: \"2015-02-27T12:24\"\n" +
                                               "  lastEditUser: sample3\n" +
                                               "  name: \"superuser                       \"\n" +
                                               "  rights: !org.hibernate.collection.PersistentSet\n" +
                                               "    - !com.sample.test.commons.smc.model.SMCProfileRights\n" +
                                               "      lastEditTime: \"2015-01-29T13:02\"\n" +
                                               "      lastEditUser: sample3\n" +
                                               "      mandator: &8 ALL\n" +
                                               "      profile: *3\n" +
                                               "      role: &9\n" +
                                               "        description: \"SUPER\"\n" +
                                               "        lastEditTime: \"2015-01-29T13:02\"\n" +
                                               "        lastEditUser: sample3\n" +
                                               "        version: &11 0\n" +
                                               "      version: 36\n";

    private PDPageContentStream currentStream;

    private PDDocument pdfDocument;

    public PDPageContentStream getCurrentStream() {
        return this.currentStream;
    }

    public PDDocument getPdfDocument() {
        return this.pdfDocument;
    }

    @Override
    public void run() {
        this.pdfDocument = new PDDocument();
        PDDocumentInformation info = new PDDocumentInformation();
        info.setAuthor("Frédéric Donckels");
        info.setTitle("Test Layout Issue");
        info.setKeywords("Pdfbox Layout, Issue");
        info.setCreationDate(new GregorianCalendar());
        this.pdfDocument.setDocumentInformation(info);
        try {
            TextFlow indentedFlow;
            String indentedText;

            startNewPage();
            indentedText = indentLeadingSpace1(PREFORMATTED);
            indentedFlow = TextFlowUtil.createTextFlowFromMarkup(indentedText, FONT_SIZE, COURIER, COURIER, COURIER, COURIER );
            indentedFlow.drawText(getCurrentStream(), new Position(100, 500), Alignment.Left, null);

            startNewPage();
            indentedText = indentLeadingSpace2(PREFORMATTED);
            indentedFlow = TextFlowUtil.createTextFlowFromMarkup(indentedText, FONT_SIZE, COURIER, COURIER, COURIER, COURIER );
            indentedFlow.drawText(getCurrentStream(), new Position(100, 500), Alignment.Left, null);

            startNewPage();
            indentedFlow = indentLeadingSpace3(PREFORMATTED);
            indentedFlow.drawText(getCurrentStream(), new Position(100, 500), Alignment.Left, null);

            flushPage();
            File file = new File("TestLayoutIssue.pdf");
            this.pdfDocument.save(file);
            this.pdfDocument.close();
        } catch (IOException e) {
            e.printStackTrace();
        }

    }

    protected void startNewPage()
            throws IOException {
        flushPage();
        PDPage page = new PDPage(LANDSCAPE_A4);
        this.pdfDocument.addPage(page);
        this.currentStream = new PDPageContentStream(this.pdfDocument, page, AppendMode.APPEND, true);
    }

    private void flushPage()
            throws IOException {
        if (null != getCurrentStream()) {
            getCurrentStream().close();
        }
    }

    private PDPage getCurrentPage() {
        PDPageTree pages = this.pdfDocument.getPages();
        return pages.get(pages.getCount() - 1);
    }

    private String indentLeadingSpace1(String text) {
        Matcher matcher = PATTERN_LEADING_SPACES.matcher(text);
        StringBuffer buffer = new StringBuffer(text.length());
        while (matcher.find()) {
            String group = matcher.group();
            int indent = 0;
            for (char character : group.toCharArray()) {
                if ('\t' == character) {
                    indent += 4;
                } else {
                    indent += 1;
                }
            }
            matcher.appendReplacement(buffer, String.format("--{%sem}", indent));
        }
        matcher.appendTail(buffer);
        return buffer.toString();
    }

    private String indentLeadingSpace2(String text)
            throws IOException {
        Matcher matcher = PATTERN_LEADING_SPACES.matcher(text);
        StringBuffer buffer = new StringBuffer(text.length());
        while (matcher.find()) {
            String group = matcher.group();
            int indent = 0;
            for (char character : group.toCharArray()) {
                if ('\t' == character) {
                    indent += 4;
                } else {
                    indent += 1;
                }
            }
            StringBuilder indentReplace = new StringBuilder();
            for (int i = 0; indent > i; i++) {
                indentReplace.append("~");
            }
            matcher.appendReplacement(buffer, indentReplace.toString());
        }
        matcher.appendTail(buffer);
        return buffer.toString();
    }

    private TextFlow indentLeadingSpace3(String text)
            throws IOException {
        TextFlow flow = new TextFlow();

        int textStart = 0;
        Matcher matcher = PATTERN_LEADING_SPACES.matcher(text);
        while (matcher.find()) {
            flow.addText(text.substring(textStart, matcher.start()), FONT_SIZE, COURIER);
            textStart = matcher.end();
            String group = matcher.group();
            int indent = 0;
            for (char character : group.toCharArray()) {
                if ('\t' == character) {
                    indent += 4;
                } else {
                    indent += 1;
                }
            }
            flow.addMarkup(String.format("--{%sem}", indent), FONT_SIZE, COURIER,COURIER,COURIER,COURIER);
        }
        flow.addText(text.substring(textStart), FONT_SIZE, COURIER);
        return flow;
    }

    public static void main(String[] args) {
        AAAALayoutTest test = new AAAALayoutTest();
        test.run();
    }

}

fref commented 7 years ago

Any update?

ralfstuckert commented 7 years ago

Sorry, I'm currently a bit overloaded with both private and business work. I'll try to have a look at this in the next few days.

ralfstuckert commented 7 years ago

One question: is any line wrapping needed in the preformatted part? In other words, do you need something equivalent to the HTML <pre> tag, with tabs handled correctly?
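For reference, "tabs handled correctly" in a <pre>-like mode usually means expanding each tab to the next tab stop, not to a fixed number of blanks as in the reporter's workaround (4 columns per tab). A minimal sketch, assuming a monospaced font; `expandTabs` is a hypothetical helper, not part of pdfbox-layout. Browsers default to tab stops every 8 columns (the CSS `tab-size` property).

```java
public class TabExpansion {

    // Expands each tab to the next tab stop (every tabWidth columns),
    // the way a <pre> block does. Assumes one column per character.
    static String expandTabs(String text, int tabWidth) {
        StringBuilder out = new StringBuilder(text.length());
        int column = 0;
        for (char c : text.toCharArray()) {
            if (c == '\t') {
                int spaces = tabWidth - (column % tabWidth);
                out.append(" ".repeat(spaces));
                column += spaces;
            } else {
                out.append(c);
                // A newline starts a new line, so the column resets.
                column = (c == '\n') ? 0 : column + 1;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A tab at column 0 fills a whole tab stop...
        System.out.println("[" + expandTabs("\tkey: value", 4) + "]");
        // ...but after two blanks it only fills up to column 4,
        // whereas counting every tab as 4 columns would reach column 6.
        System.out.println("[" + expandTabs("  \tkey: value", 4) + "]");
    }
}
```

Expanding tabs before layout also means the rest of the pipeline only has to deal with blanks.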

fref commented 7 years ago

Yes, wrapping would be needed; otherwise data could be lost.

ralfstuckert commented 7 years ago

Just to keep you informed: I'm working on the problem, and hopefully there will be a release in the next few days.

fref commented 7 years ago

:+1: I'm so eager to get rid of Jasper.

ralfstuckert commented 7 years ago

OK, it works in version 0.8.5. There were multiple problems:

  1. Blanks at the start of a line were removed. This was done to avoid annoying leading blanks after word wrapping. Now only that is done: leading blanks on WRAPPED lines are removed, but not those in the original text.
  2. You used addMarkup() to add the text, which interprets all kinds of characters as markup. Use addText() instead ;-)
  3. Word wrapping was generally poor; this has been improved by #10.
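The rule in point 1 can be sketched for a monospaced font as follows. This is a hypothetical, character-based illustration (`wrapLine` and `maxChars` are made-up names), not the code from pdfbox-layout 0.8.5, which wraps by measured text width: leading blanks in the original line survive, and only the blanks at a wrap point are dropped from the continuation line.

```java
import java.util.ArrayList;
import java.util.List;

public class PreserveIndentWrap {

    // Wraps one source line to at most maxChars columns (maxChars >= 1).
    // Leading blanks of the original line are kept; continuation lines
    // produced by wrapping have their leading blanks removed.
    static List<String> wrapLine(String line, int maxChars) {
        List<String> out = new ArrayList<>();
        String rest = line;
        boolean wrapped = false;
        while (true) {
            if (wrapped) {
                rest = rest.stripLeading();   // only WRAPPED lines lose leading blanks
                if (rest.isEmpty()) {
                    return out;               // nothing left after the break
                }
            }
            if (rest.length() <= maxChars) {
                out.add(rest);
                return out;
            }
            int indentEnd = rest.length() - rest.stripLeading().length();
            // Break at the last blank that fits, but never inside the indentation.
            int cut = rest.lastIndexOf(' ', maxChars);
            if (cut <= indentEnd) {
                cut = maxChars;               // no usable blank: hard break
            }
            out.add(rest.substring(0, cut).stripTrailing());
            rest = rest.substring(cut);
            wrapped = true;
        }
    }

    public static void main(String[] args) {
        String yaml = "    value: !string \"00000000000000001-08-2015--13:58:55--2\"";
        for (String l : wrapLine(yaml, 24)) {
            System.out.println("[" + l + "]");
        }
    }
}
```

The hard break for over-long tokens follows the reporter's requirement that wrapping must not lose data.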

fref commented 7 years ago

Great news! Thank you.