opensagres / xdocreport

XDocReport means XML Document reporting. It is a Java API that merges XML documents created with MS Office (docx), OpenOffice (odt), or LibreOffice (odt) with a Java model to generate reports and, if needed, convert them to another format (PDF, XHTML...).
https://github.com/opensagres/xdocreport

PDF converter from docx : words are overriding (examples attached) #583

Open ralborodo-RatedPower opened 1 year ago

ralborodo-RatedPower commented 1 year ago

When converting a docx file (testDocument.docx) to PDF, some words in the output file (testDocument-new.pdf) overlap each other.

The following code reproduces the issue:

@Test
void simpletestconversion() throws Exception {
    // try-with-resources closes the streams and the document even if conversion fails
    try (InputStream in = new FileInputStream(docPath);
         OutputStream out = new FileOutputStream(pdfPath);
         XWPFDocument document = new XWPFDocument(in)) {

        PdfOptions pdfOptions = PdfOptions.create();
        // Use a special font provider for Chinese
        pdfOptions.fontProvider(CHINESE_FONT_PROVIDER);

        // Let any exception propagate so the test fails instead of passing silently
        PdfConverter.getInstance().convert(document, out, pdfOptions);
    }
}

with the Chinese font provider defined as follows:

private static final IFontProvider CHINESE_FONT_PROVIDER = (familyName, encoding, size, style, color) -> {
    try {
        BaseFont bf = BaseFont.createFont("/fonts/NotoSansCJK-Regular.ttc" + ",0", BaseFont.IDENTITY_H,
                                          BaseFont.NOT_EMBEDDED);
        Font font = new Font(bf, size, style, color);
        if(familyName != null) {
            font.setFamily(familyName);
        }
        return font;
    } catch(DocumentException | IOException e) {
        log.error("Font error", e);
        return ITextFontRegistry.getRegistry().getFont(familyName, encoding, size, style, color);
    }
};

and using the following dependencies in the pom file:

...
<apache-poi.version>5.2.3</apache-poi.version>
...
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi</artifactId>
    <version>5.2.3</version>
</dependency>

<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml-full</artifactId>
    <version>5.2.3</version>
</dependency>

<dependency>
    <groupId>fr.opensagres.xdocreport</groupId>
    <artifactId>fr.opensagres.poi.xwpf.converter.pdf</artifactId>
    <version>2.0.4</version>
</dependency>

Thank you so much for this amazing tool BTW :). I haven't found any related issues.

iu159 commented 1 year ago

I'm facing the same issue. I worked around it by saving the document from Word as XML, then editing the <w:tblStyle w:val="1" /> element in the XML (mainly changing the table style value).

Then I saved the XML back to docx.

Even with this, the table is not perfect. Consider using Aspose.
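
The table-style edit described above could also be scripted rather than done by hand. The following is only a rough sketch using the JDK alone: it rewrites the w:val attribute of every w:tblStyle element in word/document.xml inside the .docx package directly, which is a different route from saving as XML in Word. TblStyleRewriter and its methods are hypothetical names, not part of XDocReport or POI. It assumes the part is UTF-8 encoded, that the w: prefix is used, and that a regex substitution is good enough. Which style value actually avoids the overlap is not stated in this thread, so it is left as a parameter.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Hypothetical helper for illustration; not part of XDocReport or Apache POI. */
final class TblStyleRewriter {

    // Main document part of a .docx package.
    private static final String DOCUMENT_PART = "word/document.xml";

    /** Returns a copy of the .docx bytes with every w:tblStyle value set to newStyleId. */
    static byte[] rewriteTableStyle(byte[] docx, String newStyleId) throws IOException {
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(docx));
             ZipOutputStream zout = new ZipOutputStream(result)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                // readAllBytes() stops at the end of the current zip entry.
                byte[] content = zin.readAllBytes();
                if (DOCUMENT_PART.equals(entry.getName())) {
                    // Assumption: the part is UTF-8 encoded.
                    String xml = new String(content, StandardCharsets.UTF_8);
                    content = replaceTableStyle(xml, newStyleId)
                            .getBytes(StandardCharsets.UTF_8);
                }
                // Copy every entry, changed or not, into the new package.
                zout.putNextEntry(new ZipEntry(entry.getName()));
                zout.write(content);
                zout.closeEntry();
            }
        }
        return result.toByteArray();
    }

    /** Crude regex substitution; a real tool would use an XML parser instead. */
    static String replaceTableStyle(String xml, String newStyleId) {
        return xml.replaceAll(
                "(?<open><w:tblStyle\\s+w:val=\")[^\"]*",
                "${open}" + Matcher.quoteReplacement(newStyleId));
    }
}
```

The bytes could be read with Files.readAllBytes and written back with Files.write before passing the result to the converter. Whether this removes the overlap depends on the document, as noted above.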

vauns commented 1 year ago

Use fr.opensagres.xdocreport:fr.opensagres.xdocreport.converter.docx.docx4j instead. I would still like to know why this happens and how to fix it, though. Marking this issue to follow it.