opensagres / xdocreport

XDocReport means XML Document reporting. It's a Java API that merges an XML document created with MS Office (docx), OpenOffice (odt), or LibreOffice (odt) with a Java model to generate a report and, if needed, convert it to another format (PDF, XHTML...).
https://github.com/opensagres/xdocreport

Text is lost when a Word document is converted to PDF! #664

Open wshupengpeng opened 2 months ago

wshupengpeng commented 2 months ago

When I converted a Word document to PDF, I found that some of the text was missing.

@Test
    public void wordConvertToPdf() throws IOException {
        String filePath = String.format("%s%s", BASE_DIR, "b0181116-8d64-464a-b4db-306acd067af3.docx");
        String outputPdfPath = String.format("%s%s", BASE_DIR, "1.pdf");
        // Missing fonts prevent the PDF from being rendered correctly
        wordToPdf(filePath, outputPdfPath);

    }

    public static void wordToPdf(String docPath,String pdfPath) {
        try(InputStream doc = new FileInputStream(docPath);
            XWPFDocument document= new XWPFDocument(doc);
            OutputStream out = new FileOutputStream(pdfPath)){
            setFontType(document);
            PdfOptions options = PdfOptions.create();
            options.fontProvider(CustomizeFontProvider.getInstance());
            PdfConverter.getInstance().convert(document, out, options);

        }catch (Exception e){
            log.error("wordToPdf failed ", e);
        }
    }

    private static void setFontType(XWPFDocument xwpfDocument) {
        // Convert the font of the text in the document
        List<XWPFParagraph> paragraphs = xwpfDocument.getParagraphs();
        if(paragraphs != null && paragraphs.size()>0){
            for (XWPFParagraph paragraph : paragraphs) {
                List<XWPFRun> runs = paragraph.getRuns();
                if(runs !=null && runs.size()>0){
                    for (XWPFRun run : runs) {
                        if(StringUtils.isEmpty(run.getColor())){
                            run.setColor("000000");
                        }
                    }
                }
            }
        }
        // Convert the fonts inside tables. I don't like the Russian-doll nesting either, but without it the font really can't be set
        List<XWPFTable> tables = xwpfDocument.getTables();
        for (XWPFTable table : tables) {
            List<XWPFTableRow> rows = table.getRows();
            for (XWPFTableRow row : rows) {
                List<XWPFTableCell> tableCells = row.getTableCells();
                for (XWPFTableCell tableCell : tableCells) {
                    List<XWPFParagraph> paragraphs1 = tableCell.getParagraphs();
                    for (XWPFParagraph xwpfParagraph : paragraphs1) {
                        List<XWPFRun> runs = xwpfParagraph.getRuns();
                        for (XWPFRun run : runs) {
                            if(StringUtils.isEmpty(run.getColor())){
                                run.setColor("000000");
                            }
                        }
                    }
                }
            }
        }
    }

Word template: word.docx
Converted PDF result: project__20f87ec11bad446a910610cf730eccc8.pdf
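
Since the comment in the test above points to missing fonts as the cause, one quick way to narrow this down is to compare the font families the document uses against those installed on the machine. The following is only a minimal sketch using the JDK's `GraphicsEnvironment`; the class name `MissingFontCheck` and the font names in `main` are hypothetical placeholders, not taken from the attached document. It is also only a rough proxy, because the PDF converter resolves fonts through the configured font provider rather than through AWT.

```java
import java.awt.GraphicsEnvironment;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

public class MissingFontCheck {

    // Returns the required family names that have no case-insensitive match
    // among the installed names, in the order they were requested.
    static Set<String> missingFonts(Set<String> required, Set<String> installed) {
        Set<String> installedLower = new TreeSet<>();
        for (String name : installed) {
            installedLower.add(name.toLowerCase(Locale.ROOT));
        }
        Set<String> missing = new LinkedHashSet<>();
        for (String name : required) {
            if (!installedLower.contains(name.toLowerCase(Locale.ROOT))) {
                missing.add(name);
            }
        }
        return missing;
    }

    // Font families visible to the JVM on this machine.
    static Set<String> installedFontFamilies() {
        String[] names = GraphicsEnvironment.getLocalGraphicsEnvironment()
                .getAvailableFontFamilyNames();
        return new LinkedHashSet<>(Arrays.asList(names));
    }

    public static void main(String[] args) {
        // Hypothetical font names; replace them with the families used in the .docx.
        Set<String> required = new LinkedHashSet<>(Arrays.asList("SimSun", "Arial"));
        System.out.println("Missing fonts: " + missingFonts(required, installedFontFamilies()));
    }
}
```

If a family shows up as missing, the text using it is a likely candidate for what gets dropped, and the custom font provider would need to map that family to a font file that actually exists.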

angelozerr commented 2 months ago

Any contributions are welcome!