plutext / docx4j-ImportXHTML

Converts XHTML to OpenXML WordML (docx) using docx4j

Convert Html to Docx : Empty Paragraphs #79

Open NcIgor opened 2 years ago

NcIgor commented 2 years ago

When I run code to convert HTML to docx (as in org.docx4j.samples.ConvertInXHTMLFile), I get a document with extra spaces and empty paragraphs. For example, my HTML:

<!DOCTYPE html>
<html>
<head>
    <style>
        i {
            color: red;
            background-color: gray;
        }
    </style>
</head>
<body>
<div>
    some text
    <span>new text</span>
</div>
</body>
</html>

Resulting document: [image]

Source code:

    public static void main(String[] args) throws Exception {
        // org.docx4j.samples.ConvertInXHTMLFile
        String baseURL = null;
        String stringFromFile = getContent();
        /*RFonts rfonts = Context.getWmlObjectFactory().createRFonts();
        rfonts.setAscii("Century Gothic");
        XHTMLImporterImpl.addFontMapping("Century Gothic", rfonts);*/
        WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.createPackage();
        NumberingDefinitionsPart ndp = new NumberingDefinitionsPart();
        wordMLPackage.getMainDocumentPart().addTargetPart(ndp);
        ndp.unmarshalDefaultNumbering();
        XHTMLImporterImpl XHTMLImporter = new XHTMLImporterImpl(wordMLPackage);
        XHTMLImporter.setHyperlinkStyle("Hyperlink");
        List<Object> convert = XHTMLImporter.convert(stringFromFile, baseURL);
        wordMLPackage.getMainDocumentPart().getContent().addAll(convert);
        System.out.println(XmlUtils.marshaltoString(wordMLPackage.getMainDocumentPart().getJaxbElement(), true, true));
        wordMLPackage.save(new File("docs/a.docx"));
    }

Dependency:

    <dependency>
        <groupId>org.docx4j</groupId>
        <artifactId>docx4j-ImportXHTML</artifactId>
        <version>8.3.2</version>
    </dependency>

plutext commented 2 years ago

What does your getContent() do?

I can't reproduce this using the ConvertInXHTMLFile sample code, which reads the input with:

        String stringFromFile = FileUtils.readFileToString(new File(inputfilepath), "UTF-8");
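
To compare the two input paths, here is a minimal sketch of what a getContent() equivalent to the sample's read might look like, using only the JDK. The class name GetContentSketch, the Path parameter, and the showWhitespace helper are all hypothetical and do not come from the reporter's code or from docx4j. The helper only prints the string with line breaks and tabs made visible, so that any difference between what getContent() returns and what the sample reads is easy to spot.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class GetContentSketch {

    // Hypothetical stand-in for the reporter's getContent(): reads the whole
    // file as UTF-8 and returns it unchanged, like the sample's
    // FileUtils.readFileToString(new File(inputfilepath), "UTF-8") call.
    static String getContent(Path input) throws IOException {
        return new String(Files.readAllBytes(input), StandardCharsets.UTF_8);
    }

    // Debugging aid (an assumption, not part of docx4j): replaces carriage
    // returns, line feeds, and tabs with visible escape sequences.
    static String showWhitespace(String s) {
        return s.replace("\r", "\\r").replace("\n", "\\n").replace("\t", "\\t");
    }

    public static void main(String[] args) throws IOException {
        // Write a fragment of the HTML from the report to a temporary file.
        Path tmp = Files.createTempFile("input", ".html");
        String html = "<div>\n    some text\n    <span>new text</span>\n</div>\n";
        Files.write(tmp, html.getBytes(StandardCharsets.UTF_8));

        // Print exactly what would be handed to the importer.
        System.out.println(showWhitespace(getContent(tmp)));

        Files.delete(tmp);
    }
}
```

Running the same showWhitespace call on the string returned by the real getContent() and on the string read by the sample would show whether the two inputs actually differ before they reach XHTMLImporter.convert.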