Open NcIgor opened 2 years ago
When I run code to convert HTML to a Word document (like in org.docx4j.samples.ConvertInXHTMLFile), I get a document with extra spaces and paragraphs. For example, my HTML:
<!DOCTYPE html>
<html>
<head>
<style>
i {
  color: red;
  background-color: gray;
}
</style>
</head>
<body>
<div>
  some text
  <span>new text</span>
</div>
</body>
</html>
Document:
Source code:
public static void main(String[] args) throws Exception {
    // org.docx4j.samples.ConvertInXHTMLFile
    String baseURL = null;
    String stringFromFile = getContent();

    /*RFonts rfonts = Context.getWmlObjectFactory().createRFonts();
    rfonts.setAscii("Century Gothic");
    XHTMLImporterImpl.addFontMapping("Century Gothic", rfonts);*/

    WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.createPackage();

    NumberingDefinitionsPart ndp = new NumberingDefinitionsPart();
    wordMLPackage.getMainDocumentPart().addTargetPart(ndp);
    ndp.unmarshalDefaultNumbering();

    XHTMLImporterImpl XHTMLImporter = new XHTMLImporterImpl(wordMLPackage);
    XHTMLImporter.setHyperlinkStyle("Hyperlink");

    List<Object> convert = XHTMLImporter.convert(stringFromFile, baseURL);
    wordMLPackage.getMainDocumentPart().getContent().addAll(convert);

    System.out.println(XmlUtils.marshaltoString(wordMLPackage.getMainDocumentPart().getJaxbElement(), true, true));

    wordMLPackage.save(new File("docs/a.docx"));
}
<dependency>
    <groupId>org.docx4j</groupId>
    <artifactId>docx4j-ImportXHTML</artifactId>
    <version>8.3.2</version>
</dependency>
What does your getContent() do?
Can't reproduce, using ConvertInXHTMLFile sample code, which uses:
String stringFromFile = FileUtils.readFileToString(new File(inputfilepath), "UTF-8");
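Since the reporter's getContent() is not shown, one way to narrow this down is to check that it hands the importer the same text the sample reads from disk. The following is only a rough sketch using the JDK alone: GetContentSketch, its getContent(Path) and showWhitespace are hypothetical names for illustration, not part of docx4j or of the reporter's code. It reads the file unchanged as UTF-8 and prints the string with line breaks and tabs made visible, so it can be compared against whatever the real getContent() returns.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class GetContentSketch {

    // Hypothetical stand-in for the reporter's getContent():
    // returns the file's text unchanged, decoded as UTF-8.
    static String getContent(Path htmlFile) throws IOException {
        return new String(Files.readAllBytes(htmlFile), StandardCharsets.UTF_8);
    }

    // Replaces carriage returns, line feeds and tabs with visible escapes
    // so two input strings can be compared by eye.
    static String showWhitespace(String s) {
        return s.replace("\r", "\\r")
                .replace("\n", "\\n")
                .replace("\t", "\\t");
    }

    public static void main(String[] args) throws IOException {
        // Write a small sample file, then read it back and show its whitespace.
        Path tmp = Files.createTempFile("sample", ".html");
        String html = "<div>\n\tsome text <span>new text</span>\n</div>";
        Files.write(tmp, html.getBytes(StandardCharsets.UTF_8));
        try {
            System.out.println(showWhitespace(getContent(tmp)));
        } finally {
            Files.delete(tmp);
        }
    }
}
```

If the output of showWhitespace differs between the real getContent() and a plain file read, the difference in the input string is a likely place to look before suspecting the importer itself.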