plutext / docx4j

JAXB-based Java library for Word docx, Powerpoint pptx, and Excel xlsx files
https://www.docx4java.org/

Ignoring some table while convert from docx to html #150

Open deepakdabi opened 9 years ago

deepakdabi commented 9 years ago

I converted a docx to HTML using docx4j 3.2.1. Some of the docx tables are missing from the output HTML file. The same tables convert fine with the older docx4j 2.8.1. The ignored tables have a rowspan of 2 in the table header, and a few columns have sub-headers. An example table screenshot is attached.

tblex

plutext commented 9 years ago

Need a sample docx exhibiting the issue please.

deepakdabi commented 9 years ago

I sent the details by email. Let me know if you want the document put in any specific location. Thanks.

deepakdabi commented 9 years ago

Hi, below is the code I am using. I have attached both the input and output files. Thanks in advance, Deepak Dabi

WordprocessingMLPackage wordMLPackage = Docx4J.load(in);
HTMLSettings htmlSettings = Docx4J.createHTMLSettings();
//htmlSettings.setImageDirPath( System.getProperty("user.dir") + uploadedImagesDirectory );
htmlSettings.setWmlPackage(wordMLPackage);
Docx4jProperties.setProperty("docx4j.Convert.Out.HTML.OutputMethodXML", true);
//Docx4J.toHTML(htmlSettings, out, Docx4J.FLAG_EXPORT_PREFER_XSL);
Docx4J.toHTML(htmlSettings, out, Docx4J.FLAG_EXPORT_PREFER_XSL);

 On Thursday, 23 July 2015 5:04 PM, Jason Harrop <notifications@github.com> wrote:

Need a sample docx exhibiting the issue please.

njskalski commented 9 years ago

I encountered a similar issue in my program; the condition causing it was missing attributes on the w:gridCol elements. Once I changed something like this:

<w:gridCol/><w:gridCol/>

into this:

<w:gridCol w:w="4492"/><w:gridCol w:w="4492"/>

it started working. Hope it helps.

deepakdabi commented 9 years ago

Oh, do you mean you made this change in the HTML file? If so, we have a lot of files and cannot change each one. Is there any other way to handle this in code with docx4j?

njskalski commented 9 years ago

This was a portion of my docx, not the HTML.

If the w:w attribute is present, the table is converted properly. If w:w is missing from w:gridCol in the original docx file, the entire table is also missing from the resulting HTML.

I add these attributes on the fly to work around the docx4j problem.

deepakdabi commented 9 years ago

Can you tell me where and how I can add this?

njskalski commented 9 years ago

If your input file is a docx, just open it with any zip library and modify document.xml with any XML transform library to add the mentioned w:w attributes to the w:gridCol elements. Then save the modified document.xml back into the docx (you can do it in memory) and open it with docx4j.

In one of my use cases I don't use a docx file as such, but an OpcPackage representation instead. But I guess that either way should work.
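
For readers who want to try the workaround described above, here is a minimal sketch using only JDK classes (Java 9+ for readAllBytes). It is not code from this thread and not a docx4j API: the class name GridColPatcher, its methods, and the placeholder width 2000 are all hypothetical. It also assumes the main part is stored as word/document.xml and that the WordprocessingML namespace is bound to the usual w prefix. A real width should probably be derived from the table's cell widths rather than a constant.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of docx4j.
final class GridColPatcher {
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    // Assumed placeholder width (twentieths of a point); pick a value that suits your tables.
    static final String DEFAULT_WIDTH = "2000";

    /** Adds w:w to every w:gridCol that lacks it and returns the patched XML. */
    static byte[] patchDocumentXml(byte[] xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder().parse(new ByteArrayInputStream(xml));

        NodeList cols = doc.getElementsByTagNameNS(W_NS, "gridCol");
        for (int i = 0; i < cols.getLength(); i++) {
            Element col = (Element) cols.item(i);
            if (!col.hasAttributeNS(W_NS, "w")) {
                // Assumes the namespace is bound to the conventional "w" prefix.
                col.setAttributeNS(W_NS, "w:w", DEFAULT_WIDTH);
            }
        }

        // Identity transform serializes the DOM back to bytes.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(doc), new StreamResult(out));
        return out.toByteArray();
    }

    /** Copies a docx (zip) stream entry by entry, patching only word/document.xml. */
    static void patchDocx(InputStream in, OutputStream out) throws Exception {
        try (ZipInputStream zin = new ZipInputStream(in);
             ZipOutputStream zout = new ZipOutputStream(out)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                byte[] data = zin.readAllBytes(); // reads to the end of the current entry
                if ("word/document.xml".equals(entry.getName())) {
                    data = patchDocumentXml(data);
                }
                // Fresh ZipEntry so size and CRC are recomputed for the new bytes.
                zout.putNextEntry(new ZipEntry(entry.getName()));
                zout.write(data);
                zout.closeEntry();
            }
        }
    }
}
```

To use it, call GridColPatcher.patchDocx with the original docx stream and a ByteArrayOutputStream, then pass a ByteArrayInputStream over the resulting bytes to Docx4J.load(in) as in the conversion code earlier in this thread.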