plutext / docx4j

JAXB-based Java library for Word docx, Powerpoint pptx, and Excel xlsx files
https://www.docx4java.org/

• #6

Open Dmole opened 12 years ago

Dmole commented 12 years ago

&bull; is getting converted to &amp;bull;

(AbstractHtmlExporter.java:182)

conflicting with

(XMLEscapeUTF8.java:183) (XMLEscapeWriterUTF8.java:154) (XMLEscapeASCII.java:159) (XMLEscapeWriterASCII.java:154)

...

Dmole commented 12 years ago

or maybe (DOMSerializerEngine.java:183)

Dmole commented 12 years ago

Related: there are also "c2a0" bytes (a UTF-8 non-breaking space) that should be "20" (a plain space); this might be related to [\n\r]+ vs \n.
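
A quick way to confirm those byte values and to normalize them, as a minimal Java sketch. NbspCheck and its methods are hypothetical names for illustration, not part of docx4j:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of docx4j.
final class NbspCheck {

    // Lower-case hex dump of the UTF-8 bytes of a string.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Replace every non-breaking space (U+00A0) with a plain space (U+0020).
    static String normalizeNbsp(String s) {
        return s.replace('\u00A0', ' ');
    }

    public static void main(String[] args) {
        System.out.println(utf8Hex("\u00A0"));                // c2a0
        System.out.println(utf8Hex(normalizeNbsp("\u00A0"))); // 20
    }
}
```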

Dmole commented 12 years ago

I commented out all of this to no effect: (XMLEscapeUTF8.java:183) (XMLEscapeWriterUTF8.java:154) (XMLEscapeASCII.java:159) (XMLEscapeWriterASCII.java:154) (DOMSerializerEngine.java:183)

Changing (AbstractHtmlExporter.java:182) has an effect, but that should be returning

anyway.

Dmole commented 12 years ago

Avoiding GitHub markdown, it should be returning: < ul>< li>stuff< /li>< /ul>

Dmole commented 12 years ago

(docx2xhtmlNG2.xslt:257) "At present, this doesn't use HTML OL|UL and LI; we'll do that when we have a document model to work from" ... I guess I'll do some post-processing, as I have no idea why it's not starting with a DOM.

Dmole commented 12 years ago

cat test.html | perl -p -e 's/&amp;([a-z]{1,10};)/&$1/g;s/\xc2\xa0/ /g;s/[‘’“”]+/"/g' > test.2.html
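
For anyone who would rather do the same clean-up from Java, here is a rough sketch of the equivalent post-processing. It assumes the three substitutions are meant to undo double-escaped entities (&amp;name; back to &name;), turn non-breaking spaces into plain spaces, and replace runs of curly quotes with a straight double quote. HtmlPostProcess and clean are hypothetical names, not docx4j API:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Pattern;

// Hypothetical post-processing helper; not part of docx4j.
final class HtmlPostProcess {

    // Matches a double-escaped named entity such as "&amp;bull;".
    private static final Pattern DOUBLE_ESCAPED =
            Pattern.compile("&amp;([a-z]{1,10};)");

    // Matches runs of left/right single and double curly quotes.
    private static final Pattern CURLY_QUOTES =
            Pattern.compile("[\u2018\u2019\u201C\u201D]+");

    static String clean(String html) {
        // "&amp;bull;" becomes "&bull;"
        String out = DOUBLE_ESCAPED.matcher(html).replaceAll("&$1");
        // U+00A0 (bytes c2 a0 in UTF-8) becomes a plain space
        out = out.replace('\u00A0', ' ');
        // each run of curly quotes becomes one straight double quote
        out = CURLY_QUOTES.matcher(out).replaceAll("\"");
        return out;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: HtmlPostProcess <in.html> <out.html>");
            return;
        }
        // Read and write as UTF-8 so the NBSP and quote characters decode correctly.
        String in = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        Files.write(Paths.get(args[1]), clean(in).getBytes(StandardCharsets.UTF_8));
    }
}
```

Like the one-liner, this only patches the exported text after the fact; it does not address where the extra escaping happens in the exporter.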

plutext commented 12 years ago

I believe this issue is now fixed. See https://github.com/plutext/docx4j/commit/9c3ffd01d578891991fa333bbd9c558f41c620dc