Parsing trouble when using SAXParser from Oracle jar

GoogleCodeExporter commented 8 years ago

What steps will reproduce the problem?
1. Create a simple test and docx template to be processed
2. Set org.xml.sax.driver to oracle.xml.parser.v2.SAXParser
3. Add xmlparserv2.jar from OC4J to classpath
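
Step 2 can be sketched in code as follows. This is a minimal sketch, not part of the original report: the class and method names (SaxDriverCheck, resolveDriver) are hypothetical, and it only shows which XMLReader class the JVM resolves once org.xml.sax.driver is set. Without xmlparserv2.jar on the classpath, the Oracle driver cannot be loaded and the sketch reports that instead.

```java
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

// Hypothetical helper, not part of XDocReport.
final class SaxDriverCheck {

    private static final String DRIVER_PROPERTY = "org.xml.sax.driver";

    /**
     * Sets org.xml.sax.driver to the given class name, asks the SAX factory
     * for a reader, and returns the name of the class that was resolved.
     * The previous property value is restored afterwards.
     */
    @SuppressWarnings("deprecation") // XMLReaderFactory is deprecated since Java 9
    static String resolveDriver(String driverClassName) {
        String previous = System.getProperty(DRIVER_PROPERTY);
        System.setProperty(DRIVER_PROPERTY, driverClassName);
        try {
            XMLReader reader = XMLReaderFactory.createXMLReader();
            return reader.getClass().getName();
        } catch (SAXException e) {
            // Raised when the configured class cannot be loaded,
            // for example when xmlparserv2.jar is not on the classpath.
            return "not loadable: " + driverClassName;
        } finally {
            if (previous == null) {
                System.clearProperty(DRIVER_PROPERTY);
            } else {
                System.setProperty(DRIVER_PROPERTY, previous);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(resolveDriver("oracle.xml.parser.v2.SAXParser"));
    }
}
```

The same property can also be passed on the command line with -Dorg.xml.sax.driver=oracle.xml.parser.v2.SAXParser.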

What is the expected output? What do you see instead?
The test runs fine, but when I try to open the processed document it throws an 
error and says that the file is corrupted.

What version of the product are you using? On what operating system?
I am using version 0.9.8 of xdocreports with Java 1.6 (it also happens with 
Java 1.7), running inside a web application deployed in an OC4J 10.1.3 
application server that has been configured to use the XML parser from the 
Oracle library.

Please provide any additional information below.

Original issue reported on code.google.com by correa.j...@gmail.com on 18 Sep 2012 at 2:36

GoogleCodeExporter commented 8 years ago

Hi correa,

Could you please attach to this issue a sample (docx, Java code and 
xmlparserv2.jar) that reproduces the problem?

Many thanks

Regards Angelo

Original comment by angelo.z...@gmail.com on 18 Sep 2012 at 2:47

Changed state: Accepted

GoogleCodeExporter commented 8 years ago

Hi Angelo. Sure... 
I'm attaching an Eclipse project ready to run. It comes with a docx sample that 
is generated to the D: drive.
Additionally, I am attaching the xmlparserv2.jar.

The document processing and generation work fine when Java resolves the SAX 
parser to a Xerces implementation, whether it is packaged as 
com.sun.org.apache.xerces.internal.parsers.SAXParser or 
org.apache.xerces.parsers.SAXParser.

Regards

Johannes

Original comment by correa.j...@gmail.com on 18 Sep 2012 at 3:03

GoogleCodeExporter commented 8 years ago

Many thanks for attaching your files. I will look at your problem as soon as I 
have time.
Could you please send us the stack trace of your SAX problem?

Many thanks

Regards Angelo

Original comment by angelo.z...@gmail.com on 18 Sep 2012 at 3:23

GoogleCodeExporter commented 8 years ago

Hi Angelo. There is no stack trace. What I mean is that XDocReport seems to 
work fine in the environment mentioned above; the test runs without any 
problem. The problem appears only when you open the generated document.

Regards

Johannes

Original comment by correa.j...@gmail.com on 18 Sep 2012 at 3:33

GoogleCodeExporter commented 8 years ago

I implemented a temporary hack that forces XDocReport to use the Xerces library 
by doing the following every time it uses a SAXParser:

XMLReader xmlReader = 
XMLReaderFactory.createXMLReader("org.apache.xerces.parsers.SAXParser");

I see this hack could be risky: according to the Javadoc, it can lead to 
security exceptions.
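
A slightly more defensive variant of the hack is sketched below. It is only an illustration: the class and method names (PinnedXmlReader, create) are hypothetical, and the fallback to the JAXP default is an assumption about what a caller might want when Xerces is not on the classpath. Note that in the environment described in this issue, the JAXP default may itself resolve to the Oracle parser.

```java
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

// Hypothetical helper, not part of XDocReport.
final class PinnedXmlReader {

    /**
     * Tries to create the preferred XMLReader implementation by class name and
     * falls back to the JAXP default parser when that class cannot be loaded.
     * A SecurityException, if one is raised, is not handled here.
     */
    @SuppressWarnings("deprecation") // XMLReaderFactory is deprecated since Java 9
    static XMLReader create(String preferredClassName) {
        try {
            return XMLReaderFactory.createXMLReader(preferredClassName);
        } catch (SAXException notAvailable) {
            try {
                SAXParserFactory factory = SAXParserFactory.newInstance();
                factory.setNamespaceAware(true);
                return factory.newSAXParser().getXMLReader();
            } catch (ParserConfigurationException | SAXException e) {
                throw new IllegalStateException("No usable SAX parser", e);
            }
        }
    }

    public static void main(String[] args) {
        XMLReader reader = create("org.apache.xerces.parsers.SAXParser");
        System.out.println(reader.getClass().getName());
    }
}
```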

We should think about the reported problem in a generic way: how does 
XDocReport behave when the virtual machine has been forced to use a specific 
SAX parser other than Xerces?

Original comment by correa.j...@gmail.com on 18 Sep 2012 at 4:33

GoogleCodeExporter commented 8 years ago

Hi Correa,

I have fixed the problem on Git (the fix will be available in 1.0.0).

The problem was that the XML entries preprocessed by XDocReport were not 
generated correctly. With the Oracle SAX parser, each namespace declaration was 
written twice:

--------------------------------------------------------------------------------
<w:document
    xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas"
...
    xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas"
...
--------------------------------------------------------------------------------

With the Xerces parser, namespaces are reported only through 
org.xml.sax.ContentHandler#startPrefixMapping(String prefix, String uri).

With the Oracle parser, namespaces are reported through 
org.xml.sax.ContentHandler#startPrefixMapping(String prefix, String uri), but 
they are also present in the SAX Attributes list passed to startElement.
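
The difference can be observed without the Oracle jar. The sketch below is an approximation under a stated assumption: it uses the JDK's default parser and switches on the standard SAX feature http://xml.org/sax/features/namespace-prefixes to emulate a parser that reports xmlns declarations both ways. The class name NamespaceEventsDemo and the sample XML are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of XDocReport.
final class NamespaceEventsDemo {

    private static final String SAMPLE =
        "<w:document xmlns:w=\"urn:example:w\" w:id=\"1\"/>";

    /** Records prefix mappings and attribute names seen while parsing SAMPLE. */
    static List<String> events(boolean namespacePrefixes) {
        final List<String> log = new ArrayList<String>();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            // true = xmlns declarations are ALSO delivered as attributes
            reader.setFeature(
                "http://xml.org/sax/features/namespace-prefixes", namespacePrefixes);
            reader.setContentHandler(new DefaultHandler() {
                @Override
                public void startPrefixMapping(String prefix, String uri) {
                    log.add("mapping:" + prefix);
                }

                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes atts) {
                    for (int i = 0; i < atts.getLength(); i++) {
                        log.add("attribute:" + atts.getQName(i));
                    }
                }
            });
            reader.parse(new InputSource(new StringReader(SAMPLE)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println("Xerces-like: " + events(false));
        System.out.println("Oracle-like: " + events(true));
    }
}
```

A handler that writes an xmlns declaration for every prefix mapping and then copies every attribute verbatim will emit the declaration twice in the second case.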

To fix this issue, I check whether the namespace prefix has already been 
generated before writing the attributes list. See the doStartElement method of 
http://code.google.com/p/xdocreport/source/browse/document/fr.opensagres.xdocreport.document/src/main/java/fr/opensagres/xdocreport/document/preprocessor/sax/BufferedDocumentContentHandler.java
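
The idea behind the fix can be sketched as follows. This is not the actual BufferedDocumentContentHandler code: DedupNamespaceWriter and its methods are hypothetical, and the Oracle behaviour is again emulated by assumption with the namespace-prefixes feature of the JDK parser. The handler remembers which xmlns declarations it wrote from prefix mappings and skips attributes with the same name.

```java
import java.io.StringReader;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch, not the XDocReport implementation.
final class DedupNamespaceWriter extends DefaultHandler {

    private final StringBuilder out = new StringBuilder();
    // Prefix mappings announced for the next start tag, in declaration order.
    private final Map<String, String> pending = new LinkedHashMap<String, String>();

    @Override
    public void startPrefixMapping(String prefix, String uri) {
        pending.put(prefix, uri);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        out.append('<').append(qName);
        Set<String> written = new HashSet<String>();
        for (Map.Entry<String, String> mapping : pending.entrySet()) {
            String name = mapping.getKey().isEmpty() ? "xmlns" : "xmlns:" + mapping.getKey();
            appendAttribute(name, mapping.getValue());
            written.add(name);
        }
        pending.clear();
        for (int i = 0; i < atts.getLength(); i++) {
            // Skip xmlns attributes already generated from prefix mappings.
            if (!written.contains(atts.getQName(i))) {
                appendAttribute(atts.getQName(i), atts.getValue(i));
            }
        }
        out.append('>');
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        out.append("</").append(qName).append('>');
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        out.append(escape(new String(ch, start, length)));
    }

    private void appendAttribute(String name, String value) {
        out.append(' ').append(name).append("=\"").append(escape(value)).append('"');
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }

    /** Parses xml with xmlns attributes reported twice and re-serializes it. */
    static String rewrite(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            reader.setFeature("http://xml.org/sax/features/namespace-prefixes", true);
            DedupNamespaceWriter handler = new DedupNamespaceWriter();
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return handler.out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(rewrite("<w:document xmlns:w=\"urn:example:w\" w:id=\"1\"/>"));
    }
}
```

With the duplicate check in place, the sample start tag carries a single xmlns:w declaration even though the parser reports it both as a prefix mapping and as an attribute.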

Tell me if this fix works for you.

Regards Angelo

Original comment by angelo.z...@gmail.com on 18 Sep 2012 at 4:48

GoogleCodeExporter commented 8 years ago

Hi Angelo, 

I downloaded the 1.0.0-SNAPSHOT and ran the test again. It seems to work fine 
with both the Oracle and Xerces parsers. What do you recommend? Should we build 
the libraries from the 0.9.8 tag with this specific fix, or work with the 1.0.0 
snapshot version?

Regards,

Johannes Correa

Original comment by correa.j...@gmail.com on 18 Sep 2012 at 6:04

GoogleCodeExporter commented 8 years ago

Hi Johannes,

Many thanks for your test. It's cool that this problem is fixed.
I suggest working with the 1.0.0 snapshot version because it greatly improves 
the docx->pdf converter. HTML text styling will be improved too.

As for the stability of XDocReport, we have a lot of JUnit tests that guard 
against regressions. Those tests are run as soon as we commit to Git.

Regards Angelo

Original comment by angelo.z...@gmail.com on 18 Sep 2012 at 7:03

Changed state: Fixed

GoogleCodeExporter commented 8 years ago

Hi Angelo. Would it be feasible to backport this fix to the 0.9.8 tag? We are 
facing some trouble with the migration to JDK 1.6: we are running under JDK 
1.5, and there is a new dependency on the class java.util.ServiceLoader, which 
is new in JDK 1.6.

Original comment by correa.j...@gmail.com on 19 Sep 2012 at 4:13

GoogleCodeExporter commented 8 years ago

Hi Johannes,

You told me that:
> I am using version 0.9.8 of xdocreports, Java 1.6 (no matter, it happens too with Java 1.7)

At first I would like to avoid modifying our old tag, because XDocReport 0.9.8 
is already deployed on the central Maven repository.

We have decided to migrate to Java 6 following our discussion "Can XDocReport 
depend today on Java6 instead of Java5?" (see 
https://groups.google.com/forum/?hl=fr&fromgroups=#!topic/xdocreport/4DNLRyrEemo). 
You can post your comments on this topic if you wish.

I was afraid that some people (like you) still use Java5.

@Pascal: I'm not sure it's a good idea to migrate to Java6 just to use 
java.util.ServiceLoader.

I'm investigating how to manage the Java5/Java6 ServiceLoader difference.

Original comment by angelo.z...@gmail.com on 19 Sep 2012 at 10:50

GoogleCodeExporter commented 8 years ago

Hi Angelo,

My mistake. Our real environment is Java5 based, using XDocReport in an OC4J 
application server. Since I am new to the company, my local environment was 
quite different (Java6 as runtime).
Right now, my approach is to compile and build XDocReport locally, merging the 
0.9.8 tag with the fix you implemented. I know it's not a desirable approach, 
but it works around our client's restriction that the Oracle SAX parser must be 
used in their environment.

As for how to manage the Java5/Java6 ServiceLoader, I did some brief research 
(googled "backport java.util.ServiceLoader") and found that some people have 
done a backport. I don't know if that would be an option for you... I still 
think it is not desirable unless it is an "official" backport.

Original comment by correa.j...@gmail.com on 20 Sep 2012 at 1:23

GoogleCodeExporter commented 8 years ago

Hi Johannes, 

Have you read my previous message? 1.0.0 now manages the JDK6 and JDK5 
service loaders with reflection:

----------------------------------------------------------------
Finally I think we will switch to Java5.
I have committed the Java code to manage the JDK6 and JDK5 service loaders with 
this class:
http://code.google.com/p/xdocreport/source/browse/core/fr.opensagres.xdocreport.core/src/main/java/fr/opensagres/xdocreport/core/internal/JDKServiceLoader.java

Could you please tell me if it's OK with you?
Pascal will change the Maven pom.xml to switch to Java5.
----------------------------------------------------------------
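
The reflection approach can be sketched roughly as below. This is not the JDKServiceLoader class linked above: ReflectiveServiceLoader and its methods are hypothetical names, and the Java5 fallback shown here (reading META-INF/services files by hand) is an assumption chosen for illustration, not necessarily what XDocReport does.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.lang.reflect.Method;
import java.net.URL;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch, not the XDocReport JDKServiceLoader.
final class ReflectiveServiceLoader {

    /** Uses java.util.ServiceLoader when it exists (Java6+), else a manual lookup. */
    @SuppressWarnings("unchecked")
    static <T> Iterator<T> lookupProviders(Class<T> service, ClassLoader loader) {
        try {
            // Looked up by name so this class still loads on a Java5 runtime.
            Class<?> loaderClass = Class.forName("java.util.ServiceLoader");
            Method load = loaderClass.getMethod("load", Class.class, ClassLoader.class);
            Object serviceLoader = load.invoke(null, service, loader);
            return (Iterator<T>) loaderClass.getMethod("iterator").invoke(serviceLoader);
        } catch (ClassNotFoundException java5Runtime) {
            return readProviderFiles(service, loader);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Fallback: instantiate every class listed in META-INF/services/<service name>. */
    static <T> Iterator<T> readProviderFiles(Class<T> service, ClassLoader loader) {
        List<T> providers = new ArrayList<T>();
        try {
            Enumeration<URL> files =
                loader.getResources("META-INF/services/" + service.getName());
            while (files.hasMoreElements()) {
                BufferedReader reader = new BufferedReader(
                    new InputStreamReader(files.nextElement().openStream(), "UTF-8"));
                try {
                    String line;
                    while ((line = reader.readLine()) != null) {
                        int hash = line.indexOf('#'); // strip trailing comments
                        if (hash >= 0) {
                            line = line.substring(0, hash);
                        }
                        line = line.trim();
                        if (line.length() > 0) {
                            Class<?> impl = Class.forName(line, true, loader);
                            providers.add(service.cast(
                                impl.getDeclaredConstructor().newInstance()));
                        }
                    }
                } finally {
                    reader.close();
                }
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return providers.iterator();
    }

    public static void main(String[] args) {
        Iterator<Runnable> it =
            lookupProviders(Runnable.class, ClassLoader.getSystemClassLoader());
        System.out.println("providers found: " + it.hasNext());
    }
}
```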

Regards Angelo

Original comment by angelo.z...@gmail.com on 20 Sep 2012 at 1:30

GoogleCodeExporter commented 8 years ago

No, I had not read it... nice solution... I'm going to test it ASAP.

Original comment by correa.j...@gmail.com on 20 Sep 2012 at 2:07
