hobama / dkpro-wsd

Automatically exported from code.google.com/p/dkpro-wsd

SAXReader-based XML readers should try to find and process the DTD #43

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
When SAXReader-based XML readers try to read an XML file which specifies a DTD, 
they fail because they can't find the DTD.

The problem can possibly be solved in *some* cases by using an EntityResolver 
which looks for the DTD in the same place as the XML file.  However, I am 
pretty sure this won't always work if the XML file is being read from the 
classpath, since it might be inside a JAR.
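A minimal sketch of such a resolver, assuming the XML file lives on the ordinary file system. The class name `SiblingDtdResolver` is hypothetical and not part of dkpro-wsd; only standard `org.xml.sax` and `java.nio.file` calls are used. It keeps just the file name from the DTD's system ID and looks for that name next to the XML file, falling back to the parser's default lookup when nothing is found.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of dkpro-wsd.
final class SiblingDtdResolver implements EntityResolver {

    /** Directory containing the XML file being read. */
    private final Path xmlDirectory;

    SiblingDtdResolver(Path xmlFile) {
        this.xmlDirectory = xmlFile.toAbsolutePath().getParent();
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) throws IOException {
        if (systemId == null) {
            return null; // nothing to go on; let the parser decide
        }
        // Keep only the last path segment, e.g. "corpus.dtd".
        String fileName = systemId.substring(systemId.lastIndexOf('/') + 1);
        if (fileName.isEmpty()) {
            return null;
        }
        Path candidate = xmlDirectory.resolve(fileName);
        if (!Files.isRegularFile(candidate)) {
            return null; // no sibling DTD; fall back to the default lookup
        }
        InputSource source = new InputSource(Files.newInputStream(candidate));
        source.setSystemId(candidate.toUri().toString());
        return source;
    }
}
```

With dom4j this would be installed via `SAXReader.setEntityResolver(new SiblingDtdResolver(xmlFile))`. As written it cannot see a DTD packed inside a JAR, which is exactly the classpath case mentioned above.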

Readers should probably therefore include a configuration parameter for 
ignoring the DTD.  This would be implemented by an EntityResolver returning an 
empty InputSource.
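A rough sketch of the "ignore the DTD" resolver, using only the JDK's SAX classes so that it is self-contained. The class name `IgnoreDtdExample` and the helper `rootElementName` are made up for illustration. The same `EntityResolver` could be passed to dom4j's `SAXReader.setEntityResolver`, and a reader's configuration parameter would decide whether to install it.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class name; not part of dkpro-wsd.
final class IgnoreDtdExample {

    /** Resolves every external entity, including the DTD, to empty content. */
    static final EntityResolver IGNORE_DTD =
            (publicId, systemId) -> new InputSource(new StringReader(""));

    /** Parses the XML with the DTD ignored and returns the root element name. */
    static String rootElementName(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setEntityResolver(IGNORE_DTD);
        String[] root = new String[1];
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes attributes) {
                if (root[0] == null) {
                    root[0] = qName; // remember only the first (root) element
                }
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return root[0];
    }

    public static void main(String[] args) throws Exception {
        // The DTD named here does not exist, yet parsing still succeeds.
        String xml = "<!DOCTYPE corpus SYSTEM \"no-such-dir/no-such.dtd\"><corpus/>";
        System.out.println(rootElementName(xml));
    }
}
```

Because the DTD is replaced by empty content, nothing declared in it (entities, default attribute values) is available to the parser, and no validation against it can take place.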

Original issue reported on code.google.com by tristan.miller@nothingisreal.com on 6 Nov 2013 at 1:59

GoogleCodeExporter commented 9 years ago

Original comment by tristan.miller@nothingisreal.com on 6 Nov 2013 at 2:00

GoogleCodeExporter commented 9 years ago
I implemented a null EntityResolver and made all the XML readers use it.  This
isn't an ideal solution, as the XML readers will no longer issue a diagnostic
for XML files that do not conform to their DTD.

The XML reader in de.tudarmstadt.ukp.dkpro.core.io.xml seems to use some other 
SAX-based method of handling XML which might not have this DTD problem.  We 
should study it and see if we can adapt the technique.

Original comment by tristan.miller@nothingisreal.com on 6 Nov 2013 at 2:51

GoogleCodeExporter commented 9 years ago

Original comment by tristan.miller@nothingisreal.com on 6 Nov 2013 at 2:53