relaxng / jing-trang

Schema validation and conversion based on RELAX NG
http://www.thaiopensource.com/relaxng/

Support resolving a URI directly to a Schema object #34

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
I suggest adding a SchemaResolver interface with a single method:

    public Schema resolveSchema(String systemId, PropertyMap options)
        throws SAXException, IOException, IncorrectSchemaException;

The NVDL and NRL createChildSchema() methods in SchemaReceiverImpl would use
this interface instead of EntityResolver when a resolver is set in the
property map.
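
A minimal sketch of the proposed dispatch, not the code on any branch. Schema, PropertyMap and IncorrectSchemaException are reduced stand-ins for the Jing types so the example compiles without the Jing jar. The SCHEMA_RESOLVER key, the ChildSchemaLoader class and loadViaEntityResolver() are hypothetical names; the report does not say how the resolver is stored in the property map.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.xml.sax.SAXException;

// Stand-ins for the Jing types of the same names (assumption: heavily simplified).
interface Schema {}

class IncorrectSchemaException extends Exception {}

class PropertyMap {
    private final Map<String, Object> values = new HashMap<>();

    PropertyMap put(String key, Object value) {
        values.put(key, value);
        return this;
    }

    Object get(String key) {
        return values.get(key);
    }
}

// The proposed interface, with the signature given in the report.
interface SchemaResolver {
    Schema resolveSchema(String systemId, PropertyMap options)
            throws SAXException, IOException, IncorrectSchemaException;
}

class ChildSchemaLoader {
    // Hypothetical property key; the real property id is not named in the report.
    static final String SCHEMA_RESOLVER = "SCHEMA_RESOLVER";

    // Placeholder for the existing EntityResolver-based loading path.
    static Schema loadViaEntityResolver(String systemId, PropertyMap options)
            throws IOException {
        throw new IOException("not implemented in this sketch: " + systemId);
    }

    // Use the SchemaResolver when one is set in the property map,
    // otherwise fall back to the existing loading path.
    static Schema createChildSchema(String systemId, PropertyMap options)
            throws SAXException, IOException, IncorrectSchemaException {
        Object resolver = options.get(SCHEMA_RESOLVER);
        if (resolver instanceof SchemaResolver) {
            return ((SchemaResolver) resolver).resolveSchema(systemId, options);
        }
        return loadViaEntityResolver(systemId, options);
    }
}
```

Because the resolver is optional, callers that never set it keep the current behaviour.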

I have two uses for this interface:
 1) It allows the application to maintain a cache of pre-parsed Schema
objects (at least when NVDL doesn't try to instantiate the schema with
different properties like making it an attribute schema).
 2) It allows the application to provide arbitrary Java-backed
implementations of the Schema interface. In Validator.nu things like HTML
table integrity checks are implemented in Java (since schema languages
can't express the constraints) but the checker is wrapped in a Jing Schema
instance, so it can be used as if it were a schema.
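
Use 1) could look roughly like the sketch below, which is not the Validator.nu code. Schema is a stand-in for the Jing interface, Options is a simplified stand-in for PropertyMap that carries only an attribute-schema flag, and the loader function is a hypothetical hook for whatever actually parses the schema. Requests for an attribute schema bypass the cache, following the caveat in 1).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Stand-in for Jing's Schema so the sketch compiles on its own.
interface Schema {}

// Simplified stand-in for PropertyMap (assumption): it only records whether
// the caller wants the schema instantiated as an attribute schema.
class Options {
    final boolean attributesSchema;

    Options(boolean attributesSchema) {
        this.attributesSchema = attributesSchema;
    }
}

class CachingSchemaResolver {
    private final Map<String, Schema> cache = new ConcurrentHashMap<>();
    // Hypothetical hook: parses the schema found at the given system id.
    private final Function<String, Schema> loader;

    CachingSchemaResolver(Function<String, Schema> loader) {
        this.loader = loader;
    }

    Schema resolveSchema(String systemId, Options options) {
        // A schema parsed with default properties cannot stand in for an
        // attribute schema, so those requests are always loaded fresh.
        if (options.attributesSchema) {
            return loader.apply(systemId);
        }
        // Parse at most once per system id; later calls share the instance.
        return cache.computeIfAbsent(systemId, loader);
    }
}
```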

Original issue reported on code.google.com by hsivonen@iki.fi on 3 Nov 2008 at 11:44

GoogleCodeExporter commented 9 years ago
I forgot to mention that I've implemented this in the Validator.nu fork of
oNVDL.

Original comment by hsivonen@iki.fi on 3 Nov 2008 at 11:45

GoogleCodeExporter commented 9 years ago
Let's consider 1) and 2) separately.

1) It seems like NVDL should have an internal schema cache, so that if a
single NVDL schema has multiple references to the same subschema, it can
share a single copy of the parsed schema (modulo your point about different
properties). Application-level caching is only going to be necessary if you
have two separate NVDL schemas that reference the same subschema, and each
of the NVDL schemas will be used for validating lots of documents.

2) Java-backed implementations of the Schema interface make sense, but how
do you identify the Java code? Do you use a URL like
java:com.example.MySchema? It might be cleaner to have a little snippet of
XML that gives the class name and the jars it depends on. Then you can have
a class loader that deals with loading all the jars and avoids problems
with the user having to set their classpath or deal with Java code that has
dependencies with possibly incompatible versions. Or maybe the URL points
to the jar and there's a file in the jar that has the necessary metadata?

Original comment by jjc.jclark.com on 3 Nov 2008 at 1:00

GoogleCodeExporter commented 9 years ago
I already had an application-level Schema object cache in Validator.nu for
all the preset schemas, so I just wanted to expose that cache to NVDL as
well.

For Java-backed implementations, I don't do any reflection. The URL is just
treated as a string key. The startup code that initializes the Schema
object cache simply stores a handful of special Schema objects with magic
http URL keys along with the Schema objects created by actually reading
schemas. In one case, resolving the URL would actually yield an equivalent
Schematron schema, but otherwise the URLs don't resolve to anything useful
and are just used for naming things in a Semantic Web fashion. Since an
untrusted user can enter any URL, I don't want to allow arbitrary
addressing into a Java classloader.
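
A minimal sketch of that string-key approach, assuming a plain map filled once at startup. SchemaRegistry, TableIntegritySchema and the example.org URL are hypothetical names used for illustration, and Schema is a stand-in for the Jing interface. A lookup is an exact string match, so a URL supplied by a user never reaches a class loader.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for Jing's Schema so the sketch compiles on its own.
interface Schema {}

// Hypothetical Java-backed schema, e.g. a table integrity checker.
final class TableIntegritySchema implements Schema {}

final class SchemaRegistry {
    private final Map<String, Schema> byUrl = new HashMap<>();

    // Called by startup code only, for parsed schemas and Java-backed ones alike.
    void register(String url, Schema schema) {
        byUrl.put(url, schema);
    }

    // Exact string match: the URL is never dereferenced or mapped to a class
    // name. Returns null for anything that was not registered at startup.
    Schema lookup(String url) {
        return byUrl.get(url);
    }
}
```

With a registry like this, a request for something like java:com.example.MySchema simply misses the map unless startup code registered that exact string.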

(Also, I have found that heavy class autodiscovery makes it harder to
sandbox Java processes. I couldn't figure out a way to use a security
manager to limit the file system access of the Validator.nu process and
still make RELAX NG datatype library autodiscovery work.)

Original comment by hsivonen@iki.fi on 3 Nov 2008 at 2:07

GoogleCodeExporter commented 9 years ago
Merged the Validator.nu implementation onto the validator-nu branch in r2174.

Original comment by hsivonen@iki.fi on 4 Nov 2008 at 1:02