sshyran / genxdm

Automatically exported from code.google.com/p/genxdm

Fragment and Sequence builders must conform to new contract for namespace well-formedness #74

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
The structure of interface inheritance and implementation in GenXDM leads to a 
problem: ContentHandler is the base interface for the builders, and is 
consequently where their documentation and contract live.  This is a problem 
because the contract for a *handler* is necessarily less rigorous than the 
contract for a *builder*.

A *builder* is required to create a well-formed tree (note, however, that there 
are some tests that may be considered too onerous to perform regularly, such as 
name or text-content checks).

The most notable difference in handling has to do with namespaces. A 
ContentHandler should simply die in flames if it is handed something namespace 
ill-formed.  A FragmentBuilder should clearly fix the problems.

Possible problems:

- An element in a namespace with no declaration. This is easy to fix.
- A namespace event (method call) after the first attribute event. We should
  change the contract for *builders* to insist that builders will reorder these
  as needed.
- An attribute in a namespace that has not been declared. If we are deferring
  attribute events until we're sure all the namespaces have been fired, we can
  handle this at the same time.
- A namespace event after a child node event. We should permit an exception to
  be thrown, I think (implicitly, the start tag has "been written" in this case,
  and allowing namespaces to queue after child nodes starts raising questions
  of *how much* deferral can take place before we blow the stack).
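
A minimal sketch of the deferral idea, assuming a cut-down event API: the class
`DeferringBuilder`, its method signatures, and the string "events" it records
are all hypothetical stand-ins, not GenXDM's interfaces. It buffers one start
tag, declares missing namespaces for the element and its attributes, emits
namespaces before attributes, and throws once child content has been seen.
Ancestor scopes are ignored here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical builder front end: buffers one start tag, then emits it in a well-formed order. */
class DeferringBuilder {
    private final List<String> events = new ArrayList<>();                // stand-in for the tree being built
    private final Map<String, String> namespaces = new LinkedHashMap<>(); // prefix -> uri for the open start tag
    private final List<String[]> attributes = new ArrayList<>();          // each entry: {uri, local, prefix, value}
    private String[] element;                                             // {uri, local, prefix}; null once flushed

    void startElement(String uri, String local, String prefix) {
        flush(); // a new element is child content of the previous one
        element = new String[] {uri, local, prefix};
    }

    void namespace(String prefix, String uri) {
        // After child content the start tag has "been written", so this is an error.
        if (element == null) throw new IllegalStateException("namespace event after child content");
        namespaces.put(prefix, uri); // accepted even if attribute events came first
    }

    void attribute(String uri, String local, String prefix, String value) {
        if (element == null) throw new IllegalStateException("attribute event after child content");
        if (!uri.isEmpty() && prefix.isEmpty())
            throw new IllegalArgumentException("a namespaced attribute needs a prefix");
        attributes.add(new String[] {uri, local, prefix, value}); // deferred until flush
    }

    void text(String data) {
        flush();
        events.add("text " + data);
    }

    List<String> events() {
        flush();
        return events;
    }

    private void flush() {
        if (element == null) return;
        ensureBound(element[2], element[0]);                   // element in an undeclared namespace
        for (String[] a : attributes) ensureBound(a[2], a[0]); // attribute in an undeclared namespace
        events.add("start " + element[2] + ":" + element[1]);
        for (Map.Entry<String, String> e : namespaces.entrySet())
            events.add("ns " + e.getKey() + "=" + e.getValue());
        for (String[] a : attributes)
            events.add("attr " + a[2] + ":" + a[1] + "=" + a[3]);
        namespaces.clear();
        attributes.clear();
        element = null;
    }

    /** Declares prefix -> uri if absent; a different existing binding is a conflict. */
    private void ensureBound(String prefix, String uri) {
        if (uri.isEmpty()) return;
        String existing = namespaces.putIfAbsent(prefix, uri);
        if (existing != null && !existing.equals(uri))
            throw new IllegalStateException("prefix '" + prefix + "' is bound to " + existing);
    }
}
```

The buffer holds only a single start tag, which is why a namespace event after
child content is rejected rather than queued.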

Currently, the Cx and DOM bridges are known to have problems. The Axiom bridge 
might actually be handling this stuff right, more or less (given its known 
problems with namespaces).  This issue will be closed when the DOM and Cx 
FragmentBuilders and SequenceBuilders behave correctly, and there is an 
additional test for at least FragmentBuilder (there aren't any actual tests for 
the typed API yet, and many bridge implementations have SequenceBuilder extend 
FragmentBuilder anyway).

Original issue reported on code.google.com by aale...@gmail.com on 12 Jan 2012 at 8:48

GoogleCodeExporter commented 8 years ago
Okay. Here are the relevant constraints on the data model.  We can regard this as 
"direct construction", perhaps, or if it's hitting the builder from a parser, 
then it's from-infoset (and validation is presumably from-psvi).  Regardless, 
the constraints always hold.

XQuery Data Model, 6.2 Element Nodes, 6.2.1 Overview, ordered list item #12:

For every expanded QName that appears in the dm:node-name of the element, the 
dm:node-name of any Attribute Node among the attributes of the element, or in 
any value of type xs:QName or xs:NOTATION (or any type derived from those 
types) that appears in the typed-value of the element or the typed-value of any 
of its attributes, if the expanded QName has a non-empty URI, then there must 
be a prefix binding for this URI among the namespaces of this Element Node.

If any of the expanded QNames has an empty URI, then there must not be any 
binding among the namespaces of this Element Node which binds the empty prefix 
to a URI.

endquote

From this, we can confidently state: an unbound namespace must either cause an 
exception to be thrown, or must trigger some form of "namespace fixup".
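
As a rough illustration of the quoted constraint, the check can be written as
two pure functions over an element's in-scope bindings. The class and method
names are hypothetical. QNames in content are left out, and the empty-URI rule
is applied to the element name only, which is a narrower reading than the
quoted text (an assumption, made because an unprefixed attribute never takes
the default namespace).

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical checker for the element-node namespace constraint; names are illustrative. */
class NamespaceConstraint {

    /**
     * Returns every non-empty URI used by the element name or an attribute name
     * that has no prefix bound to it in inScope (prefix -> uri).
     */
    static Set<String> unboundUris(String elementUri,
                                   Collection<String> attributeUris,
                                   Map<String, String> inScope) {
        Set<String> missing = new LinkedHashSet<>();
        if (!elementUri.isEmpty() && !inScope.containsValue(elementUri)) {
            missing.add(elementUri);
        }
        for (String uri : attributeUris) {
            if (!uri.isEmpty() && !inScope.containsValue(uri)) {
                missing.add(uri);
            }
        }
        return missing;
    }

    /** True when the element name has an empty URI but the empty prefix is bound to a URI. */
    static boolean defaultNamespaceConflict(String elementUri, Map<String, String> inScope) {
        String bound = inScope.get("");
        return elementUri.isEmpty() && bound != null && !bound.isEmpty();
    }
}
```

A caller would then either throw when `unboundUris` is non-empty or use its
result as the list of declarations to synthesize.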

ContentHandler (the base interface in question here; SequenceHandler and the 
builders layer on top of this) is a streaming interface.  It's actually the 
event handler for a sequential messaging interface, a sink for which the source 
is undefined (model.stream() and cursor.write() are potentially sources, but so 
are parsers and validators, effectively).

It's easy enough to know to throw an IllegalStateException when, for instance, 
an attribute, namespace, or non-whitespace text event follows a document event 
(although arguably the text event is not an error for the XDM, since its 
'document' node type can represent an XML entity, which might simply be a text 
block).
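
The easy ordering checks can be sketched as a small state machine. The class
`OrderCheckingHandler` and its simplified method signatures are assumptions for
illustration only, and end-of-element tracking is omitted. The sketch takes the
stricter reading and rejects non-whitespace text directly under a document.

```java
/** Hypothetical sketch of event-order checking; not the GenXDM ContentHandler. */
class OrderCheckingHandler {
    private enum State { INITIAL, DOCUMENT, START_TAG, CONTENT }

    private State state = State.INITIAL;

    void startDocument() {
        state = State.DOCUMENT;
    }

    void startElement(String name) {
        state = State.START_TAG;
    }

    void namespace(String prefix, String uri) {
        requireStartTag("namespace");
    }

    void attribute(String name, String value) {
        requireStartTag("attribute");
    }

    void text(String data) {
        if (state == State.DOCUMENT) {
            // Whitespace directly under a document is tolerated; other text is
            // rejected here, although the XDM arguably permits it.
            if (!data.trim().isEmpty()) {
                throw new IllegalStateException("non-whitespace text directly under a document");
            }
            return;
        }
        state = State.CONTENT; // any open start tag is now closed
    }

    /** Namespace and attribute events are only legal while a start tag is open. */
    private void requireStartTag(String event) {
        if (state != State.START_TAG) {
            throw new IllegalStateException(event + " event is not allowed in state " + state);
        }
    }
}
```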

The complication that arises is that a ContentHandler may or may not be able to 
handle fixups or exceptions "out of order".  For instance, suppose that an 
element event has a node-name with the default prefix bound to the default 
namespace uri, but one of the subsequent namespace events contains a binding of 
a non-default uri to the default prefix.  The handler could reasonably throw the 
exception at the namespace event.  But, for more complexity, suppose that the 
element name uses a non-default prefix bound to a non-default uri, which is 
declared in its namespaces property (that is, a subsequent namespace event), 
but then there is an attribute (attribute event) which supplies the same prefix 
hint but a different uri.  For qnames in content, the problem is potentially 
even worse, of course.
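
To make the prefix-hint conflict concrete, here is a sketch of a resolver for
the bindings of a single element. `PrefixResolver`, the `fixup` flag, and the
generated `nsN` prefixes are all hypothetical. The default (empty) prefix is
deliberately out of scope, and QNames in content are not addressed.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical resolver that treats an incoming prefix as a hint, not a guarantee. */
class PrefixResolver {
    private final Map<String, String> bindings = new LinkedHashMap<>(); // prefix -> uri
    private final boolean fixup;
    private int counter = 0;

    PrefixResolver(boolean fixup) {
        this.fixup = fixup;
    }

    /** Returns the prefix that ends up bound to uri. */
    String bind(String prefixHint, String uri) {
        if (prefixHint.isEmpty()) {
            throw new IllegalArgumentException("the default prefix is out of scope for this sketch");
        }
        String existing = bindings.get(prefixHint);
        if (existing == null) {
            bindings.put(prefixHint, uri);
            return prefixHint;
        }
        if (existing.equals(uri)) {
            return prefixHint;
        }
        // Same prefix hint, different uri: fail or repair.
        if (!fixup) {
            throw new IllegalStateException(
                "prefix '" + prefixHint + "' is already bound to " + existing);
        }
        // Prefer a prefix that is already bound to this uri.
        for (Map.Entry<String, String> e : bindings.entrySet()) {
            if (e.getValue().equals(uri)) {
                return e.getKey();
            }
        }
        // Otherwise generate an unused prefix.
        String generated;
        do {
            generated = "ns" + counter++;
        } while (bindings.containsKey(generated));
        bindings.put(generated, uri);
        return generated;
    }
}
```

Note that repairing the conflict changes the prefix the caller asked for, which
is exactly what makes QNames in content harder: their lexical form would have
to be rewritten too.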

First consequence: it appears that we may need a method on ContentHandler which 
responds to the question: how is namespace fixup handled?  Choices are throwing 
an exception versus fixup, but there's another option, of "expecting 
well-formedness"--that is, not even checking.  So there are two axes: 
checking/not checking and fixup/fail.  Do we need to do something like this?  If you *know* that your 
handler is going to receive well-formed XML (because there's something like an 
XML parser making sure that that is true), then a lot of complexity (and 
potentially expensive state-keeping) can be avoided.  If not, then perhaps you 
*ought* to check.
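
One possible shape for such a method, purely as a sketch: an enum with the
three useful combinations of the two axes, and a query method on the handler.
`NamespaceHandling`, `PolicyAwareHandler`, and `Pipelines` are hypothetical
names and do not exist in GenXDM.

```java
/** Hypothetical answer to "how is namespace fixup handled?". */
enum NamespaceHandling {
    TRUSTED(false, false), // input is assumed well-formed; nothing is checked
    FAIL(true, false),     // violations are detected and reported with an exception
    FIXUP(true, true);     // violations are detected and repaired

    private final boolean checks;
    private final boolean repairs;

    NamespaceHandling(boolean checks, boolean repairs) {
        this.checks = checks;
        this.repairs = repairs;
    }

    boolean checks() {
        return checks;
    }

    boolean repairs() {
        return repairs;
    }
}

/** Hypothetical extension point: the handler reports its own policy. */
interface PolicyAwareHandler {
    NamespaceHandling namespaceHandling();
}

class Pipelines {
    /**
     * A non-checking handler is only safe behind a source that guarantees
     * well-formedness, such as an XML parser.
     */
    static boolean needsUpstreamCheck(PolicyAwareHandler handler, boolean sourceGuaranteesWellFormed) {
        return !handler.namespaceHandling().checks() && !sourceGuaranteesWellFormed;
    }
}
```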

How you handle checking is also an interesting question.  For instance, we 
*could* provide a "NamespaceFixupHandler" in bridgekit, which would be 
instantiated with a non-checking downstream Handler.  This would mean that the 
code could be written generically, and each bridge that used it could rely upon 
it being correct input to the FragmentBuilder.  Or ProcessingContext's 
newFragmentBuilder could accept a parameter, ContentHandler filter.  Oh, hmmmm.  
Brainstorming there, but that seems rather powerful.
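
A sketch of what the filter idea might look like, under heavy assumptions: the
`Handler` interface below is a cut-down, hypothetical contract (no attributes,
no documents), `RecordingHandler` stands in for a non-checking downstream
builder, and `NamespaceFixupFilter` is an illustrative name rather than an
existing bridgekit class. The filter tracks in-scope bindings per element and
declares the element's namespace when nothing in scope already binds it.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, cut-down handler contract. */
interface Handler {
    void startElement(String uri, String local, String prefix);
    void namespace(String prefix, String uri);
    void text(String data);
    void endElement();
}

/** Non-checking downstream stand-in that records what it receives. */
class RecordingHandler implements Handler {
    final List<String> events = new ArrayList<>();
    public void startElement(String uri, String local, String prefix) { events.add("start " + prefix + ":" + local); }
    public void namespace(String prefix, String uri) { events.add("ns " + prefix + "=" + uri); }
    public void text(String data) { events.add("text " + data); }
    public void endElement() { events.add("end"); }
}

/** Filter that repairs a missing declaration for the element name before forwarding. */
class NamespaceFixupFilter implements Handler {
    private final Handler downstream;
    private final Deque<Map<String, String>> scopes = new ArrayDeque<>(); // in-scope prefix -> uri per open element
    private final Map<String, String> declared = new LinkedHashMap<>();   // declarations on the pending start tag
    private String[] pending;                                             // {uri, local, prefix} not yet forwarded

    NamespaceFixupFilter(Handler downstream) {
        this.downstream = downstream;
    }

    public void startElement(String uri, String local, String prefix) {
        if (uri.isEmpty() && !prefix.isEmpty())
            throw new IllegalArgumentException("a prefix requires a namespace uri");
        flush();
        pending = new String[] {uri, local, prefix};
    }

    public void namespace(String prefix, String uri) {
        if (pending == null) throw new IllegalStateException("namespace event after child content");
        declared.put(prefix, uri);
    }

    public void text(String data) {
        flush();
        downstream.text(data);
    }

    public void endElement() {
        flush();
        scopes.pop();
        downstream.endElement();
    }

    private void flush() {
        if (pending == null) return;
        Map<String, String> scope = new HashMap<>();
        if (!scopes.isEmpty()) scope.putAll(scopes.peek()); // inherit from the parent element
        scope.putAll(declared);
        String uri = pending[0];
        String prefix = pending[2];
        if (!uri.equals(scope.getOrDefault(prefix, ""))) {
            // An explicit declaration that disagrees with the element name cannot be repaired.
            if (declared.containsKey(prefix))
                throw new IllegalStateException("prefix '" + prefix + "' is declared with a different uri");
            declared.put(prefix, uri);
            scope.put(prefix, uri);
        }
        downstream.startElement(uri, pending[1], prefix);
        for (Map.Entry<String, String> e : declared.entrySet())
            downstream.namespace(e.getKey(), e.getValue());
        scopes.push(scope);
        declared.clear();
        pending = null;
    }
}
```

Because the filter implements the same interface it wraps, the bridge-specific
builder downstream can stay non-checking, and filters could in principle be
chained.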

Original comment by aale...@gmail.com on 17 Jan 2012 at 6:10

GoogleCodeExporter commented 8 years ago
Resolved. Not entirely certain that this is a correct resolution, but the core 
requirements seem addressed.

Original comment by aale...@gmail.com on 7 Feb 2012 at 4:02