Underlying bridge can be in a non-normalized state - we should document how bridges should perform, and then make bridges do that

GoogleCodeExporter commented 8 years ago

This came up in the context of a question related to the XML Security port. The 
underlying DOM tree can apparently have elements added with namespace, local 
name, and prefix, but also without the corresponding namespace declaration 
attribute.
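For illustration, a minimal sketch of the state described above, using only the JDK's built-in DOM. The class and method names are made up for this example and the namespace URI is a placeholder. It builds an element with a prefix and namespace but no xmlns:p attribute, then lets Document.normalizeDocument (whose "namespaces" parameter defaults to true) perform namespace fix-up.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class; not part of GenXDM or the XML Security port.
public class MissingNamespaceDeclaration {
    // Namespace that DOM assigns to xmlns and xmlns:* attributes.
    public static final String XMLNS_NS = "http://www.w3.org/2000/xmlns/";

    /** Builds a document whose root is p:root in urn:example, with no xmlns:p attribute. */
    public static Document buildUndeclared() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElementNS("urn:example", "p:root");
            doc.appendChild(root);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** True if the element carries an explicit xmlns:prefix declaration attribute. */
    public static boolean declaresPrefix(Element element, String prefix) {
        return element.hasAttributeNS(XMLNS_NS, prefix);
    }

    public static void main(String[] args) {
        Document doc = buildUndeclared();
        System.out.println("declared before normalize: " + declaresPrefix(doc.getDocumentElement(), "p"));
        doc.normalizeDocument(); // namespace fix-up adds the missing declaration
        System.out.println("declared after normalize:  " + declaresPrefix(doc.getDocumentElement(), "p"));
    }
}
```

A bridge that reads namespace bindings only from declaration attributes would see nothing for "p" until the client normalizes.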

A complete list of what DOM does for normalizeDocument in Java can be found 
here:
http://docs.oracle.com/javase/6/docs/api/org/w3c/dom/DOMConfiguration.html

Running through the options, GenXDM may have a stake in some of them:

cdata-sections: Since GenXDM expects text nodes combined, and doesn't speak of 
cdata-sections, this is moot.

check-character-normalization

comments: Keep or remove?

element-content-whitespace: How much whitespace do we keep?

namespace-declarations: This is the one that tripped up the security question.

well-formed: contents and names might be invalid according to XML 1.0 - do we 
care?

We should figure out how we expect this to behave, or leave it as undefined 
(and indicate so). Options:

 * Define that an individual bridge can determine what it does in the face of a non-normalized document (i.e., the API can punt on the question; the DOM bridge could specify that clients must call Document.normalizeDocument to get predictable results).

 * Define that a bridge must manufacture namespace nodes as needed, just like we've indicated how text nodes should be handled.
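A rough sketch of what the second option could look like for a DOM-backed bridge, assuming nothing about GenXDM's actual interfaces: ManufacturedNamespaces and bindingsFor are hypothetical names, and the code works on plain org.w3c.dom nodes. It reports the bindings declared on an element and manufactures any that the element's own name or its attribute names imply but that were never declared. It does not walk ancestors, and a declared binding wins over an implied one.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

// Hypothetical helper; a sketch of "manufacture namespace nodes as needed".
public class ManufacturedNamespaces {
    public static final String XMLNS_NS = "http://www.w3.org/2000/xmlns/";

    /** Convenience for examples: an empty DOM document. */
    public static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Prefix-to-URI bindings declared on the element, plus those implied by names in use. */
    public static Map<String, String> bindingsFor(Element element) {
        Map<String, String> bindings = new LinkedHashMap<>();
        NamedNodeMap attrs = element.getAttributes();
        // Pass 1: explicit xmlns / xmlns:prefix declaration attributes.
        for (int i = 0; i < attrs.getLength(); i++) {
            Attr attr = (Attr) attrs.item(i);
            if (XMLNS_NS.equals(attr.getNamespaceURI())) {
                String prefix = "xmlns".equals(attr.getNodeName()) ? "" : attr.getLocalName();
                bindings.put(prefix, attr.getValue());
            }
        }
        // Pass 2: manufacture bindings implied by the element and attribute names.
        addImplied(bindings, element, true);
        for (int i = 0; i < attrs.getLength(); i++) {
            Node attr = attrs.item(i);
            if (!XMLNS_NS.equals(attr.getNamespaceURI())) {
                addImplied(bindings, attr, false);
            }
        }
        return bindings;
    }

    private static void addImplied(Map<String, String> bindings, Node node, boolean allowDefault) {
        String uri = node.getNamespaceURI();
        String prefix = node.getPrefix();
        if (uri == null || "xml".equals(prefix)) {
            return; // no namespace, or the reserved xml prefix
        }
        if (prefix == null) {
            if (!allowDefault) {
                return; // an unprefixed attribute never uses the default namespace
            }
            prefix = "";
        }
        bindings.putIfAbsent(prefix, uri); // an explicit declaration takes precedence
    }
}
```

For an element created with createElementNS("urn:example", "p:root") and no declaration attribute, bindingsFor reports a binding for "p" anyway, which is the behavior the second option asks of a bridge.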

...

This might work best as a tracking issue, with specific issues raised and 
depended upon as we decide them. Low priority, though, I think.

Original issue reported on code.google.com by eric%tib...@gtempaccount.com on 3 Jan 2012 at 12:50

GoogleCodeExporter commented 8 years ago

Okay. I'm opening a new issue for namespaces and attributes in the context of 
tree construction; leave those out of this discussion.

cdata-sections are moot. Character normalization is moot in the context of a 
tree in memory; it matters for serialization, and needs to be addressed in the 
context of the implementation of XQuery serialization. The same is true for 
whitespace.

The only way that content could be ill-formed is by containing characters that 
are not legal in XML 1.0 (C0 and C1, with exceptions).  This is potentially a 
problem during construction or mutation; I would regard it as the most 
worrisome of the issues raised.
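To make the content check concrete, here is a small sketch of the XML 1.0 Char production (#x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD, #x10000-#x10FFFF). Xml10Chars and its methods are hypothetical names, not GenXDM API. Note that under this production the C0 controls other than tab, LF, and CR are excluded, as are surrogates, U+FFFE, and U+FFFF; the C1 range is permitted in XML 1.0 and is only restricted in XML 1.1.

```java
// Hypothetical helper illustrating the XML 1.0 Char production.
public class Xml10Chars {
    /** True if the code point matches the XML 1.0 Char production. */
    public static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** UTF-16 index of the first character that is not an XML 1.0 Char, or -1 if none. */
    public static int firstIllegalIndex(CharSequence text) {
        int i = 0;
        while (i < text.length()) {
            // An unpaired surrogate comes back as itself and fails the check.
            int cp = Character.codePointAt(text, i);
            if (!isXml10Char(cp)) {
                return i;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }
}
```

A mutable API could call firstIllegalIndex on text and attribute values at construction or mutation time. The cost is one pass over each value.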

It's far easier to be ill-formed with respect to names (and rules differ for 
XML 1.0 versus 1.1).  It's also *far* more expensive to check.  *Should* we 
provide protection, at the (well-known) cost of a performance hit?  Keep in mind 
that most names in any given document are used multiple times, but not 
necessarily interned, so that we might end up checking the same characters over 
and over and over again.  This is the sort of protection-from-folly that ought 
to be configurable, clearly (I'm not going to write code that breaks XML so 
comprehensively, and am liable to resent paying the price to check every 
character in every name for every use of that name).  Also worth noting: QName 
does not prevent illegal values (apart from null) (in fact, we make use of that 
fact, using ESC as a wildcard indicator).
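One way to soften the repeated-check cost, sketched under assumptions: cache the verdict per distinct name string and put the whole thing behind a switch. CachingNameChecker is a hypothetical class, not anything in GenXDM. The character ranges follow the NameStartChar and NameChar productions of XML 1.0 (Fifth Edition) minus the colon, i.e. NCName. The cache is a plain HashMap, so this version is not thread-safe.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, configurable NCName checker that scans each distinct name only once.
public class CachingNameChecker {
    private final boolean enabled;
    private final Map<String, Boolean> verdicts = new HashMap<>();
    private int scans = 0; // number of full character scans, for illustration only

    public CachingNameChecker(boolean enabled) {
        this.enabled = enabled;
    }

    /** With checking disabled, every name is accepted without inspection. */
    public boolean isAcceptable(String name) {
        if (!enabled) {
            return true;
        }
        if (name == null) {
            return false;
        }
        return verdicts.computeIfAbsent(name, n -> {
            scans++;
            return isNCName(n);
        });
    }

    public int scans() {
        return scans;
    }

    public static boolean isNCName(String s) {
        if (s.isEmpty()) {
            return false;
        }
        int first = s.codePointAt(0);
        if (!isNCNameStart(first)) {
            return false;
        }
        for (int i = Character.charCount(first); i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!isNCNameChar(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    static boolean isNCNameStart(int c) {
        return (c >= 'A' && c <= 'Z') || c == '_' || (c >= 'a' && c <= 'z')
            || (c >= 0xC0 && c <= 0xD6) || (c >= 0xD8 && c <= 0xF6)
            || (c >= 0xF8 && c <= 0x2FF) || (c >= 0x370 && c <= 0x37D)
            || (c >= 0x37F && c <= 0x1FFF) || (c >= 0x200C && c <= 0x200D)
            || (c >= 0x2070 && c <= 0x218F) || (c >= 0x2C00 && c <= 0x2FEF)
            || (c >= 0x3001 && c <= 0xD7FF) || (c >= 0xF900 && c <= 0xFDCF)
            || (c >= 0xFDF0 && c <= 0xFFFD) || (c >= 0x10000 && c <= 0xEFFFF);
    }

    static boolean isNCNameChar(int c) {
        return isNCNameStart(c) || c == '-' || c == '.' || (c >= '0' && c <= '9')
            || c == 0xB7 || (c >= 0x300 && c <= 0x36F) || (c >= 0x203F && c <= 0x2040);
    }
}
```

With checking enabled, a name used a thousand times is scanned once. With it disabled, the ESC wildcard trick keeps working, since nothing is inspected.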

Original comment by aale...@gmail.com on 12 Jan 2012 at 8:36

GoogleCodeExporter commented 8 years ago

I don't think we necessarily need to provide protection for any of this. At 
least at the level of the generic API, we might leave it undefined. For a 
particular bridge, if we find utility in adding the functionality, we can add it 
and document it as a bridge-specific capability.

Original comment by one.eric...@gmail.com on 12 Jan 2012 at 8:42

GoogleCodeExporter commented 8 years ago

Resolution of these questions must happen prior to 1.0.

Original comment by aale...@gmail.com on 26 Jul 2012 at 7:32

Added labels: Milestone-Release1.0

GoogleCodeExporter commented 8 years ago

This was mentioned in a discussion of the copyNamespaces bodge on 
Model.stream().

One possibility raised: (re-)introduce a 'normalize' method to the mutable API. 
This would either appear on both MutableModel and MutableCursor, or on 
MutableContext.

Original comment by aale...@gmail.com on 2 Aug 2012 at 6:18

GoogleCodeExporter commented 8 years ago

A point to add here:

An instance built via FragmentBuilder can't be in a non-normalized state. The 
source of screwiness is the mutable API (only).

Therefore, the fix (if any) needs to be on the mutable API. As an interim 
approach, we can document each method of the mutable API to indicate "this can 
make your tree into something that doesn't qualify as an XQuery Data Model, and 
may not qualify as XML."

Original comment by aale...@gmail.com on 9 Aug 2012 at 3:27

GoogleCodeExporter commented 8 years ago

Deferred.

Original comment by aale...@gmail.com on 24 Oct 2013 at 5:20

Added labels: Milestone-Future
Removed labels: Milestone-Release1.0

GoogleCodeExporter commented 8 years ago

Changed owner.

Original comment by eric%tib...@gtempaccount.com on 6 May 2014 at 9:44

sshyran / genxdm

Underlying bridge can be in a non-normalized state - we should document how bridges should perform, and then make bridges do that #73