After thinking about it, I really don't see any performance benefit to "trust me" mode for context checks. Checking whether the mode is on or off costs the same cycles as the context check itself (e.g. IN_START_TAG). IMHO, context checks should always be performed, as I believe the performance hit is minimal for such an essential feature. Performance can, however, be improved by skipping NMTOKEN validation.
So my proposal is:
===============================
escaping
-------------------------------
/** set the default escape behavior, initially true */
WAXish setEscape(boolean);
/** get the default escape behavior */
boolean getEscape();
===============================
NMTOKEN validation
-------------------------------
/** set whether to validate NMTOKENs, initially true */
WAXish setNmtokensChecked(boolean);
/** get whether NMTOKENs are validated */
boolean getNmtokensChecked();
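A minimal sketch of how the proposed settings could look in Java, assuming "WAXish" means the setters return the writer itself so calls can be chained. The class name WriterSettings and the helper prepareText are hypothetical, not part of WAX; the escaping covers the five predefined XML entities.

```java
// Hypothetical sketch of the proposed API; WriterSettings and prepareText
// are illustrative names, not WAX's actual classes or methods.
final class WriterSettings {
    private boolean escape = true;          // initially true, as proposed
    private boolean nmtokensChecked = true; // initially true, as proposed

    /** Sets the default escape behavior; returns this for chaining (assumed "WAXish"). */
    WriterSettings setEscape(boolean escape) {
        this.escape = escape;
        return this;
    }

    /** Gets the default escape behavior. */
    boolean getEscape() {
        return escape;
    }

    /** Sets whether to validate NMTOKENs; returns this for chaining. */
    WriterSettings setNmtokensChecked(boolean checked) {
        this.nmtokensChecked = checked;
        return this;
    }

    /** Gets whether NMTOKENs are validated. */
    boolean getNmtokensChecked() {
        return nmtokensChecked;
    }

    /** Escapes the five predefined XML entities, but only when escaping is on. */
    String prepareText(String text) {
        if (!escape) return text;
        StringBuilder sb = new StringBuilder(text.length());
        for (char c : text.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

With this shape, turning both features off is a single chained call such as `settings.setEscape(false).setNmtokensChecked(false)`.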
Original comment by manosbat...@gmail.com
on 17 Sep 2008 at 4:34
I'm questioning my choice of "NMToken" in method names based on what I'm seeing in the W3C XML Recommendation:
"[Definition: A Name is a token beginning with a letter or one of a few punctuation characters, and continuing with letters, digits, hyphens, underscores, colons, or full stops, together known as name characters.] Names beginning with the string "xml", or with any string which would match (('X'|'x') ('M'|'m') ('L'|'l')), are reserved for standardization in this or future versions of this specification.
An Nmtoken (name token) is any mixture of name characters."
So according to this I should use "Name" instead of "Nmtoken".
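The difference between the two productions can be illustrated with a simplified, ASCII-only approximation. The real XML productions also admit many non-ASCII character ranges, so these patterns are an assumption for illustration only, and the class name XmlNames is hypothetical.

```java
import java.util.regex.Pattern;

// Simplified, ASCII-only approximation of the XML Name and Nmtoken
// productions; the full grammar allows many more Unicode ranges.
final class XmlNames {
    // A Name must start with a letter, '_' or ':' ...
    private static final Pattern NAME =
        Pattern.compile("[A-Za-z_:][A-Za-z0-9._:-]*");
    // ... while an Nmtoken is any non-empty mixture of name characters.
    private static final Pattern NMTOKEN =
        Pattern.compile("[A-Za-z0-9._:-]+");

    static boolean isName(String s) {
        return NAME.matcher(s).matches();
    }

    static boolean isNmtoken(String s) {
        return NMTOKEN.matcher(s).matches();
    }
}
```

A string like "1item" is a valid Nmtoken but not a valid Name, and element names, attribute names and processing instruction targets must be Names, which supports renaming the methods.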
Original comment by r.mark.v...@gmail.com
on 18 Sep 2008 at 10:14
Manos, I think I agree with your proposal in comment 1, and I've started implementing it. Based on my last comment, we may want to name the last two methods setNameValidation and getNameValidation.
I just did an exhaustive check of the Java WAX class to see everything that trustMe was being used for. Here is the list:
1) method call order (I agree that we can start doing this all the time, since it's fast)
2) name validation for elements, attributes, namespace prefixes and processing instruction targets
3) comment validation (can't have -- in a comment)
4) prefix in scope checking (can't use a prefix that isn't currently in scope)
5) URI syntax validation for DTDs, namespaces, schema paths and XSLT paths
6) sensible indent values (none, two spaces, four spaces or one tab; can use other values if trustMe = true)
So which of these do we want to be able to disable for performance reasons?
I think it's at least 2, 3, 4 and 5.
I want to allow what I consider non-sensible indent values, but only if
checking is disabled.
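As a rough illustration of item 4 being one of the checks that trustMe can disable, here is a sketch that tracks in-scope prefixes with a stack of sets, one per open element. The class and method names are hypothetical and not taken from the WAX source; the "xml" prefix is treated as always bound, as the Namespaces in XML recommendation specifies.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of prefix-in-scope checking gated by trustMe;
// names are illustrative, not WAX's actual implementation.
final class PrefixScopes {
    private final Deque<Set<String>> scopes = new ArrayDeque<>();
    private boolean trustMe = false;

    void setTrustMe(boolean trustMe) { this.trustMe = trustMe; }

    /** Opens a new scope when an element starts. */
    void startElement() { scopes.push(new HashSet<>()); }

    /** Discards the prefixes declared on an element when it ends. */
    void endElement() { scopes.pop(); }

    /** Records a namespace prefix declared on the current element. */
    void declare(String prefix) { scopes.peek().add(prefix); }

    /** Returns true if the current element or any ancestor declared the prefix. */
    boolean inScope(String prefix) {
        for (Set<String> scope : scopes) {
            if (scope.contains(prefix)) return true;
        }
        return false;
    }

    /** Throws unless the prefix is in scope; skipped entirely in trust me mode. */
    void checkPrefix(String prefix) {
        if (trustMe || prefix == null || prefix.isEmpty() || "xml".equals(prefix)) return;
        if (!inScope(prefix)) {
            throw new IllegalArgumentException(
                "namespace prefix \"" + prefix + "\" is not in scope");
        }
    }
}
```

The lookup walks every open scope, so its cost grows with nesting depth; that is the kind of overhead trust me mode would let callers skip.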
Original comment by r.mark.v...@gmail.com
on 18 Sep 2008 at 10:31
I'd vote for the trust switch to disable 2, 3, 5 and 6 (then again, I consider 4, in-scope prefixes, a context check like 1).
In any case, let me know your decision so I can implement the same in WAX.js.
Original comment by manosbat...@gmail.com
on 19 Sep 2008 at 1:33
Regarding indent, I don't think any values are inherently "meaningful". For example, I've seen people use unusual indentation in combination with xml:space to produce printable output.
Here's what I do for setIndent: if it's an int, the indent is that number of spaces. If it's a string, I use the string's length as the int (I never validate the string content, as it's not used). WDYT?
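A Java sketch of the length-based approach described for WAX.js, assuming overloaded setIndent methods. The class name LengthIndent and the prefix helper are hypothetical, and String.repeat requires Java 11 or later.

```java
// Hypothetical sketch of length-based indentation: an int is a number of
// spaces, and for a String only its length matters.
final class LengthIndent {
    private String indent = "";

    /** Indents with the given number of spaces. */
    void setIndent(int spaces) {
        if (spaces < 0) throw new IllegalArgumentException("negative indent");
        indent = " ".repeat(spaces);
    }

    /** Uses only the string's length; its characters are ignored. */
    void setIndent(String s) {
        setIndent(s == null ? 0 : s.length());
    }

    /** Returns the indentation prefix for a line at the given nesting depth. */
    String prefix(int depth) {
        return indent.repeat(depth);
    }
}
```

One consequence of this design is that a one-tab string becomes a single space, and a string like "..." becomes three spaces.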
Original comment by manosbat...@gmail.com
on 19 Sep 2008 at 1:37
I've had someone ask me to support arbitrary indent characters for debugging purposes. For example, someone wanted to use three periods so the indentation is visible. My current plan is to allow that only if they do this:
wax.setTrustMe(true);
wax.setIndent("...");
wax.setTrustMe(false); // to resume other checking
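A sketch of the "sensible indent" rule gated by trust me mode, matching the sequence above. The class name IndentRule, the initial two-space indent, and treating "none" as the empty string are all assumptions; Set.of requires Java 9 or later.

```java
import java.util.Set;

// Hypothetical sketch of the sensible-indent check; names and the initial
// indent value are assumptions, not the actual WAX implementation.
final class IndentRule {
    // none, two spaces, four spaces, or one tab
    private static final Set<String> SENSIBLE = Set.of("", "  ", "    ", "\t");

    private boolean trustMe = false;
    private String indent = "  "; // assumed initial value

    void setTrustMe(boolean trustMe) { this.trustMe = trustMe; }

    /** Rejects non-sensible indents unless trust me mode is on. */
    void setIndent(String indent) {
        if (indent == null) indent = "";
        if (!trustMe && !SENSIBLE.contains(indent)) {
            throw new IllegalArgumentException(
                "unsupported indent \"" + indent + "\" (enable trust me mode to allow it)");
        }
        this.indent = indent;
    }

    String getIndent() { return indent; }
}
```

Because the check runs only when setIndent is called, the "..." indent stays in effect after trust me mode is switched back off.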
Original comment by r.mark.v...@gmail.com
on 19 Sep 2008 at 9:30
Done. Below is the javadoc comment on the setTrustMe method, which explains what is checked when it is set to false. It no longer affects whether element text and attribute values are escaped; that is now controlled by the setEscape method.
* Sets whether "trust me" mode is enabled.
* When disabled (the default), the following checks are made.
* 1) element names, attribute names, namespace prefixes and
* processing instruction targets are verified to be valid XML names
* 2) comments are verified to not contain "--"
* 3) element and attribute prefixes are verified to be in scope
* 4) DTD paths, namespaces, schema paths and XSLT paths
* are verified to use valid URI syntax
* 5) only sensible indent values (none, two spaces, four spaces or one tab)
* are allowed (can use other values if trustMe = true)
* The main reason to enable "trust me" mode is performance,
* which is typically good even when it is disabled.
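A sketch of checks 2 and 4 from the javadoc, each skipped in trust me mode. The class name TrustChecks is hypothetical, and java.net.URI's parser stands in for whatever URI validation WAX actually uses. The comment check also rejects a trailing "-", because the XML grammar forbids a comment body that ends in "-" (it would produce "--->").

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical sketch of the comment and URI checks; java.net.URI is an
// assumed stand-in for WAX's own URI validation.
final class TrustChecks {
    private boolean trustMe = false;

    void setTrustMe(boolean trustMe) { this.trustMe = trustMe; }

    /** Check 2: XML comments may not contain "--" or end with "-". */
    void checkComment(String text) {
        if (trustMe) return;
        if (text.contains("--") || text.endsWith("-")) {
            throw new IllegalArgumentException("invalid comment text: " + text);
        }
    }

    /** Check 4: DTD paths, namespaces, schema and XSLT paths must parse as URIs. */
    void checkUri(String uri) {
        if (trustMe) return;
        try {
            new URI(uri);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("invalid URI syntax: " + uri, e);
        }
    }
}
```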
Original comment by r.mark.v...@gmail.com
on 19 Sep 2008 at 10:22
Original issue reported on code.google.com by
r.mark.v...@gmail.com
on 15 Sep 2008 at 3:50