mvolkmann / waxy

WAX - a new approach to writing XML
http://java.ociweb.com/mark/programming/wax.html

auto-escape and trust me need to be separated #20

Closed. GoogleCodeExporter closed this issue 9 years ago.

GoogleCodeExporter commented 9 years ago
We want the ability to turn on "trust me" mode for performance reasons after an application has been tested. Doing that should not change the output, including whether special characters are automatically escaped.

This was reported by Emmanouil Batsis.

Original issue reported on code.google.com by r.mark.v...@gmail.com on 15 Sep 2008 at 3:50

GoogleCodeExporter commented 9 years ago

After thinking about it, I really don't see any performance benefit for "trust me" regarding context. Checking whether it is on or off takes the same cycles as a context check (e.g. IN_START_TAG). IMHO, context checks should always be performed, as I believe the performance hit is minimal for such an essential feature. Performance can, however, be improved by skipping NMTOKEN validations.

So my proposal is:

===============================
escaping
-------------------------------
/** set the default escape behavior, initial is true */
WAXish setEscape(boolean);

/** get the default escape behavior */
boolean getEscape();

===============================
NMTOKEN validation
-------------------------------
/** set whether to validate NMTOKENs, initial is true */
WAXish setNmtokensChecked(boolean);

/** get whether NMTOKENs are validated */
boolean getNmtokensChecked();
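
To illustrate the proposal, here is a minimal sketch of two independent switches. The class name `EscapeSketch` and the `text` method are hypothetical and are not the actual WAX implementation; only the four accessor names and their initial values come from the proposal above.

```java
/** Hypothetical sketch of the proposed switches; not the real WAX class. */
class EscapeSketch {
    private boolean escape = true;          // initial value per the proposal
    private boolean nmtokensChecked = true; // initial value per the proposal

    EscapeSketch setEscape(boolean escape) {
        this.escape = escape;
        return this; // fluent style, mirroring the "WAXish" return type
    }

    boolean getEscape() {
        return escape;
    }

    EscapeSketch setNmtokensChecked(boolean checked) {
        this.nmtokensChecked = checked;
        return this;
    }

    boolean getNmtokensChecked() {
        return nmtokensChecked;
    }

    /** Returns the text as it would be written; escaping depends only on the escape flag. */
    String text(String s) {
        if (!escape) {
            return s;
        }
        // "&" must be replaced first so the other entities are not double-escaped.
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

In this sketch, turning name validation off leaves the output of `text` unchanged, which is the behavior the issue asks for.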

Original comment by manosbat...@gmail.com on 17 Sep 2008 at 4:34

GoogleCodeExporter commented 9 years ago
I'm questioning my choice of "NMToken" in method names based on what I'm seeing in the W3C XML Recommendation.

"[Definition: A Name is a token beginning with a letter or one of a few punctuation characters, and continuing with letters, digits, hyphens, underscores, colons, or full stops, together known as name characters.] Names beginning with the string "xml", or with any string which would match (('X'|'x') ('M'|'m') ('L'|'l')), are reserved for standardization in this or future versions of this specification.

An Nmtoken (name token) is any mixture of name characters."

So according to this I should use "Name" instead of "Nmtoken".
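
The difference between the two definitions can be sketched as follows. This is an ASCII-only approximation (the real XML productions also allow many non-ASCII characters), and the class and method names are hypothetical, not part of WAX.

```java
/** Hypothetical, ASCII-only approximation of the XML Name and Nmtoken rules. */
final class XmlNameSketch {
    // A Name must start with a letter, underscore or colon.
    private static final java.util.regex.Pattern NAME =
        java.util.regex.Pattern.compile("[A-Za-z_:][A-Za-z0-9._:-]*");

    // An Nmtoken is any non-empty mixture of name characters.
    private static final java.util.regex.Pattern NMTOKEN =
        java.util.regex.Pattern.compile("[A-Za-z0-9._:-]+");

    private XmlNameSketch() {}

    static boolean isName(String s) {
        return NAME.matcher(s).matches();
    }

    static boolean isNmtoken(String s) {
        return NMTOKEN.matcher(s).matches();
    }
}
```

For example, `2nd` and `-x` pass the Nmtoken check but fail the Name check, so a check meant for element and attribute names needs the stricter Name rule.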

Original comment by r.mark.v...@gmail.com on 18 Sep 2008 at 10:14

GoogleCodeExporter commented 9 years ago
Manos, I think I agree with your proposal in comment 1. I've started implementing it. Based on my last comment we may want to make the last two methods be named setNameValidation and getNameValidation.

I just did an exhaustive check of the Java WAX class to see everything that trustMe was being used for. Here is the list.
1) method call order (I agree that we can start doing this all the time since it's fast.)
2) name validation for elements, attributes, namespace prefixes and processing instruction targets
3) comment validation (can't have -- in a comment)
4) prefix in scope checking (can't use a prefix that isn't currently in scope)
5) URI syntax validation for DTDs, namespaces, schema paths and XSLT paths
6) sensible indent values (none, two spaces, four spaces or one tab; can use other values if trustMe = true)

So which of these do we want to be able to disable for performance reasons?
I think it's at least 2, 3, 4 and 5.
I want to allow what I consider non-sensible indent values, but only if 
checking is disabled.

Original comment by r.mark.v...@gmail.com on 18 Sep 2008 at 10:31

GoogleCodeExporter commented 9 years ago
I'd vote for the trust switch to disable 2, 3, 5 and 6. (Then again, I consider 4, in-scope prefixes, a context check like 1.)

In any case, let me know your decision so I can implement the same in WAX.js.

Original comment by manosbat...@gmail.com on 19 Sep 2008 at 1:33

GoogleCodeExporter commented 9 years ago
Regarding indent, I think there are no "meaningful" values. For example, I've seen people use weird indentation in combination with xml:space to produce printable stuff.

Here's what I do for setIndent: if given an int, the indent is that number of spaces. If given a string, I use the string length as the int (I never validate the string content as it's not used). WDYT?
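
The setIndent behavior described above could be sketched in Java with two overloads. The class name, the `indentFor` helper and the default width are assumptions made for this illustration; they are not taken from WAX.js or from the Java WAX class.

```java
/** Hypothetical sketch of the int/string setIndent behavior described above. */
final class IndentSketch {
    private int indentWidth = 2; // assumed default for this sketch only

    /** An int is taken as the number of spaces per level. */
    IndentSketch setIndent(int spaces) {
        if (spaces < 0) {
            throw new IllegalArgumentException("indent must not be negative");
        }
        this.indentWidth = spaces;
        return this;
    }

    /** A string contributes only its length; its content is never inspected. */
    IndentSketch setIndent(String indent) {
        return setIndent(indent.length());
    }

    /** Builds the leading whitespace for the given nesting depth. */
    String indentFor(int depth) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < indentWidth * depth; i++) {
            sb.append(' ');
        }
        return sb.toString();
    }
}
```

Note that under this approach a string such as "..." yields three spaces per level rather than visible dots, because only the length is used.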

Original comment by manosbat...@gmail.com on 19 Sep 2008 at 1:37

GoogleCodeExporter commented 9 years ago
I've had someone ask me to support any indent characters for debugging purposes. For example, someone wanted to use three periods so the indentation is "visible". My current plan is to allow that if they do this:

wax.setTrustMe(true);
wax.setIndent("...");
wax.setTrustMe(false); // to resume other checking

Original comment by r.mark.v...@gmail.com on 19 Sep 2008 at 9:30

GoogleCodeExporter commented 9 years ago
Done. Below is the javadoc comment on the setTrustMe method, which explains what it now causes to be checked when set to false. It no longer affects whether element text and attribute values are escaped. That is now controlled by the setEscape method.

     * Sets whether "trust me" mode is enabled.
     * When disabled (the default), the following checks are made.
     * 1) element names, attribute names, namespace prefixes and
     *    processing instruction targets are verified to be valid XML names
     * 2) comments are verified to not contain "--"
     * 3) element and attribute prefixes are verified to be in scope
     * 4) DTD paths, namespaces, schema paths and XSLT paths
     *    are verified to use valid URI syntax
     * 5) only sensible indent values (none, two spaces, four spaces or one tab)
     *    are allowed (can use other values if trustMe = true)
     * The main reason to enable "trust me" mode is for performance
     * which is typically good even when disabled.
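
As an illustration of the resulting behavior, here is a minimal sketch in which "trust me" mode gates two of the checks listed above (comments and indent values) while escaping is controlled separately. The class `TrustMeSketch` and its method bodies are hypothetical; only the names setTrustMe, setEscape and setIndent and the listed rules come from this thread.

```java
/** Hypothetical sketch; the real WAX class performs more checks than shown here. */
final class TrustMeSketch {
    private boolean trustMe = false; // disabled by default, so checks run
    private boolean escape = true;   // independent of trustMe
    private String indent = "  ";

    void setTrustMe(boolean trustMe) { this.trustMe = trustMe; }

    void setEscape(boolean escape) { this.escape = escape; }

    String getIndent() { return indent; }

    /** Only "sensible" indents are accepted unless trustMe is on. */
    void setIndent(String indent) {
        if (!trustMe) {
            boolean sensible = indent.isEmpty() || indent.equals("  ")
                || indent.equals("    ") || indent.equals("\t");
            if (!sensible) {
                throw new IllegalArgumentException("invalid indent value");
            }
        }
        this.indent = indent;
    }

    /** Comments may not contain "--"; the check is skipped when trustMe is on. */
    String comment(String text) {
        if (!trustMe && text.contains("--")) {
            throw new IllegalArgumentException("comment must not contain \"--\"");
        }
        return "<!-- " + text + " -->";
    }

    /** Escaping depends only on the escape flag, never on trustMe. */
    String text(String s) {
        if (!escape) {
            return s;
        }
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

With this separation, toggling setTrustMe changes which inputs are rejected but not how accepted text is written.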

Original comment by r.mark.v...@gmail.com on 19 Sep 2008 at 10:22