mvolkmann / waxy

WAX - a new approach to writing XML
http://java.ociweb.com/mark/programming/wax.html

Quote escaping #38

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
There is no need to escape single or double quotes outside attribute
values, for example in text(). Also, if attribute values are always wrapped in
double quotes, then only double quotes have to be escaped in the values themselves.

Original issue reported on code.google.com by manosbat...@gmail.com on 25 Sep 2008 at 3:11

GoogleCodeExporter commented 9 years ago
I'm debating whether it's worthwhile to do anything about this. If I understand
correctly, we don't need to do the following escapes.

> in attribute or text values (only in CDATA sections)
' in double quoted attributes or any text
" in single quoted attributes or any text

Should WAX avoid all of these escapes?
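
One quick way to sanity-check the list above is to feed the unescaped forms to a parser and see whether they round-trip. Below is a minimal sketch using the JDK's built-in DOM parser. The class and method names (UnescapedQuoteCheck, parseRoot) are made up for illustration and are not part of WAX.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of WAX.
final class UnescapedQuoteCheck {
    /** Parses the XML and returns its root element; throws if it is not well-formed. */
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("not well-formed: " + xml, e);
        }
    }

    public static void main(String[] args) {
        // ' and > unescaped in a double-quoted attribute, " unescaped in a
        // single-quoted attribute, and all three unescaped in text.
        Element root = parseRoot(
                "<root a=\"O'Riley > 1\" b='say \"hi\"'>\"x\" > 'y'</root>");
        System.out.println(root.getAttribute("a"));
        System.out.println(root.getAttribute("b"));
        System.out.println(root.getTextContent());
    }
}
```

If the parser accepts the document and returns the original values, the escapes in question are optional rather than required.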

Original comment by r.mark.v...@gmail.com on 29 Sep 2008 at 4:14

GoogleCodeExporter commented 9 years ago
I don't have a strong opinion about it.

I like the readability of ' and " instead of &apos; and &quot; in the resulting XML.
But I like the (admittedly unnecessary) symmetry of &lt; (required) and &gt; (optional).

It's subjective.  I could go either way.

Original comment by jeffgr...@charter.net on 2 Oct 2008 at 2:09

GoogleCodeExporter commented 9 years ago
Does anybody know what is technically correct here? I'd like to do whatever the
XML recommendation says we should do. Is what we are currently doing (escaping
all five special characters all the time) considered wrong?

Original comment by r.mark.v...@gmail.com on 4 Oct 2008 at 4:02

GoogleCodeExporter commented 9 years ago
Hey, the issue is marked as "Enhancement" ;-)

I agree with Jeff, i.e. I like plain quotes in text and the symmetry of escaping
both < and >.

Original comment by manosbat...@gmail.com on 4 Oct 2008 at 4:13

GoogleCodeExporter commented 9 years ago
I added a test illustrating the one case where '>' quoting is needed.  WAX is 
currently doing the correct quoting.

For guidance, I'm looking at the "2.4 Character Data and Markup" section of
this document:
http://www.w3.org/TR/2008/PER-xml-20080205/#syntax

I think we need this additional test, as a restriction on valid CDATA content:
(You can't have the sequence "]]>" in CDATA.)

    @Test
    public void testCDATAContainingCDATASectionCloseDelimiter() {
        StringWriter sw = new StringWriter();
        WAX wax = new WAX(sw);
        wax.start("root");
        try {
            wax.cdata("==]]>==");
            fail("Expected IllegalArgumentException.");
        } catch (final IllegalArgumentException expectedIllegalArgumentException) {
            assertEquals(
                    "CDATA section data must not contain the CDATA section close delimiter, ']]>'.",
                    expectedIllegalArgumentException.getMessage());
        }
    }

Original comment by jeffgr...@charter.net on 5 Oct 2008 at 3:27

GoogleCodeExporter commented 9 years ago
[  Lightbulb pops up over head!  ;->  ]

Or we could just make it work, for the users, instead of restricting/preventing.

That is, instead of doing this in WAX...

        if (text.indexOf("]]>") > -1)
            throw new IllegalArgumentException(
                    "CDATA section data must not contain the CDATA section close delimiter, ']]>'.");

We could do this:

        text("<![CDATA[" + text.replaceAll(Pattern.quote("]]>"), "]]]]><![CDATA[>") + "]]>",
                newLine);

With this test:

    @Test
    public void testCDATAContainingCDATASectionCloseDelimiter_Supported() throws Exception {
        final StringWriter sw = new StringWriter();
        WAX wax = new WAX(sw);
        wax.start("root").cdata("==]]>==").close();

        final String xmlString = sw.toString();
        assertEquals("<root><![CDATA[==]]]]><![CDATA[>==]]></root>", xmlString);

        final Document doc = parseXml(xmlString);
        doc.normalize();
        final Element rootElement = doc.getDocumentElement();
        assertEquals("root", rootElement.getNodeName());
        assertEquals("==]]>==", rootElement.getTextContent());
    }
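
For anyone who wants to try the splitting trick outside WAX, here is a self-contained sketch of the same idea. It assumes nothing about WAX internals: CdataSplitSketch, toCdata and rootText are hypothetical names, and the round trip uses the JDK's DOM parser rather than the parseXml helper from the test above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of WAX.
final class CdataSplitSketch {
    /** Wraps text in a CDATA section, splitting any "]]>" across two sections. */
    static String toCdata(String text) {
        // String.replace is literal, so no regex quoting is needed here.
        return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>";
    }

    /** Parses a document and returns the text content of its root element. */
    static String rootText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root>" + toCdata("==]]>==") + "</root>";
        System.out.println(xml);           // the two adjacent CDATA sections
        System.out.println(rootText(xml)); // the original text, reassembled
    }
}
```

The first section ends with "]]" and the second begins with ">", so the close delimiter never appears inside either section, yet a parser concatenates them back into the original text.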

Original comment by jeffgr...@charter.net on 5 Oct 2008 at 3:35

GoogleCodeExporter commented 9 years ago
Tests for simplified quoting, should we want to implement this feature at some
point...

(The first of the three tests passes today; it just illustrates required '"'
quoting in attributes.)

    @Test
    public void testAttributeWithDoubleQuoteCharacter() throws Exception {
        StringWriter sw = new StringWriter();
        WAX wax = new WAX(sw);
        final String attributeValue = "Bill \"The Man\" Bates";
        wax.start("root").attr("a", attributeValue).close();
        // Double quotes inside a double-quoted attribute value must be escaped.
        assertEquals("<root a=\"Bill &quot;The Man&quot; Bates\"/>", sw
                .toString());

        final Document doc = parseXml(sw.toString());
        final Element rootElement = doc.getDocumentElement();
        assertEquals(attributeValue, rootElement.getAttribute("a"));
    }

    @Test
    public void testAttributeWithSingleQuoteCharacter() throws Exception {
        StringWriter sw = new StringWriter();
        WAX wax = new WAX(sw);
        final String attributeValue = "Bill O'Riley";
        wax.start("root").attr("a", attributeValue).close();
        // A single quote inside a double-quoted attribute value needs no escaping.
        assertEquals("<root a=\"" + attributeValue + "\"/>", sw.toString());

        // final Document doc = parseXml("<root a=\"Bill O'Riley\"/>");
        final Document doc = parseXml(sw.toString());
        final Element rootElement = doc.getDocumentElement();
        assertEquals(attributeValue, rootElement.getAttribute("a"));
    }

    @Test
    public void testTextWithQuoteCharacters() throws Exception {
        StringWriter sw = new StringWriter();
        WAX wax = new WAX(sw);
        final String unquotedTextValue = "Bill \"The Man\" O'Riley";
        wax.start("root").text(unquotedTextValue).close();
        assertEquals("<root>" + unquotedTextValue + "</root>", sw.toString());

        // final Document doc = parseXml("<root>"+unquotedTextValue+"</root>");
        final Document doc = parseXml(sw.toString());
        final Element rootElement = doc.getDocumentElement();
        assertEquals(unquotedTextValue, rootElement.getTextContent());
    }

Original comment by jeffgr...@charter.net on 5 Oct 2008 at 4:18

GoogleCodeExporter commented 9 years ago
As best I can tell, the current WAX implementation is correct.

So this issue is correctly categorized as an enhancement, not a bug.

It's a style issue.  Would the resulting XML look better with less & quoting?
That is, would it be more readable by humans?

I say, don't stress on it for the 1.0 release.

The downside of this enhancement is that we'd need two different quoting
methods: one for text, the other for attributes.  But that's the essence of
the enhancement, so I wouldn't really consider it an issue.
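
To make the "two quoting methods" idea concrete, here is a rough sketch of what the pair might look like. This is not WAX's actual code: XmlEscapeSketch, escapeText and escapeAttribute are hypothetical names, and the sketch assumes attribute values are always written inside double quotes. It keeps the symmetric < and > escaping discussed earlier in the thread, which also covers the "]]>" case in text.

```java
// Hypothetical sketch, not WAX's implementation.
final class XmlEscapeSketch {
    private XmlEscapeSketch() {}

    /** Escapes character data: &, < and > are escaped; quotes are left alone. */
    static String escapeText(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break; // optional except after "]]"
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Escapes a value that will be wrapped in double quotes. */
    static String escapeAttribute(String s) {
        // Same as text, plus the one quote character that would end the value.
        // Whitespace normalization of attribute values is not handled here.
        return escapeText(s).replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        String value = "Bill \"The Man\" O'Riley";
        System.out.println("<root>" + escapeText(value) + "</root>");
        System.out.println("<root a=\"" + escapeAttribute(value) + "\"/>");
    }
}
```

The attribute method just layers one extra replacement on top of the text method, so the cost of the enhancement is small.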

Original comment by jeffgr...@charter.net on 5 Oct 2008 at 4:21