DaisyDiff breaks with "There is no Atom with index" when strings have numbers

GoogleCodeExporter commented 8 years ago

When I run the code below, it fails with "java.lang.IndexOutOfBoundsException: There is no Atom with index 3". I am using DaisyDiff 1.0 to compile and run this code.

<code>
        String contentOld = "festive 8";
        String contentNew = "What was your highlight of 8?";

        StringWriter writer = new StringWriter(contentNew.length() + contentOld.length());
        SAXTransformerFactory tf = (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler result = tf.newTransformerHandler();
        result.setResult(new StreamResult(writer));
        ContentHandler postProcess = result;
        postProcess.startDocument();
        postProcess.startElement("", "diffreport", "diffreport", new AttributesImpl());
        postProcess.startElement("", "diff", "diff", new AttributesImpl());

        DaisyDiff.diffTag(contentOld, contentNew, postProcess);

        postProcess.endElement("", "diff", "diff");
        postProcess.endElement("", "diffreport", "diffreport");
        postProcess.endDocument();
        System.out.println(writer.getBuffer().toString());
</code>

Original issue reported on code.google.com by diptansh...@gmail.com on 11 Feb 2009 at 4:26

GoogleCodeExporter commented 8 years ago

Thanks for reporting.

The code works for me when I change the first lines to
        String contentOld = "<html><body>festive 8</body></html>";
        String contentNew = "<html><body>What was your highlight of 8?</body></html>";

Keep in mind that DaisyDiff was created to compare XHTML and that the behaviour for other input is undefined.
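
For short plain-text fields, the wrapping can be done by a small helper before calling DaisyDiff.diffTag. The following is only a minimal sketch, not part of DaisyDiff: the class name `XhtmlWrapper` and its methods are hypothetical, and it assumes the input is plain text with no markup that needs to be preserved, so `&`, `<` and `>` are escaped.

```java
/** Hypothetical helper (not part of DaisyDiff): wraps plain text as a minimal XHTML document. */
final class XhtmlWrapper {
    private XhtmlWrapper() {}

    /** Escapes the characters that are significant in XML text content. */
    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Returns the text inside a single html/body root; null is treated as empty. */
    static String wrap(String text) {
        String safe = text == null ? "" : escape(text);
        return "<html><body>" + safe + "</body></html>";
    }
}
```

With this in place, the two strings from the report would be passed as `XhtmlWrapper.wrap(contentOld)` and `XhtmlWrapper.wrap(contentNew)`.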

Does this help you or do you believe DaisyDiff should support plain text?

Original comment by guy...@gmail.com on 17 Feb 2009 at 4:43

Changed state: WontFix

GoogleCodeExporter commented 8 years ago

This is not the ideal solution I was looking for. Wrapping text in HTML tags should not be mandatory.

Although DaisyDiff was created to compare XHTML, I think it should be extended to handle plain-text input too (just a suggestion). A classic case where this comes in handy is when you have two stories to compare and you use this library from a custom tag to compare the author, created date, title, summary, etc. of the two stories.

Original comment by diptansh...@gmail.com on 17 Feb 2009 at 5:09

GoogleCodeExporter commented 8 years ago

BTW, I got it to work using the following piece of code.

<code>
    String contentOld = "festive 8";
    String contentNew = "What was your highlight of 8?";
    getDiffForHTMLInput(contentOld, contentNew);

    private String getDiffForHTMLInput ( String contentOld, String contentNew ) throws Exception
    {
        contentOld = contentOld == null ? "" : contentOld;
        contentNew = contentNew == null ? "" : contentNew;
        StringWriter writer = new StringWriter(contentNew.length() + contentOld.length());
        List<String> styleList = new ArrayList<String>();
        styleList.add("/static/js/difftag/css/difftag.css");

        SAXTransformerFactory tf = (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler result = tf.newTransformerHandler();
        result.setResult(new StreamResult(writer));

        ContentHandler postProcess = result;
        Locale locale = Locale.getDefault();
        String prefix = "diff";
        HtmlCleaner cleaner = new HtmlCleaner();
        InputSource oldSource = new InputSource(new ByteArrayInputStream(contentOld.getBytes("UTF-8")));
        InputSource newSource = new InputSource(new ByteArrayInputStream(contentNew.getBytes("UTF-8")));
        DomTreeBuilder oldHandler = new DomTreeBuilder();
        cleaner.cleanAndParse(oldSource, oldHandler);
        TextNodeComparator leftComparator = new TextNodeComparator(oldHandler, locale);
        DomTreeBuilder newHandler = new DomTreeBuilder();
        cleaner.cleanAndParse(newSource, newHandler);
        TextNodeComparator rightComparator = new TextNodeComparator(newHandler, locale);
        postProcess.startDocument();
        postProcess.startElement("", "diffreport", "diffreport", new AttributesImpl());
        attachStyleSheets(styleList, postProcess);
        postProcess.startElement("", "diff", "diff", new AttributesImpl());
        HtmlSaxDiffOutput output = new HtmlSaxDiffOutput(postProcess, prefix);
        HTMLDiffer differ = new HTMLDiffer(output);
        differ.diff(leftComparator, rightComparator);
        postProcess.endElement("", "diff", "diff");
        postProcess.endElement("", "diffreport", "diffreport");
        postProcess.endDocument();
        return writer.getBuffer().toString();
    }

    private void attachStyleSheets ( List<String> styles, ContentHandler handler ) throws SAXException
    {
        handler.startElement("", "css", "css", new AttributesImpl());
        for (String cssLink : styles)
        {
            // Emit one <link> element per stylesheet.
            AttributesImpl attr = new AttributesImpl();
            attr.addAttribute("", "href", "href", "CDATA", cssLink);
            attr.addAttribute("", "type", "type", "CDATA", "text/css");
            attr.addAttribute("", "rel", "rel", "CDATA", "stylesheet");
            handler.startElement("", "link", "link", attr);
            handler.endElement("", "link", "link");
        }
        // Close the <css> wrapper so the output stays well-formed.
        handler.endElement("", "css", "css");
    }

</code>

Original comment by diptansh...@gmail.com on 17 Feb 2009 at 5:12

GoogleCodeExporter commented 8 years ago

Btw, if you want to diff plain text then I can recommend http://code.google.com/p/google-diff-match-patch/ . If you have the time I'd be very happy to accept patches that add support for plain text.
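
To give a rough idea of what a plain-text comparison involves, here is a small word-level diff based on a longest-common-subsequence table. It is only an illustrative sketch: it is neither the algorithm used by google-diff-match-patch nor anything from DaisyDiff, the class name `WordDiff` is hypothetical, and words are assumed to be separated by whitespace.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical word-level diff using a longest-common-subsequence table. */
final class WordDiff {
    private WordDiff() {}

    /** Splits on whitespace; blank input yields no words. */
    static String[] words(String text) {
        String trimmed = text == null ? "" : text.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    /** Returns one entry per word: " w" (unchanged), "-w" (removed), "+w" (added). */
    static List<String> diff(String oldText, String newText) {
        String[] a = words(oldText);
        String[] b = words(newText);

        // lcs[i][j] = length of the LCS of a[i..] and b[j..]
        int[][] lcs = new int[a.length + 1][b.length + 1];
        for (int i = a.length - 1; i >= 0; i--) {
            for (int j = b.length - 1; j >= 0; j--) {
                lcs[i][j] = a[i].equals(b[j])
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }

        // Walk the table from the start, emitting common, removed and added words.
        List<String> out = new ArrayList<String>();
        int i = 0, j = 0;
        while (i < a.length && j < b.length) {
            if (a[i].equals(b[j])) {
                out.add(" " + a[i]); i++; j++;
            } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
                out.add("-" + a[i]); i++;
            } else {
                out.add("+" + b[j]); j++;
            }
        }
        while (i < a.length) out.add("-" + a[i++]);
        while (j < b.length) out.add("+" + b[j++]);
        return out;
    }
}
```

For example, `WordDiff.diff("the quick fox", "the slow fox")` yields the entries " the", "-quick", "+slow", " fox".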

Valid XML should have a single root element (like <html>) so anything else is not valid input and you should indeed wrap your snippets.
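
The single-root requirement can be checked up front with the JDK's own XML parser. The sketch below is an assumption about how a caller might guard its input, not something DaisyDiff does itself; the class name `WellFormedCheck` is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical guard: reports whether a string parses as a well-formed XML document. */
final class WellFormedCheck {
    private WellFormedCheck() {}

    static boolean isWellFormed(String input) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(input)));
            return true;
        } catch (SAXException e) {
            // Text outside a root element, multiple roots, unclosed tags, and so on.
            return false;
        } catch (Exception e) {
            // Parser configuration or I/O problems are not a verdict on the input.
            throw new IllegalStateException(e);
        }
    }
}
```

Under this check, the bare string "festive 8" is rejected, while "<html><body>festive 8</body></html>" is accepted.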

Original comment by guy...@gmail.com on 17 Feb 2009 at 5:31
