Handle non-ASCII data properly

There's a mistake in how the tokenizer handles non-ASCII strings. The xdot
format tells us how many bytes long a string will be, but I hand that count to
the substr function, which counts in characters, not bytes. I also wrongly named
the variable "chars" instead of "bytes". (Actually, the mistake originated in
the Graphviz documentation, which said the xdot format counted characters; I
submitted a patch to fix the documentation.)
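
To make the mismatch concrete, here is a small sketch. The draw-string fragment is hypothetical (made up to resemble xdot output, not copied from a real file), and it assumes the xdot data was decoded as UTF-8 before reaching the tokenizer:

```javascript
// The label text "ää", written with escapes so the code units are unambiguous.
const label = "\u00e4\u00e4";

// Hypothetical remainder of a draw string, just after the "-" that starts a
// text payload whose declared length is 4 (the UTF-8 size of "ää").
const rest = label + " F 14.000000 11 -Times-Roman";
const declaredLength = 4;

// The same text measured two ways.
const utf8Bytes = new TextEncoder().encode(label).length; // bytes in UTF-8
const jsLength = label.length;                            // UTF-16 code units

// What the buggy code effectively does: treat the byte count as a
// character count, which swallows the " F" that follows the label.
const taken = rest.substr(0, declaredLength);
const leftover = rest.substr(declaredLength);

console.log(utf8Bytes, jsLength);        // 4 2
console.log(JSON.stringify(taken));      // "ää F"
console.log(JSON.stringify(leftover));   // " 14.000000 11 -Times-Roman"
```

With the "F" consumed as part of the string, the next thing the tokenizer sees is a bare number where it expects a command.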

None of the sample graphs exhibit the problem. You only see it when a single
label results in more than one text draw command, such as a multiline label, a
record, or an HTML-like table. Here's an example:

digraph utf8 {
    a [label="ää\nb"]
}

Result in Canviz:

unknown token 14.000000

This was originally reported to me by email by Jan Wielemaker in November 2007,
and he provided a patch in his repository:

http://gollem.science.uva.nl/git/ClioPatria.git?a=commitdiff;h=1669b252b25b6e75ced28be39b0449e9d13a62d3

I can't find any JavaScript string functions that work on bytes instead of
characters, so the method proposed in this patch seems to be the way to go.
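
For illustration, here is a minimal sketch of one way to consume a string by byte count without a byte-oriented string function. It is not the code from the patch above. The names takeUtf8Bytes and utf8Length are hypothetical, and the sketch assumes the input was decoded from UTF-8 and that the declared count ends on a character boundary:

```javascript
// Number of bytes a single code point occupies in UTF-8.
function utf8Length(codePoint) {
  if (codePoint < 0x80) return 1;
  if (codePoint < 0x800) return 2;
  if (codePoint < 0x10000) return 3;
  return 4;
}

// Split str into the leading text that occupies byteCount UTF-8 bytes
// and whatever follows it. Hypothetical helper, not part of Canviz.
function takeUtf8Bytes(str, byteCount) {
  let bytes = 0;
  let i = 0;
  while (i < str.length && bytes < byteCount) {
    const cp = str.codePointAt(i);
    bytes += utf8Length(cp);
    // Code points above U+FFFF take two UTF-16 code units in a JS string.
    i += cp > 0xffff ? 2 : 1;
  }
  return { text: str.slice(0, i), rest: str.slice(i) };
}

const split = takeUtf8Bytes("\u00e4\u00e4 F 14.000000", 4);
console.log(JSON.stringify(split.text)); // "ää"
console.log(JSON.stringify(split.rest)); // " F 14.000000"
```

Walking the string character by character keeps the tokenizer working on ordinary JavaScript strings, at the cost of a loop per text token.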

Original issue reported on code.google.com by ryandesi...@gmail.com on 13 Oct 2008 at 4:31
