Closed GoogleCodeExporter closed 9 years ago
Sweet! This looks pretty solid - thanks for the great work!
I applied the patch in a new branch 'utf'. I found a small bug when I tested it
in OS X (it worked in Windows
with no problems that I found) - when reading a Windows newline CR/LF, we
already have a mechanism for
figuring out how many characters that newline takes up (the RegEx::Match
returns the number of characters
matched, so we use that).
I realized that you added some extra testing in the Stream::get function to see
if it's reading a CR/LF newline
- but it's actually designed to read a CR, and then a LF. See
Scanner::ScanToNextToken for exactly where this
is used. Anyways, I just switched it back to returning one character at a time
and it works fine.
Also, it's slower than the original version. I added a little parse program
that just parses a document (see
util/parse) for testing, and I'm using the Duck.yaml file that someone sent in
Issue 7 (it's 328 KB). Compiling
in gcc with -O2, on my machine the trunk takes about .14s and the utf branch
takes about .38s. Granted, this
is a pretty big file and the times are pretty small anyways, but it's still
about 2.5x slower. I haven't profiled the
new version yet, but I will, and I'll see what I can do. If you want to play
with that also, that'd be cool.
Thanks again, this is a much needed addition to the library!
Original comment by jbe...@gmail.com
on 7 Jul 2009 at 9:59
I inlined a lot of the code (see the svn log for more details), which included
a substantial overhaul to the RegEx
class, but this sped up the utf branch by about 2x. I'll still work more on it,
but it's at the point where I'm happy
(enough) with its speed to merge soon.
Original comment by jbe...@gmail.com
on 9 Jul 2009 at 7:14
I played with it a little more to squeeze a little more speed out of it, and I
decided to merge it with the trunk.
FWIW, my speed tests show that it's somewhere between 5% slower (on my Linux
box) and 20% slower (on my
Mac). (Of course, these tests are only for simple ASCII files.)
Thanks again! By the way, if someone opens an issue relating to this aspect of
the code, would you be willing to
help fix it? I've read through all of your patch, but of course you know how it
works better than I do :)
Original comment by jbe...@gmail.com
on 10 Jul 2009 at 3:39
Aboslutely. You've got my email address, so just pop me a line. I even check
the
issues list here from time to time, so I might even see it and start working on
it.
I don't have any kind of automated notification system though, so it wouldn't
hurt to
forward any issue on to me in case I've forgotten to check.
Original comment by rtweeks21
on 10 Jul 2009 at 1:45
Original issue reported on code.google.com by
rtweeks21
on 3 Jul 2009 at 5:18Attachments: