GoogleCodeExporter closed this issue 9 years ago.
I believe the code is correct as is, and that your change would make tests fail.
From your comment it sounds like your change DOES make tests fail.
Original comment by rsc@golang.org on 29 Oct 2013 at 8:15
With "prog_->MarkByteRange('\n', '\n');" the tests all pass.
I am working out how to make the tests pass without marking any characters;
\b should not require any byterange_ to be set.
Original comment by Lyricconch on 29 Oct 2013 at 9:38
It is necessary to mark the boundaries so that (for example) bytemap_['@'] and
bytemap_['A'] are different, so that next_[bytemap_['@']] and
next_[bytemap_['A']] are different, so that if you compute next_[bytemap_['@']]
using @ as the next character, you do not attempt to reuse that work if you see
an A in the same state.
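To make this chain of reasoning concrete, here is a simplified, hypothetical model of how marked byte ranges turn into byte classes. `ByteClassBuilder`, `Mark`, and `Build` are illustrative names, not RE2's API. With only '\n' marked, '@' (0x40) and 'A' (0x41) land in the same class, so a transition cached under that class while reading '@' would be reused for 'A'. Marking the ASCII word-character ranges that \b depends on puts a class boundary between them.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical, simplified model of a byte-class map; it is not RE2's code.
// split_[b] == true means bytes b and b+1 must land in different classes.
class ByteClassBuilder {
 public:
  // Mark [lo, hi] so that both of its edges become class boundaries.
  void Mark(int lo, int hi) {
    assert(0 <= lo && lo <= hi && hi <= 255);
    if (lo > 0) split_[lo - 1] = true;  // boundary between lo-1 and lo
    split_[hi] = true;                  // boundary between hi and hi+1
  }

  // Number bytes 0..255 left to right, starting a new class after each split.
  std::array<std::uint8_t, 256> Build() const {
    std::array<std::uint8_t, 256> map{};
    std::uint8_t cls = 0;
    for (int b = 0; b < 256; ++b) {
      map[b] = cls;
      if (split_[b] && b < 255) ++cls;
    }
    return map;
  }

 private:
  std::array<bool, 256> split_{};
};
```

For example, a builder that marks only '\n' maps '@' and 'A' to the same class, while one that also marks '0'-'9', 'A'-'Z', '_' and 'a'-'z' maps them to different classes (and still keeps 'A' through 'Z' together).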
Original comment by rsc@golang.org on 31 Oct 2013 at 6:51
DState::flag_ is used to trade DState size against DState count.
At first glance this looks bad, since each bit in flag_ can double the DState count.
But DState size is fixed, while the DState count is governed by insts_:
states that differ only in an unused flag can be merged into one.
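The merging described above can be sketched as a state cache keyed on the instruction list plus only the flag bits that the instructions actually consult. `StateCache`, `Intern`, `InstList`, and the flag bit values are hypothetical, chosen for this illustration; they are not RE2's DState layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

// Hypothetical, simplified model: a DFA state is identified by its
// instruction list plus a flag word. Not RE2's actual data structures.
using InstList = std::vector<int>;
using StateKey = std::pair<InstList, std::uint32_t>;

// Hypothetical flag bits for this sketch.
constexpr std::uint32_t kFlagLastWord = 1u << 0;
constexpr std::uint32_t kFlagMatch = 1u << 1;
constexpr std::uint32_t kAllFlags = kFlagLastWord | kFlagMatch;

class StateCache {
 public:
  // needed_flags holds the bits the instructions in this state consult
  // (e.g. kFlagLastWord only if some instruction tests \b or \B).
  // Bits outside it are dropped from the key, so states that differ only
  // in an unused flag collapse into one. Returns true if a state was added.
  bool Intern(const InstList& insts, std::uint32_t flags,
              std::uint32_t needed_flags) {
    assert((flags & ~kAllFlags) == 0 && (needed_flags & ~kAllFlags) == 0);
    return states_.insert(StateKey(insts, flags & needed_flags)).second;
  }

  std::size_t size() const { return states_.size(); }

 private:
  std::set<StateKey> states_;
};
```

When no instruction needs the last-word bit, interning the same instruction list with and without it yields one state; when the bit is needed, the two variants stay distinct, which is the doubling the comment refers to.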
I think it may be worth moving UTF-8 handling out of the DFA.
Just as with byterange_, we can collect the meta-information at regex compile time.
InlinedSearchLoop would then compute only the properties we care about and emit a "property of rune"
instead of bytemap_[c].
This would give much lower "MB/s" than a DFA whose states are fully cached
(900 MB/s is very nice, but the regex has to run enough times to reach it),
but it should greatly improve DFA startup time and DFA memory usage (the "us/op").
(A fully cached DFA requires bytemap_range_ * bytemap_range_ / 2 entries per
rune, supporting 2 bytes per rune, while the property approach requires only one entry
per kind of mutually exclusive property per rune.)
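A minimal sketch of the proposed "property of rune" input, under stated assumptions: the input is valid UTF-8, and the only properties the compiled regex cares about are "ASCII word character", "newline", and "other". `DecodeRune`, `PropertyOf`, `Properties`, and `Prop` are hypothetical names. Instead of one bytemap_[c] per byte, the search loop would consume one property per rune, so a transition table needs only one column per property.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical property set; a real design would derive it from the regex
// at compile time, the way byterange_ is collected.
enum class Prop : std::uint8_t { kOther, kWord, kNewline };

// Decode one UTF-8 sequence starting at s[i] and advance i past it.
// Assumes valid UTF-8; malformed sequences are not detected.
char32_t DecodeRune(const std::string& s, std::size_t& i) {
  unsigned char c = static_cast<unsigned char>(s[i]);
  int len = c < 0x80 ? 1 : c < 0xE0 ? 2 : c < 0xF0 ? 3 : 4;
  assert(i + len <= s.size());
  char32_t r = len == 1 ? c : len == 2 ? (c & 0x1F)
             : len == 3 ? (c & 0x0F) : (c & 0x07);
  for (int k = 1; k < len; ++k)
    r = (r << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
  i += len;
  return r;
}

// Map a rune to the (hypothetical) property the DFA would consume.
Prop PropertyOf(char32_t r) {
  if (r == U'\n') return Prop::kNewline;
  bool word = (r >= U'0' && r <= U'9') || (r >= U'A' && r <= U'Z') ||
              (r >= U'a' && r <= U'z') || r == U'_';
  return word ? Prop::kWord : Prop::kOther;
}

// One property per rune, in input order.
std::vector<Prop> Properties(const std::string& s) {
  std::vector<Prop> out;
  for (std::size_t i = 0; i < s.size();) out.push_back(PropertyOf(DecodeRune(s, i)));
  return out;
}
```

For the input "a\xC3\xA9\n" ("a", U+00E9, newline) this yields three properties (word, other, newline) rather than four byte classes.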
Original comment by Lyricconch on 2 Nov 2013 at 9:20
There is a kFlagLastWord bit in DState::flag_,
so '@' or 'A' will be read from a different state depending on the character before it.
Since they are read from different states, splitting them in bytemap_ is no longer
necessary.
The only exception is the first byte in DFA::RunOneByteOnState
(at that point, 'A' and '@' share the same start state).
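The sketch below models such a flag as a boolean stored with the current state and shows which two pieces of information a \b test combines: the remembered wordness of the previous byte and the wordness of the byte about to be consumed. `WordFlag`, `IsWordChar`, and `BoundaryOffsets` are hypothetical names for this illustration, not RE2 functions, and end-of-text handling is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// ASCII word characters, the set \b is defined over in this sketch.
bool IsWordChar(unsigned char c) {
  return (c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z') ||
         (c >= 'a' && c <= 'z') || c == '_';
}

// Hypothetical model of a "previous byte was a word character" bit carried
// with the current state, in the spirit of kFlagLastWord.
struct WordFlag {
  bool last_word = false;  // at the start of text there is no previous byte

  // \b holds just before c iff exactly one of (previous byte, c) is a word char.
  bool BoundaryBefore(unsigned char c) const {
    return IsWordChar(c) != last_word;
  }

  // After consuming c, the flag describes c.
  void Step(unsigned char c) { last_word = IsWordChar(c); }
};

// Offsets i where \b holds immediately before s[i] (end of text not checked).
std::vector<std::size_t> BoundaryOffsets(const std::string& s) {
  std::vector<std::size_t> out;
  WordFlag flag;
  for (std::size_t i = 0; i < s.size(); ++i) {
    unsigned char c = static_cast<unsigned char>(s[i]);
    if (flag.BoundaryBefore(c)) out.push_back(i);
    flag.Step(c);
  }
  return out;
}
```

On "@A@" it reports boundaries before offsets 1 and 2; on "A@A" it also reports offset 0, where the flag still holds its start-of-text value.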
Original comment by Lyricconch on 2 Nov 2013 at 9:26
Original issue reported on code.google.com by Lyricconch on 29 Oct 2013 at 8:11