ljharb closed this issue 9 years ago.
Hi Jordan,
RE2's `Scanner` has a slightly different interface to Ruby's `String#scan`: in order to capture any matches, you need to use capturing groups in your regular expression:
```ruby
RE2('(a)').scan('abca').to_a
#=> [["a"], ["a"]]
RE2('(ab?)').scan('abca').to_a
#=> [["ab"], ["a"]]
```
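For comparison, Ruby's own `String#scan` also switches to returning one array of captures per match as soon as the pattern contains a group, so wrapping the whole pattern in a group gives comparable output from both. A minimal, stdlib-only illustration (no re2 involved):

```ruby
# String#scan returns whole matches when the pattern has no groups...
without_groups = 'abca'.scan(/a/)
# ...but one array of captured groups per match once a group is present.
with_group = 'abca'.scan(/(a)/)
optional_b = 'abca'.scan(/(ab?)/)

p without_groups  # => ["a", "a"]
p with_group      # => [["a"], ["a"]]
p optional_b      # => [["ab"], ["a"]]
```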
This is because the `Scanner` actually wraps re2's `FindAndConsumeN` under the hood.
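To see why a group-less pattern comes back as empty arrays, here is a rough pure-Ruby model of `FindAndConsumeN`'s behaviour. This is an assumption for illustration only: `find_and_consume_all` is a hypothetical helper, and the gem's real `Scanner` is C++ calling into re2. Each step finds the next match, consumes the input up to its end, and returns only the capturing groups:

```ruby
# Hypothetical model of repeated FindAndConsumeN calls: find the next match,
# consume input up to its end, and keep only the capturing groups.
# A pattern with no groups therefore yields [] for every match.
def find_and_consume_all(pattern, text)
  results = []
  pos = 0
  while pos <= text.length && (md = pattern.match(text, pos))
    results << md.captures
    # Advance past the match; on an empty match step one character
    # forward so the loop always makes progress.
    pos = md.end(0) > md.begin(0) ? md.end(0) : md.end(0) + 1
  end
  results
end

p find_and_consume_all(/a/, 'abca')     # => [[], []]
p find_and_consume_all(/(a)/, 'abca')   # => [["a"], ["a"]]
p find_and_consume_all(/(ab?)/, 'abca') # => [["ab"], ["a"]]
```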
@mudge OK, so how can I use re2 to replicate the `.gsub` interface, which can return an enumerator, take a block, or take a hash of replacements?
i.e., there appears to be no way with re2 to enumerate all of the things that `RE2.GlobalReplace` would replace, only the explicit capturing groups.
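For reference, these are the three `String#gsub` forms being asked about, shown with plain Ruby regexps only:

```ruby
text = 'abca'

# 1. No block and no replacement: gsub returns an Enumerator over the matches.
enum = text.gsub(/a/)
matches = enum.to_a                                  # => ["a", "a"]

# 2. Block form: each match is replaced by the block's return value.
upcased = text.gsub(/a/) { |m| m.upcase }            # => "AbcA"

# 3. Hash form: each matched text is looked up as a key in the hash.
mapped = text.gsub(/[ab]/, 'a' => '1', 'b' => '2')   # => "12c1"
```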
Unfortunately, I can't find an obvious analogue to Ruby's `String#gsub` when given a block in re2's API.
The "Scanning text incrementally" section only covers `Consume` and `FindAndConsume` (the latter is what this gem implements as the `Scanner`), and the only replacement options seem to be `Replace` and `GlobalReplace`, which operate on the whole input in one go.
Maybe we could find an alternative based on your use case? Do you need to do an incremental replacement on a large input?
What does `RE2.GlobalReplace` use under the hood with my `RE2::Regexp` to locate matches for replacement? Could that be exposed at all?
I think that would be sufficient for me to implement hash-based, block-based, and enumerator-based substitution.
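As a sketch of that idea: if some "find the next match at or after this offset" primitive were available, all three `gsub` forms could be layered on top of it. Everything here is hypothetical: `next_match` and `gsub_via_matches` are illustrative names, and Ruby's `Regexp#match` stands in for whatever re2 might expose; it does not use the gem at all.

```ruby
# Hypothetical primitive: the next match of pattern in text at or after pos.
# Plain Ruby Regexp#match stands in for anything re2 might expose.
def next_match(pattern, text, pos)
  pattern.match(text, pos)
end

# Hypothetical gsub-alike built on next_match, mirroring String#gsub's
# hash, block and enumerator forms (fixed replacement strings omitted).
def gsub_via_matches(pattern, text, hash = nil, &block)
  # Neither hash nor block: return an Enumerator over the matched strings.
  return enum_for(:gsub_via_matches, pattern, text) if hash.nil? && block.nil?

  out = +''
  pos = 0
  while pos <= text.length && (md = next_match(pattern, text, pos))
    out << text[pos...md.begin(0)]              # text before the match
    replacement = hash ? hash[md[0]] : block.call(md[0])
    out << replacement.to_s                     # missing hash keys become ""
    if md.end(0) == md.begin(0)
      # Empty match: copy one character through so the loop makes progress.
      out << text[md.end(0)].to_s
      pos = md.end(0) + 1
    else
      pos = md.end(0)
    end
  end
  out << text[pos..-1].to_s                     # text after the last match
end

p gsub_via_matches(/a/, 'abca') { |m| m.upcase }               # => "AbcA"
p gsub_via_matches(/[ab]/, 'abca', { 'a' => '1', 'b' => '2' }) # => "12c1"
p gsub_via_matches(/a/, 'abca').to_a                           # => ["a", "a"]
```

The enumerator form falls out of `enum_for`: iterating it re-runs the same loop with the enumerator's block, so only one match loop has to exist.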
`GlobalReplace` just uses the underlying re2 library's `RE2::GlobalReplace` function. The underlying C++ API doesn't yield matches in any way: it just performs the replacement internally.
However, the source shows that it just uses `Match` and `Rewrite` internally, so perhaps there is a way to piece this together?
That would be awesome if there is a way :-) Unfortunately, C++ isn't my strong suit.
If I do `'abca'.scan(/a/).to_a`, I get `["a", "a"]`, which is what I expect. However, if I do `RE2::Regexp.new('a').scan('abca').to_a`, I get `[[], []]`. Are my expectations wrong here, or is this a bug?