Implement begin and end methods on the MatchData

mudge / re2

Ruby bindings to RE2, a "fast, safe, thread-friendly alternative to backtracking regular expression engines like those used in PCRE, Perl, and Python".

http://mudge.name/re2/

BSD 3-Clause "New" or "Revised" License

129 stars 13 forks

Implement begin and end methods on the MatchData #21

Closed driskell closed 9 years ago

driskell commented 9 years ago

Hi @mudge

Here's the begin() and end() implementation for the MatchData. Let me know if it's OK!

Jason
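For readers unfamiliar with these methods, here is a minimal sketch of the semantics using Ruby's built-in Regexp and MatchData rather than the re2 gem. It assumes the new methods are meant to mirror the core MatchData#begin and MatchData#end, which return character offsets for a given group; the sample text and pattern are made up for illustration.

```ruby
# Illustration with Ruby's core MatchData, NOT the re2 gem.
# Assumption: the new methods are intended to behave like these built-ins.
text  = "I \u2665 Ruby"            # the heart is one character but three bytes in UTF-8
match = /(R)(uby)/.match(text)

whole_start = match.begin(0)       # character offset where the whole match starts
whole_end   = match.end(0)         # character offset just past the whole match
group_start = match.begin(2)       # character offset where the second group starts

puts "whole match: #{whole_start}...#{whole_end}"   # prints "whole match: 4...8"
puts "second group starts at #{group_start}"         # prints "second group starts at 5"
```

The offsets are counted in characters, not bytes, which is why the multibyte heart still counts as a single position.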

driskell commented 9 years ago

Sorry, I missed a commit. It's now complete and fully tested.

mudge commented 9 years ago

This looks great, thanks.

As you mentioned in #20, I'm wondering how this could be used to refactor re2_matchdata_aref (and therefore re2_matchdata_nth_match).

mudge commented 9 years ago

I've pulled your match fix into 0.6.1 and made some other tweaks to get YARD documentation working again, so it might be worth rebasing onto master.

driskell commented 9 years ago

OK, I've implemented it to work with multibyte strings. To do so, it has to create a temporary string object. A future revision could try rb_str_sublen instead, but my first attempt with it was unsuccessful.
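The temporary-string idea can be sketched in plain Ruby as follows. This is only an illustration under stated assumptions, not the C++ code from the pull request: char_offset is a hypothetical helper, and the byte offset is simulated by searching a binary copy of the text, standing in for what a byte-oriented engine would report.

```ruby
# Hypothetical helper, not part of the re2 gem: convert a byte offset into a
# character offset by copying the leading bytes into a temporary String and
# letting Ruby count its characters in the text's own encoding.
def char_offset(text, byte_offset)
  text.byteslice(0, byte_offset).length
end

text = "I \u2665 Ruby"                    # the heart occupies three bytes in UTF-8

# Simulate a byte-oriented match position by searching a binary copy.
byte_start = text.b.index("Ruby".b)
char_start = char_offset(text, byte_start)

puts "byte offset: #{byte_start}"          # prints "byte offset: 6"
puts "char offset: #{char_start}"          # prints "char offset: 4"
```

The cost is the extra copy made by byteslice, which is the allocation a zero-copy approach would avoid.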

mudge commented 9 years ago

Thanks for this. I was wondering whether there was a way to count the characters purely in C/C++ rather than in Ruby, but solutions such as http://www.daemonology.net/blog/2008-06-05-faster-utf8-strlen.html assume that the underlying strings are UTF-8 encoded, which we can't guarantee, so it makes sense to let Ruby do the heavy lifting.
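To illustrate why a UTF-8 assumption is risky, here is a small sketch. utf8_assumed_length is a hypothetical, naive counter written for this example (it is not the algorithm from the linked post): it treats every byte that is not a UTF-8 continuation byte as the start of a character. It agrees with Ruby on UTF-8 input but miscounts the same text once it is encoded as ISO-8859-1.

```ruby
# Hypothetical counter that ASSUMES UTF-8: any byte not of the form 10xxxxxx
# is taken to start a new character.
def utf8_assumed_length(str)
  str.each_byte.count { |byte| (byte & 0xC0) != 0x80 }
end

utf8   = "\u{A3}5"                     # pound sign followed by "5"; bytes C2 A3 35
latin1 = utf8.encode("ISO-8859-1")     # same text; bytes A3 35

puts utf8_assumed_length(utf8)         # prints 2, matching utf8.length
puts utf8_assumed_length(latin1)       # prints 1, because 0xA3 looks like a continuation byte
puts latin1.length                     # prints 2, since Ruby knows the string's encoding
```

Because Ruby strings carry their encoding, String#length gives the right answer in both cases, which is the argument for leaving the counting to Ruby.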

driskell commented 9 years ago

Yes, I planned from the start to let Ruby take care of it, since it is guaranteed to work if it works in Ruby itself. I had hoped, however, that I could do it zero-copy, but it seems rb_str_sublen didn't like something. Functionality first, performance after, I think.

If there's anything else you think needs doing before merge, please let me know. When I get some more time later I may re-visit the rb_str_sublen possibility.

mudge commented 9 years ago

Sorry about the delay getting this merged: I'm hoping to take a look at resolving the merge conflicts and rebasing with master this weekend. Did you have any luck with rb_str_sublen?

mudge commented 9 years ago

Closing in favour of #22 which is now rebased against master.