google / re2

RE2 is a fast, safe, thread-friendly alternative to backtracking regular expression engines like those used in PCRE, Perl, and Python. It is a C++ library.
BSD 3-Clause "New" or "Revised" License

SIGSEGV if `Match` is called before `Compile` #484

Status: Closed (masklinn closed this issue 6 months ago)

masklinn commented 6 months ago
>>> f = Filter()
>>> f.Compile()
re2/filtered_re2.cc:74: Compile called before Add.
>>> f.Match("")
>>> f = Filter()
>>> f.Match("")
py311: exit -11

This is inside tox, using Python installed with pyenv, and it reproduces with both Python 3.8 and 3.11.
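For anyone reproducing this, the `exit -11` in the report is consistent with the usual POSIX convention where a negative return code -N means the child process was killed by signal N (11 being SIGSEGV on Linux). A minimal sketch of running a crashing snippet in a separate interpreter, so the crash does not take down the calling session, might look like the following. The helper names `run_isolated` and `describe` are hypothetical, and the child here sends SIGSEGV to itself as a stand-in for the real `Filter().Match("")` call, so the sketch does not depend on an affected re2 build being installed.

```python
import signal
import subprocess
import sys


def run_isolated(snippet: str) -> int:
    """Run `snippet` in a fresh interpreter and return its return code."""
    result = subprocess.run([sys.executable, "-c", snippet])
    return result.returncode


def describe(returncode: int) -> str:
    """Interpret a return code; on POSIX, -N means 'killed by signal N'."""
    if returncode < 0:
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exited with status {returncode}"


# Stand-in for the crashing call (an assumption for illustration only):
# the child process sends SIGSEGV to itself instead of calling into re2.
CRASH_SNIPPET = "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"

if __name__ == "__main__":
    print(describe(run_isolated(CRASH_SNIPPET)))
```

With an affected wheel installed, the snippet could presumably be swapped for the two lines from the report; the exact import path for `Filter` is not shown in the report, so it is left out here.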

I would assume it's because, while there is a guard in FirstMatch that checks that the filter is compiled (https://github.com/google/re2/blob/108914d28a79243d4300e7e651cd0a0d5883ca0f/re2/filtered_re2.cc#L96-L99), there is no such guard for AllMatches: https://github.com/google/re2/blob/108914d28a79243d4300e7e651cd0a0d5883ca0f/re2/filtered_re2.cc#L108-L118

I assumed the PrefilterTree would be the issue, but apparently it guards against this (https://github.com/google/re2/blob/108914d28a79243d4300e7e651cd0a0d5883ca0f/re2/prefilter_tree.cc#L271-L276), so I'm not sure why it SIGSEGVs. I have not debugged the C++ code, only read through it.

Interestingly, there is a guard in Set::Match (https://github.com/google/re2/blob/108914d28a79243d4300e7e651cd0a0d5883ca0f/re2/set.cc#L128-L133), but Filter::Match does not check it (https://github.com/google/re2/blob/108914d28a79243d4300e7e651cd0a0d5883ca0f/python/_re2.cc#L226), so it does not protect the FilteredRE2. That assumes the FilteredRE2 is what crashes; for all I know it might be something else entirely.

junyer commented 6 months ago

Thanks for the report! This crash occurs here: set_ will be null until Compile() has been called. I think raising ValueError would be reasonable.

junyer commented 6 months ago

Having meditated on it a little more, raising re2.error would be preferable – and arguably trivial as per this documentation.
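To illustrate the kind of guard being discussed, here is a minimal pure-Python sketch, not the actual re2 wrapper code. `GuardedFilter` and `FilterError` are hypothetical names: `FilterError` stands in for `re2.error`, the `_compiled` attribute plays the role of the `set_` pointer that stays null until compilation, and the standard `re` module is swapped in for the real matcher. Raising from `compile()` when nothing was added is also a choice made for this sketch only.

```python
import re
from typing import List


class FilterError(Exception):
    """Hypothetical stand-in for re2.error."""


class GuardedFilter:
    """Toy model of a filter whose matcher only exists after compile()."""

    def __init__(self) -> None:
        self._patterns: List[str] = []
        # Plays the role of the null set_ pointer before compilation.
        self._compiled = None

    def add(self, pattern: str) -> int:
        """Register a pattern and return its index."""
        self._patterns.append(pattern)
        return len(self._patterns) - 1

    def compile(self) -> None:
        """Build the matcher; only after this call is match() usable."""
        if not self._patterns:
            raise FilterError("compile() called before add()")
        self._compiled = [re.compile(p) for p in self._patterns]

    def match(self, text: str) -> List[int]:
        """Return the indices of all patterns found in `text`."""
        # The guard: fail with an exception instead of touching a
        # matcher that does not exist yet.
        if self._compiled is None:
            raise FilterError("match() called before compile()")
        return [i for i, rx in enumerate(self._compiled) if rx.search(text)]
```

With a guard like this, calling `match()` on a fresh instance raises an exception the caller can catch, rather than terminating the interpreter.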

masklinn commented 6 months ago

> Thanks for the report! This crash occurs here: set_ will be null until Compile() has been called.

Oh yeah, that's obvious now that you point it out. I had not noticed that the Set is only instantiated during compilation, so I assumed that bit was fine and went on to look in the prefilter. Sorry about that.

junyer commented 6 months ago

FYI, I just cut release 2024-04-01 (and published wheels) containing the fixes for this issue and also for issue https://github.com/google/re2/issues/485. :)

masklinn commented 6 months ago

> FYI, I just cut release 2024-04-01 (and published wheels) containing the fixes for this issue and also for issue #485. :)

Awesome, thanks.