What steps will reproduce the problem?
1. The attached sample program can be compiled and run against any arbitrary
text file for pattern matching. Running the program through valgrind using call
graph will show that SearchNFA is largest consumer of CPU. The program contains
a large regex pattern which is purely for testing.
2. Adjusting the max memory argument does not change the percentage of time
spent in SearchNFA versus SearchDFA.
3. It has been noted that the first call to set_dfa_mem in compile.cc on line
1065 is setting the default max memory. This is regardless of the option
specification value. The max mem value does appear to be used on a later call
to the same function which finds 'm' having a positive value.
Modifying the default value does increase the amount of DFA that is created and
nearly doubles the speed of the attached program. However, there is still some
NFA component to the search.
What is the expected output? What do you see instead?
A python based DFA building tool has been used to identify that the pattern can
be represented entirely in a DFA that takes less than 50 MB. However, no value
can be found that will avoid using an NFA for (a portion of) this pattern
within RE2.
What I would expect is that the maximum memory option would allow for very long
compile times and very large DFA tables in memory, possibly over a hundred
megabytes or more. This never happens.
What version of the product are you using? On what operating system?
Revision: d49d9934b9
Linux: Centos 5.6/OpenSuse 11.4
Please provide any additional information below.
NOTE: If you have a suggested patch, please see
http://code.google.com/p/re2/wiki/Contribute
for information about sending it in for review. Thanks.
Original issue reported on code.google.com by michael....@gmail.com on 13 Dec 2011 at 11:19
Original issue reported on code.google.com by
michael....@gmail.com
on 13 Dec 2011 at 11:19Attachments: