lamuguo / re2

Automatically exported from code.google.com/p/re2
BSD 3-Clause "New" or "Revised" License
0 stars 0 forks source link

Re2 will not build large DFA tables regardless of max memory setting. #55

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1.  The attached sample program can be compiled and run against any arbitrary 
text file for pattern matching. Running the program through valgrind using call 
graph will show that SearchNFA is largest consumer of CPU. The program contains 
a large regex pattern which is purely for testing.

2. Adjusting the max memory argument does not change the percentage of time 
spent in SearchNFA versus SearchDFA.

3. It has been noted that the first call to set_dfa_mem in compile.cc on line 
1065 is setting the default max memory. This is regardless of the option 
specification value. The max mem value does appear to be used on a later call 
to the same function which finds 'm' having a positive value. 

Modifying the default value does increase the amount of DFA that is created and 
nearly doubles the speed of the attached program. However, there is still some 
NFA component to the search.

What is the expected output? What do you see instead?

A python based DFA building tool has been used to identify that the pattern can 
be represented entirely in a DFA that takes less than 50 MB. However, no value 
can be found that will avoid using an NFA for (a portion of) this pattern 
within RE2. 

What I would expect is that the maximum memory option would allow for very long 
compile times and very large DFA tables in memory, possibly over a hundred 
megabytes or more. This never happens.

What version of the product are you using? On what operating system?

Revision: d49d9934b9
Linux: Centos 5.6/OpenSuse 11.4

Please provide any additional information below.
NOTE: If you have a suggested patch, please see
http://code.google.com/p/re2/wiki/Contribute
for information about sending it in for review.  Thanks.

Original issue reported on code.google.com by michael....@gmail.com on 13 Dec 2011 at 11:19

Attachments:

GoogleCodeExporter commented 9 years ago
This issue was closed by revision 7f1c5387aac5.

Original comment by rsc@swtch.com on 10 Jan 2014 at 3:17