miguel01997 / cookcc

Automatically exported from code.google.com/p/cookcc
Other
0 stars 1 forks source link

RegEx sub-expression support #22

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
After some studying, I think that it is possible to add simple support of 
sub-expressions.

Instead of ( ) pair in the typical RegEx to setup the sub-expression group, (/ 
/) pair will be used.

Following restrictions are required:
1. Nested sub-expression will not be allowed.
2. No ?|+ RegEx operators are supported on sub-expression
3. The RegEx must guarantee the match will go through the sub-expression.

For example:

The following sub expressions are supported
(/a*/)(/a*/)(/(abc)|(a(b+)c)/)a

The following kind of sub expressions are not supported and an error will be 
thrown.
(/a*/)+
(/a*/)|(/b*/)|c
(ab(/c*/))|(abc*)

Original issue reported on code.google.com by superdup...@gmail.com on 6 Jan 2012 at 6:29

GoogleCodeExporter commented 9 years ago
It should be noted that this feature will allow variable length trailing 
contexts like the following

(a*)b(a*)/(a+)b

Note that the both part of the RegEx are variable length.  Further, they have 
overlapping regions.

The way flex does the match is by re-run the first part of RegEx again, but the 
result is incorrect because part of a should belong to the trailing context.  
flex would detect issue and gives a warning at compile time.

It should be noted that the algorithm used to deal with this issue will take up 
to O(k*n) space, where k is the number of sub-expressions, and n is the length 
of the match.

Original comment by superdup...@gmail.com on 6 Jan 2012 at 7:11