Open GoogleCodeExporter opened 9 years ago
Concatenating the .pattern attribute of compiled patterns doesn't "just work",
because the regexes may have been provided with different 'flags' arguments,
even assuming that they don't have conflicting named groups.
I'll need to consider the alternatives (concatenating compiled pattern objects
with "+"?) and implications of this feature.
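A minimal sketch of the flags problem, using the standard-library re module as a stand-in for regex so it runs without extra dependencies (an assumption; .pattern and .flags behave the same way for this purpose):

```python
import re

# Compiled with the IGNORECASE flag passed as an argument.
p = re.compile("cat", re.IGNORECASE)

# .pattern holds only the source text; the flag is not part of it,
# so the concatenated pattern is case-sensitive again.
naive = re.compile(p.pattern + "s")

# Carrying the flags over explicitly restores the original behaviour.
# (p.flags also includes re.UNICODE, which str patterns always have.)
careful = re.compile(p.pattern + "s", p.flags)

print(naive.match("CATS"))    # no match
print(careful.match("CATS"))  # a match object
```

This only works when every piece was compiled with the same flags, which is exactly the restriction described above.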
Original comment by re...@mrabarnett.plus.com
on 30 Jun 2011 at 2:03
I'm not sure about "concatenation with +". I was really just suggesting that
the first parameter to "compile" could be either a pattern string, or a
sequence of pattern strings and/or compiled patterns. As you say, flags would
have to match, named groups would have to satisfy the standard constraint of
being unique within selector branches, etc.
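A rough sketch of what such a "compile" could do. The helper name compile_sequence is hypothetical, and it is written against the standard-library re module rather than regex, so treat it as an illustration of the idea, not a proposed implementation:

```python
import re


def compile_sequence(parts, flags=0):
    """Hypothetical helper: compile a sequence of pattern strings
    and/or compiled patterns into a single pattern."""
    # The flags a freshly compiled pattern reports for the requested
    # flags (str patterns always carry re.UNICODE, for example).
    expected = re.compile("", flags).flags
    pieces = []
    for part in parts:
        if isinstance(part, re.Pattern):
            # Refuse to mix patterns compiled with different flags.
            if part.flags != expected:
                raise ValueError(f"flags mismatch for {part.pattern!r}")
            part = part.pattern
        # Wrap each piece so that alternations stay self-contained.
        pieces.append(f"(?:{part})")
    # Duplicate named groups are rejected here by re.compile itself.
    return re.compile("".join(pieces), flags)


which_regex = re.compile("first|second")
which_item_regex = compile_sequence((which_regex, r"\s+(\w+)"))
print(which_item_regex.match("second item").group(1))
```

Each piece is wrapped in (?:...) so that "first|second" followed by another piece does not turn into "first" or "second plus the rest".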
Original comment by Bill.Jan...@gmail.com
on 7 Jul 2011 at 7:28
Here's an illustration of the kind of thing I'm doing now.
Original comment by Bill.Jan...@gmail.com
on 7 Jul 2011 at 7:33
Attachments:
One thought I had was that, in some ways, it's like the named list feature,
except that it's a subpattern, which suggests:
which_regex = regex.compile("first|second")
which_item_regex = regex.compile(r"\L<which>\s+(\w+)", which=which_regex)
Original comment by re...@mrabarnett.plus.com
on 7 Jul 2011 at 8:00
Ah, interesting idea.
Though I suspect a different escape letter (instead of 'L') might be a good
idea. It's not really a list. Is 'R' taken?
Hmmm... One issue I see is that it introduces another level of naming (the
name of the group) which kind of detracts from the "direct manipulation" aspect
of being able to use the Python variable directly:
which_regex = regex.compile("first|second")
which_item_regex = regex.compile((which_regex, r"\s+(\w+)"))
On the other hand, it's more regex-syntax-friendly.
Original comment by Bill.Jan...@gmail.com
on 8 Jul 2011 at 6:43
One feature which is currently missing is the attribute "named_lists":
>>> which = regex.compile(r"\L<options>", options="first second".split())
>>> which.pattern
'\\L<options>'
>>> which.named_lists
{'options': frozenset({'second', 'first'})}
You can then say:
>>> which_item_regex = regex.compile(which.pattern + r"\s+(\w+)",
...     **which.named_lists)
>>> which_item_regex.pattern
'\\L<options>\\s+(\\w+)'
>>> which_item_regex.named_lists
{'options': frozenset({'second', 'first'})}
That will be in the next release.
Original comment by re...@mrabarnett.plus.com
on 8 Jul 2011 at 7:20
Re \R, some regex implementations use that to match various line endings,
something like r"\r\n|\n" (possibly Unicode newline as well), and I don't want
to preclude that in the future.
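For illustration, a rough stand-in for that kind of line-ending match, written with the standard-library re module. The exact set of characters is an assumption based on how other implementations commonly define \R, not something this module defines:

```python
import re

# CRLF is tried first so that "\r\n" counts as one line ending,
# then any single line-ending character (including Unicode ones).
LINE_ENDING = re.compile(r"\r\n|[\n\r\x0b\x0c\x85\u2028\u2029]")

lines = LINE_ENDING.split("one\r\ntwo\nthree\u2028four")
print(lines)
```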
Original comment by re...@mrabarnett.plus.com
on 8 Jul 2011 at 8:13
Inserting a pre-compiled regex into a regex pattern is not the same as
inserting a regex pattern into a regex pattern.
For example, this:
p = regex.compile("cat")
q = regex.compile("(?i)" + p.pattern)
is the same as this:
q = regex.compile("(?i)cat")
which will match "CAT", but this:
p = regex.compile("cat")
q = regex.compile(r"(?i)\I<rgx>", rgx=p)
won't match "CAT", because p has already been compiled as case-sensitive.
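The \I<rgx> syntax is only a proposal, but the "as-is" behaviour can be approximated with a scoped inline flag. This sketch uses the standard-library re module (Python 3.6 or later) and (?-i:...) purely as an emulation of inserting a pattern that keeps its own flags:

```python
import re

p = re.compile("cat")

# Inserting the pattern text: the outer (?i) applies to it too.
by_text = re.compile("(?i)" + p.pattern + "s")

# Emulating "as-is" insertion: the inserted part keeps its own
# case-sensitive behaviour while the rest is case-insensitive.
as_is = re.compile("(?i)(?-i:" + p.pattern + ")s")

print(by_text.match("CATS"))  # a match object
print(as_is.match("CATS"))    # no match: "cat" stays case-sensitive
print(as_is.match("catS"))    # a match object: only "s" ignores case
```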
Similar remarks apply to DOTALL and the other flags, and also fuzzy matching.
This:
p = regex.compile("cat")
q = regex.compile("(?:" + p.pattern + "){e<=1}")
is the same as this:
q = regex.compile("(?:cat){e<=1}")
and is a fuzzy regex, but this:
p = regex.compile("cat")
q = regex.compile(r"\I<rgx>{e<=1}", rgx=p)
isn't. (It should probably raise an exception.)
So the question is: should inserting a pre-existing regex actually use the
pre-compiled regex as-is as shown above, or should it use that regex's pattern
with an implicit (?:...) around it?
If it uses the pre-compiled regex as-is, should that regex be atomic (no
backtracking into it after it has matched)?
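To make the atomic question concrete, here is a small sketch of the difference using the standard-library re module. The lookahead-plus-backreference trick stands in for an atomic group (an emulation chosen so the example runs on older Python versions, not a suggestion for how the feature would be implemented):

```python
import re

text = "aaab"

# Ordinary group: "a+" first takes "aaa", then gives one "a" back
# so that the trailing "ab" can match.
backtracking = re.compile(r"(?:a+)ab")

# Atomic behaviour, emulated: the lookahead captures "aaa" and, once
# it has succeeded, is not re-entered, so \1 must consume all three
# and nothing is left for the trailing "a".
atomic = re.compile(r"(?=(a+))\1ab")

print(backtracking.match(text))  # a match object
print(atomic.match(text))        # no match
```

If an inserted pre-compiled regex were atomic, it would behave like the second pattern: whatever it matched first would be final.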
Should there be both forms of insertion, r"\I<rgx>" and r"\i<rgx>"? (That may
be confusing!)
Original comment by re...@mrabarnett.plus.com
on 11 Jul 2011 at 11:24
Original comment by re...@mrabarnett.plus.com
on 6 Aug 2011 at 3:29
Original issue reported on code.google.com by
Bill.Jan...@gmail.com
on 29 Jun 2011 at 4:24