This patch implements a regular expression feature, which allows
some interesting patterns, in the same way as implemented in Perl.
For example, (?(1)yes|no) matches "yes" if group 1 matched, and
"no" if it didn't. Without this feature, the regular expression
must be duplicated to get the same results. In addition to Perl's
feature, it also accepts a Python named group as the argument.
Here's an example:
(<)?\w+@\w+(\.\w+)+(?(1)>)
This is a poor email-matching regular expression, which will match
with or without the surrounding "<>" symbols.
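A minimal sketch of how the email example behaves with Python's `re` module, assuming the angle brackets around the address are the optional group. The names `EMAIL`, `EMAIL_NAMED` and `is_email` are hypothetical, and `fullmatch` is used only to make the accept/reject cases obvious; it is not part of the patch.

```python
import re

# Group 1 is the optional "<". (?(1)>) demands a closing ">" only when
# group 1 actually took part in the match.
EMAIL = re.compile(r"(<)?\w+@\w+(\.\w+)+(?(1)>)")

# Same idea, using a Python named group as the condition.
EMAIL_NAMED = re.compile(r"(?P<open><)?\w+@\w+(\.\w+)+(?(open)>)")


def is_email(text, pattern=EMAIL):
    """Return True if the whole string matches the (poor) email pattern."""
    return pattern.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ["<user@host.com>", "user@host.com",
                      "<user@host.com", "user@host.com>"]:
        print(candidate, is_email(candidate), is_email(candidate, EMAIL_NAMED))
```

Only the balanced forms are accepted; a lone "<" or ">" makes the whole string fail.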
Logged In: YES user_id=21627
If you add new opcodes, you should also change SRE_MAGIC.
Logged In: YES user_id=7887
That patch has been around for a long time. Should I work on it, fix that problem, and apply it? Do you agree with including the feature?
I remember that the main reason for implementing this is that it is hard to achieve the same results without it. You have to write the whole match twice inside an or'ed group (e.g. "(<... match email ...>|... match email ...)").
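To illustrate the duplication being described, here is a rough comparison of the or'ed form and the conditional form. `BODY`, `DUPLICATED`, `CONDITIONAL` and `samples` are hypothetical names chosen for this sketch, and the email body is the same deliberately simplistic pattern as in the original example.

```python
import re

# The part that would otherwise have to be written twice.
BODY = r"\w+@\w+(?:\.\w+)+"

# Without the feature: the body appears in both alternatives.
DUPLICATED = re.compile(r"<" + BODY + r">|" + BODY)

# With the feature: the body appears once, and ">" is conditional on "<".
CONDITIONAL = re.compile(r"(<)?" + BODY + r"(?(1)>)")

samples = ["<a@b.c>", "a@b.c", "<a@b.c", "a@b.c>", "a@b"]

# Both patterns should accept and reject exactly the same whole strings.
agree = all(
    (DUPLICATED.fullmatch(s) is None) == (CONDITIONAL.fullmatch(s) is None)
    for s in samples
)

if __name__ == "__main__":
    print("patterns agree on all samples:", agree)
```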
Logged In: YES user_id=21627
I like the patch in principle, but I have a number of additional concerns:
Shouldn't there be a semantic restriction that the back reference is only allowed if it points to a group that is known to precede it? I.e. is
(X)|(?(1)Y)
valid? If not, the restriction should at least be documented, but if possible, it should also be implemented.
Logged In: YES user_id=7887
About the test cases, they're missing indeed. I can write some while applying the patch.
About being experimental, IIRC, it has been listed as experimental in the Perl documentation for several years, and will probably stay like this forever. :-) Anyway, IMO this shouldn't affect our evaluation of the importance of that feature for Python's sre.
About the semantic restriction, do you mean checking that the backreference is lower than the current group? Should be doable. OTOH, I don't understand your example. In "(X)|(?(1)Y)", there's no sense in using (?(1), as it will always be false.
Logged In: YES user_id=21627
Exactly: my example makes no sense; it will always be false, since the reference is to an alternative that cannot be taken simultaneously. Therefore, I think this should be an error.
Logged In: YES user_id=7887
I see. I'll try to improve the patch with your suggestions as soon as I get some time to work on it. Thanks for your support.
Logged In: YES user_id=7887
Martin, I've checked your concern about making "(X)|(?(1)Y)" an error, and unfortunately the current framework doesn't keep enough state information to catch this. Note that this check isn't implemented for very similar cases either, like "(X)|\1", which does exactly the same thing as "(X)|(?(1)X)".
I'll be applying that patch as soon as I check it against the current HEAD, and implement some tests (and before it completes its first year of life 8-).
Thanks!
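The point under discussion can be checked with a short sketch, assuming a current CPython `re` module (behaviour at the time of the patch may have differed). `COND` and `BACKREF` are hypothetical names. Both patterns compile without complaint, and the branch that refers to group 1 can never match its literal, because group 1 belongs to the other alternative.

```python
import re

# Neither pattern is rejected at compile time, even though the reference
# to group 1 sits in an alternative where group 1 cannot have matched.
COND = re.compile(r"(X)|(?(1)Y)")
BACKREF = re.compile(r"(X)|\1")

# The first alternative works as usual.
x_cond = COND.fullmatch("X")
x_backref = BACKREF.fullmatch("X")

# The second alternative never consumes "Y": group 1 is unset there.
y_cond = COND.fullmatch("Y")
y_backref = BACKREF.fullmatch("Y")

if __name__ == "__main__":
    print(bool(x_cond), bool(x_backref), y_cond, y_backref)
```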
Logged In: YES user_id=21627
Please don't apply the patch before 2.3; this is in beta now, so no new features are allowed (unless you get BDFL permission, of course).
Logged In: YES user_id=7887
Ack!! I'm not going to ask Guido if you believe it's not worth it for 2.3.
I'm attaching a new version of the patch, updated to the current HEAD, and including tests.
Thanks for your attention!
Logged In: YES user_id=7887
Committed with patch bpo-757624.
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
GitHub fields:
```python
assignee = None
closed_at =
created_at =
labels = ['library']
title = '(?(id/name)yes|no) re implementation'
updated_at =
user = 'https://bugs.python.org/niemeyer'
```
bugs.python.org fields:
```python
activity =
actor = 'niemeyer'
assignee = 'niemeyer'
closed = True
closed_date = None
closer = None
components = ['Library (Lib)']
creation =
creator = 'niemeyer'
dependencies = []
files = ['4372', '4373']
hgrepos = []
issue_num = 572936
keywords = ['patch']
message_count = 11.0
messages = ['40400', '40401', '40402', '40403', '40404', '40405', '40406', '40407', '40408', '40409', '40410']
nosy_count = 2.0
nosy_names = ['loewis', 'niemeyer']
pr_nums = []
priority = 'normal'
resolution = 'accepted'
stage = None
status = 'closed'
superseder = None
type = None
url = 'https://bugs.python.org/issue572936'
versions = ['Python 2.3']
```