python / cpython

The Python programming language
https://www.python.org/
Other

(?(id/name)yes|no) re implementation #36790

Closed 447313dd-71a0-469e-9885-704141e56655 closed 20 years ago

447313dd-71a0-469e-9885-704141e56655 commented 21 years ago
BPO 572936
Nosy @loewis
Files
  • python-2.3a0-grouprefexists.patch
  • python-2.3b1-grouprefexists.patch
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:
    ```python
    assignee = None
    closed_at =
    created_at =
    labels = ['library']
    title = '(?(id/name)yes|no) re implementation'
    updated_at =
    user = 'https://bugs.python.org/niemeyer'
    ```

    bugs.python.org fields:
    ```python
    activity =
    actor = 'niemeyer'
    assignee = 'niemeyer'
    closed = True
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation =
    creator = 'niemeyer'
    dependencies = []
    files = ['4372', '4373']
    hgrepos = []
    issue_num = 572936
    keywords = ['patch']
    message_count = 11.0
    messages = ['40400', '40401', '40402', '40403', '40404', '40405', '40406', '40407', '40408', '40409', '40410']
    nosy_count = 2.0
    nosy_names = ['loewis', 'niemeyer']
    pr_nums = []
    priority = 'normal'
    resolution = 'accepted'
    stage = None
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue572936'
    versions = ['Python 2.3']
    ```

    447313dd-71a0-469e-9885-704141e56655 commented 21 years ago

    This patch implements a regular expression feature that allows
    some interesting patterns, in the same way as it is implemented in Perl.
    For example, (?(1)yes|no) matches "yes" if group "1" exists, and
    "no" if it doesn't. Without this feature, the regular expression
    must be duplicated to get the same results. Beyond Perl's feature, it also accepts a Python named group as the argument.

    Here's an example:

    (<)?\w+@\w+(\.\w+)+(?(1)>)

    This is a poor email matching regular expression, which will match
    with or without the "<>" symbols.
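A minimal sketch of the example above using Python's `re` module, which accepts the `(?(id/name)yes|no)` syntax. It assumes the pattern was meant to read `(<)?\w+@\w+(\.\w+)+(?(1)>)`, with the angle brackets restored from the description. The group name `open` and the helper `is_email` are hypothetical, chosen only for illustration.

```python
import re

# Assumed form of the pattern from the comment: "<" is optional, and ">" is
# required only when the "<" group (group 1) took part in the match.
EMAIL = re.compile(r"(<)?\w+@\w+(\.\w+)+(?(1)>)")

# The same idea with a named group; "open" is a hypothetical name.
EMAIL_NAMED = re.compile(r"(?P<open><)?\w+@\w+(\.\w+)+(?(open)>)")


def is_email(text, pattern=EMAIL):
    """Return True if the whole string matches the given pattern."""
    return pattern.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ("<user@example.com>", "user@example.com",
                      "<user@example.com", "user@example.com>"):
        print(candidate, is_email(candidate), is_email(candidate, EMAIL_NAMED))
```

`fullmatch` is used here so that an unbalanced bracket is rejected rather than silently left outside the match.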

    61337411-43fc-4a9c-b8d5-4060aede66d0 commented 21 years ago

    Logged In: YES user_id=21627

    If you add new opcodes, you should also change SRE_MAGIC.

    447313dd-71a0-469e-9885-704141e56655 commented 21 years ago

    Logged In: YES user_id=7887

    That patch has been around for a long time. Should I work on it, fix that problem, and apply it? Do you agree with including the feature?

    I remember that the main reason for implementing this is that it is hard to achieve the same results without it. You have to write the whole match twice inside an or'ed group (e.g. "(<... match email ...>|... match email ...)").
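To illustrate the duplication described above, here is a rough sketch comparing the or'ed form with the conditional form. The `ADDRESS` sub-pattern is a hypothetical stand-in for "... match email ...", and `same_verdict` is an illustrative helper, not part of the patch.

```python
import re

# Hypothetical stand-in for the "... match email ..." part of the comment.
ADDRESS = r"\w+@\w+(?:\.\w+)+"

# Without the conditional, the address sub-pattern has to be written twice.
DUPLICATED = re.compile(rf"(?:<{ADDRESS}>|{ADDRESS})")

# With the conditional, it is written once and ">" depends on group 1.
CONDITIONAL = re.compile(rf"(<)?{ADDRESS}(?(1)>)")


def same_verdict(text):
    """Return True if both patterns agree on whether the whole string matches."""
    return (DUPLICATED.fullmatch(text) is None) == (CONDITIONAL.fullmatch(text) is None)


if __name__ == "__main__":
    for sample in ("<a@b.c>", "a@b.c", "<a@b.c", "a@b.c>", "not an address"):
        print(sample, same_verdict(sample))
```

On these samples the two patterns accept and reject the same strings; the conditional form just avoids repeating the sub-pattern.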

    61337411-43fc-4a9c-b8d5-4060aede66d0 commented 21 years ago

    Logged In: YES user_id=21627

    I like the patch in principle, but I have a number of additional concerns:

    447313dd-71a0-469e-9885-704141e56655 commented 21 years ago

    Logged In: YES user_id=7887

    About the test cases, they are indeed missing. I can write some while applying the patch.

    About being experimental, IIRC, it has been listed as experimental in the Perl documentation for several years, and will probably stay like this forever. :-) Anyway, IMO this shouldn't affect our evaluation of the importance of that feature for Python's sre.

    About the semantic restriction, do you mean checking that the backreference is less than the current group? Should be doable. OTOH, I don't understand your example. In "(X)|(?(1)Y)", there's no sense in using (?(1), as it will always be false.

    61337411-43fc-4a9c-b8d5-4060aede66d0 commented 21 years ago

    Logged In: YES user_id=21627

    Exactly: my example makes no sense. It will always be false, since the reference is to an alternative that cannot simultaneously be taken. Therefore, I think this should be an error.

    447313dd-71a0-469e-9885-704141e56655 commented 21 years ago

    Logged In: YES user_id=7887

    I see. I'll try to improve the patch with your suggestions as soon as I get some time to work on it. Thanks for your support.

    447313dd-71a0-469e-9885-704141e56655 commented 21 years ago

    Logged In: YES user_id=7887

    Martin, I've checked your concern about making "(X)|(?(1)Y)" an error, and unfortunately the current framework doesn't keep enough state information to catch this. Note that this check is not implemented in very similar cases either, like "(X)|\1", which does exactly the same thing as "(X)|(?(1)X)".
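A small sketch of the situation discussed here, assuming the behaviour of a current CPython `re` module: both patterns compile without error, and the part that refers to group 1 never consumes anything, because group 1 is unset whenever the second alternative is tried. The names `COND` and `BACKREF` are illustrative only.

```python
import re

# Both patterns are accepted by the compiler even though the reference to
# group 1 sits in an alternative where group 1 can never have matched.
COND = re.compile(r"(X)|(?(1)Y)")
BACKREF = re.compile(r"(X)|\1")

if __name__ == "__main__":
    for pattern in (COND, BACKREF):
        # "X" is matched by the first alternative.
        print(pattern.pattern, pattern.fullmatch("X"))
        # "Y" is never matched as a whole string: the "Y" branch of the
        # conditional is unreachable because group 1 is unset there.
        print(pattern.pattern, pattern.fullmatch("Y"))
```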

    I'll be applying that patch as soon as I check it against the current HEAD, and implement some tests (and before it completes its first year of life 8-).

    Thanks!

    61337411-43fc-4a9c-b8d5-4060aede66d0 commented 21 years ago

    Logged In: YES user_id=21627

    Please don't apply the patch before 2.3; this is in beta now, so no new features are allowed (unless you get BDFL permission, of course).

    447313dd-71a0-469e-9885-704141e56655 commented 21 years ago

    Logged In: YES user_id=7887

    Ack!! I'm not going to ask Guido if you believe it's not worth it for 2.3.

    I'm attaching a new version of the patch, updated to the current HEAD, and including tests.

    Thanks for your attention!

    447313dd-71a0-469e-9885-704141e56655 commented 20 years ago

    Logged In: YES user_id=7887

    Committed with patch bpo-757624.