python / cpython

The Python programming language
https://www.python.org
Other
63.18k stars 30.25k forks source link

better name for re.error Exception class. #83162

Closed eadbd4d3-cbb4-42b8-8420-9f80dcde2bcc closed 10 months ago

eadbd4d3-cbb4-42b8-8420-9f80dcde2bcc commented 4 years ago
BPO 38981
Nosy @gpshead, @ezio-melotti, @serhiy-storchaka, @Carreau, @sourabh025
PRs
  • python/cpython#17501
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields: ```python assignee = None closed_at = None created_at = labels = ['expert-regex', 'easy', 'type-feature', '3.9'] title = 'better name for re.error Exception class.' updated_at = user = 'https://github.com/Carreau' ``` bugs.python.org fields: ```python activity = actor = 'serhiy.storchaka' assignee = 'none' closed = False closed_date = None closer = None components = ['Regular Expressions'] creation = creator = 'mbussonn' dependencies = [] files = [] hgrepos = [] issue_num = 38981 keywords = ['patch', 'easy'] message_count = 11.0 messages = ['357867', '357878', '357882', '357883', '357884', '357886', '357962', '357982', '357986', '358059', '358124'] nosy_count = 6.0 nosy_names = ['gregory.p.smith', 'ezio.melotti', 'mrabarnett', 'serhiy.storchaka', 'mbussonn', 'sourabh025'] pr_nums = ['17501'] priority = 'normal' resolution = None stage = 'patch review' status = 'open' superseder = None type = 'enhancement' url = 'https://bugs.python.org/issue38981' versions = ['Python 3.9'] ```

    Linked PRs

    eadbd4d3-cbb4-42b8-8420-9f80dcde2bcc commented 4 years ago

    better error/exception name for re.compile error.

    Currently the error raise by re.compile when it fails to compile is error defined in sre_constants.py:

    class error(Exception):
        """Exception raised for invalid regular expressions.
    

    This is quite disturbing as most exception start with an uppercase and have a tiny bit more descriptive name.

    Would it be possible to have it renamed as something more explicit like ReCompileError, and still keeping the potential error alias as deprecated ?

    serhiy-storchaka commented 4 years ago

    It is common practice that the module specific exception is called just "error". There is nothing wrong with this.

    I do not see a need to introduce a different alias.

    serhiy-storchaka commented 4 years ago

    See also the discussion about renaming json.loads(): https://mail.python.org/archives/list/python-ideas@python.org/thread/EJTIVQ2ZFSVHALTLRGFCOMOYGZYMKGQU/

    eadbd4d3-cbb4-42b8-8420-9f80dcde2bcc commented 4 years ago

    Most of the module specific classes are Error, not error, at least with an uppercase E you know it's a class.

    if a novice sees :

    error: missing ), unterminated subpattern at position 0

    It will be relatively tough or them to figure out that error is the type of the exception.

    Also it's not because something works that you can't improve it ...

    serhiy-storchaka commented 4 years ago

    Since it affects more than one module I suggest to discuss the idea about renaming exceptions of the Python-Ideas maillist first. Until different decision be made I am closing. Personally I think this is a duplicate of just discussed and rejected idea.

    eadbd4d3-cbb4-42b8-8420-9f80dcde2bcc commented 4 years ago

    Thanks for the advice I've done that !

    Have a good day.

    serhiy-storchaka commented 4 years ago

    Reopened after discussing on Python-ideas: https://mail.python.org/archives/list/python-ideas@python.org/thread/64NHNY6RD4HQWBSBV6J7XIN7UAHNTQBR/.

    eadbd4d3-cbb4-42b8-8420-9f80dcde2bcc commented 4 years ago

    Thanks Serhiy,

    Here is a rough idea of how many places would be touched by renaming in the re module:

    https://github.com/Carreau/cpython/commit/59e4c5150c842f849ff3a9ba8a94df1df7a5eb1c (50 additions and 42 deletions.).

    I haven't found any places that need changes in the C code, and need to polish documentation, rebuild and run the test suite before sending a PR.

    gpshead commented 4 years ago

    Strictly speaking not all of those _need_ to be touched given the old name is always going to exist for backwards compatibility. But I agree that we should update them as part of this regardless.

    I'd go forward with a PR.

    The only fallout I expect a change like this to have on users is in the very odd test scenario where someone has hardcoded the error name in a string. That is rare, especially for an re.error which is generally not an expected exception.

    serhiy-storchaka commented 4 years ago

    I am not sure about the new name. "re" is an abbreviation, so if include it in the exception name it should be "RE". I am not sure what name is better: RECompileError, REParseError, RESyntaxError, REError, CompileError, ParseError, SyntaxError or Error.

    json raises JSONDecodeError, ElementTree raises ParseError, other xml modules raise ExpatError, csv raises Error, configparser raises subclasses of Error.

    Many modules (at least 18: aifc, binhex, concurrent.futures, configparser, copy, cvs, ftplib, locale, mailbox, shutil, sqlite, sunau, test.support, uu, wave, webbrowser, xdrlib, xmlrpc.client) have an exception named just Error for module-specific errors.

    eadbd4d3-cbb4-42b8-8420-9f80dcde2bcc commented 4 years ago

    RECompileError, REParseError, RESyntaxError, REError, CompileError, ParseError, SyntaxError or Error,

    Many modules [...] have an exception named just Error

    RECompileError, REParseError, RESyntaxError, REError, CompileError, ParseError are all fine with me.

    SyntaxError would be super confusing IMHO. Remember the StackTrace does not get the fully qualified name, so it would be hard to distinguish from a SyntaxError with the Python Syntax.

    aifc, binhex, sunau, uu, xdrlib might be removed with PEP-594, And I find Error not informative enough. It suffers from the same issues as above, as stack traces do not have the full qualified name.

    I would also add that being able to search and find all occurrences of a given exceptions is useful, and that Error is too generic.

    Let me know your choice and I can rename.

    hugovk commented 2 years ago

    aifc, binhex, sunau, uu, xdrlib might be removed with PEP-594

    Yes, aifc, sunau, uu and xdrlib made the list and will be deprecated in 3.11 and removed in 3.13.

    See PEP 594 – Removing dead batteries from the standard library and https://github.com/python/cpython/issues/91217.

    binhex isn't part of the PEP 594 because it was already deprecated in 3.9 and has been removed from 3.11:

    https://docs.python.org/3.11/whatsnew/3.11.html?highlight=binhex#removed


    Edit: pinging the nosy list after the bpo migration:

    @gpshead, @ezio-melotti, @serhiy-storchaka, @Carreau, @Sourabh025

    achhina commented 1 year ago

    @Carreau I see that you've closed your PR; would you still like to work on this? It seems that your PR was mostly done, and just required finalizing the actual exception name.

    Carreau commented 1 year ago

    would you still like to work on this

    Feel free to take over, of have someone else do it it seem pretty straitforward. I'm though not going to debate the name of the exception.

    achhina commented 1 year ago

    Okay thanks, I can take a stab at it, and see if there can be some consensus regarding the name.

    achhina commented 1 year ago

    I've brought the above patch in line with the current main branch as a draft PR - thanks again @Carreau. However, before I make any changes I'd like to see if we can agree on the new exception name.

    Looking at other popular libraries across several languages, there are several variations such as Java's PatternSyntaxException, but the ones that appear the most often seem to be Error, SyntaxError, and CompileError. I believe these without prefixes would not be much clearer than the original exception, so they should be prefixed with uppercase RE as it's an abbreviation.

    This leaves us with REError, RESyntaxError, and RECompileError. Admittedly subjectively, I prefer RESyntaxError and RECompileError over REError, as they seem to read better and as such provide more clarity.

    I imagine users will most likely see this exception due to syntax issues with the expression, and some libraries make the distinction that a compilation error is related to an invalid encoding vs. a syntax error. So with that in mind, plus @barneygale's point that it could be read as recompile error, I'd like to propose using RESyntaxError.

    cc @serhiy-storchaka, @gpshead as you've participated in these discussions before as well.

    gpshead commented 1 year ago

    I'm not feeling like there is a lot of value in doing this anymore.

    If I were choosing a new name my logic would look like:

    So I expect many will continue to type re.error regardless of the new name... My gut feeling is just to close this issue as "not planned". It's a historical oddity of the language from before a consistent style was well realized.

    achhina commented 1 year ago

    I think for me when looking at this issue, the benefit of introducing this would be the additional clarity for beginners, as an error like this

    RESyntaxError: nothing to repeat at position 0

    would be more clear than what we have currently

    error: nothing to repeat at position 0

    I expect code to always access the exception from the module itself

    Currently that would be correct as from re import error would seem like a poor choice, however I've quite often seen from json import JSONDecodeError, and I'd imagine similarly people can make that decision with re.

    which is objectively ugly and unpleasing to type

    I can understand that, but as you mentioned for those folks re.error will always be available, so this should have a very low impact for them while providing a more informative exception name.

    barneygale commented 1 year ago

    We have re.Pattern, why not re.PatternError? :)

    achhina commented 1 year ago

    We have re.Pattern, why not re.PatternError? :)

    I think PatternError would work well as well :). Both SyntaxError/PatternError would be good proxies to imply that there is an issue with the regular expression syntax (the most likely reason to see an error). IIRC when I was looking at other regex libraries in/outside of Python, some form of SyntaxError seemed to be more popular. So, that combined with if we're suggesting there's an issue with the regular expression syntax, I leaned towards SyntaxError.

    What I do like about PatternError is the possibility it adds to not have a prefix of RE, which makes it more pleasant to type and read. Although I'm leaning towards keeping the prefix, as I find it adds more clarity and seems to be in line with the popular exception format of <library><error_type>.

    So what I'm thinking is this:

    1. If we want to keep the prefix RE, then I'd suggest going with RESyntaxError.
    2. Otherwise, PatternError.

    Thoughts?

    terryjreedy commented 1 year ago

    I prefer PatternError over RESyntaxError over RECompileError

    terryjreedy commented 10 months ago

    I merged the near-minimal PR with re.error 'updated' only in idlelib. I will do adjusted backports of the idlelib changes to keep the main bodies of its files in sync and then close.