Closed eadbd4d3-cbb4-42b8-8420-9f80dcde2bcc closed 10 months ago
better error/exception name for re.compile error.
Currently the error raise by re.compile when it fails to compile is error
defined in sre_constants.py:
class error(Exception):
"""Exception raised for invalid regular expressions.
This is quite disturbing as most exception start with an uppercase and have a tiny bit more descriptive name.
Would it be possible to have it renamed as something more explicit like ReCompileError
, and still keeping the potential error
alias as deprecated ?
It is common practice that the module specific exception is called just "error". There is nothing wrong with this.
I do not see a need to introduce a different alias.
See also the discussion about renaming json.loads(): https://mail.python.org/archives/list/python-ideas@python.org/thread/EJTIVQ2ZFSVHALTLRGFCOMOYGZYMKGQU/
Most of the module specific classes are Error
, not error
, at least with an uppercase E you know it's a class.
if a novice sees :
error: missing ), unterminated subpattern at position 0
It will be relatively tough or them to figure out that error
is the type of the exception.
Also it's not because something works that you can't improve it ...
Since it affects more than one module I suggest to discuss the idea about renaming exceptions of the Python-Ideas maillist first. Until different decision be made I am closing. Personally I think this is a duplicate of just discussed and rejected idea.
Thanks for the advice I've done that !
Have a good day.
Reopened after discussing on Python-ideas: https://mail.python.org/archives/list/python-ideas@python.org/thread/64NHNY6RD4HQWBSBV6J7XIN7UAHNTQBR/.
Thanks Serhiy,
Here is a rough idea of how many places would be touched by renaming in the re
module:
https://github.com/Carreau/cpython/commit/59e4c5150c842f849ff3a9ba8a94df1df7a5eb1c (50 additions and 42 deletions.).
I haven't found any places that need changes in the C code, and need to polish documentation, rebuild and run the test suite before sending a PR.
Strictly speaking not all of those _need_ to be touched given the old name is always going to exist for backwards compatibility. But I agree that we should update them as part of this regardless.
I'd go forward with a PR.
The only fallout I expect a change like this to have on users is in the very odd test scenario where someone has hardcoded the error name in a string. That is rare, especially for an re.error which is generally not an expected exception.
I am not sure about the new name. "re" is an abbreviation, so if include it in the exception name it should be "RE". I am not sure what name is better: RECompileError, REParseError, RESyntaxError, REError, CompileError, ParseError, SyntaxError or Error.
json raises JSONDecodeError, ElementTree raises ParseError, other xml modules raise ExpatError, csv raises Error, configparser raises subclasses of Error.
Many modules (at least 18: aifc, binhex, concurrent.futures, configparser, copy, cvs, ftplib, locale, mailbox, shutil, sqlite, sunau, test.support, uu, wave, webbrowser, xdrlib, xmlrpc.client) have an exception named just Error for module-specific errors.
RECompileError, REParseError, RESyntaxError, REError, CompileError, ParseError, SyntaxError or Error,
Many modules [...] have an exception named just Error
RECompileError, REParseError, RESyntaxError, REError, CompileError, ParseError are all fine with me.
SyntaxError would be super confusing IMHO. Remember the StackTrace does not get the fully qualified name, so it would be hard to distinguish from a SyntaxError with the Python Syntax.
aifc, binhex, sunau, uu, xdrlib might be removed with PEP-594, And I find Error
not informative enough. It suffers from the same issues as above, as stack traces do not have the full qualified name.
I would also add that being able to search and find all occurrences of a given exceptions is useful, and that Error is too generic.
Let me know your choice and I can rename.
aifc, binhex, sunau, uu, xdrlib might be removed with PEP-594
Yes, aifc, sunau, uu and xdrlib made the list and will be deprecated in 3.11 and removed in 3.13.
See PEP 594 – Removing dead batteries from the standard library and https://github.com/python/cpython/issues/91217.
binhex isn't part of the PEP 594 because it was already deprecated in 3.9 and has been removed from 3.11:
https://docs.python.org/3.11/whatsnew/3.11.html?highlight=binhex#removed
Edit: pinging the nosy list after the bpo migration:
@gpshead, @ezio-melotti, @serhiy-storchaka, @Carreau, @Sourabh025
@Carreau I see that you've closed your PR; would you still like to work on this? It seems that your PR was mostly done, and just required finalizing the actual exception name.
would you still like to work on this
Feel free to take over, of have someone else do it it seem pretty straitforward. I'm though not going to debate the name of the exception.
Okay thanks, I can take a stab at it, and see if there can be some consensus regarding the name.
I've brought the above patch in line with the current main branch as a draft PR - thanks again @Carreau. However, before I make any changes I'd like to see if we can agree on the new exception name.
Looking at other popular libraries across several languages, there are several variations such as Java's PatternSyntaxException
, but the ones that appear the most often seem to be Error
, SyntaxError
, and CompileError
. I believe these without prefixes would not be much clearer than the original exception, so they should be prefixed with uppercase RE
as it's an abbreviation.
This leaves us with REError
, RESyntaxError
, and RECompileError
. Admittedly subjectively, I prefer RESyntaxError
and RECompileError
over REError
, as they seem to read better and as such provide more clarity.
I imagine users will most likely see this exception due to syntax issues with the expression, and some libraries make the distinction that a compilation error is related to an invalid encoding vs. a syntax error. So with that in mind, plus @barneygale's point that it could be read as recompile error, I'd like to propose using RESyntaxError
.
cc @serhiy-storchaka, @gpshead as you've participated in these discussions before as well.
I'm not feeling like there is a lot of value in doing this anymore.
If I were choosing a new name my logic would look like:
re.Error
to gain the uppercase spelling consistent with exception classes elsewhere.from re import *
and does so later in life... so Error would then import and overwrite their own Error class. Plausibly breaking existing code. So loop back and put the prefix on it again.re.RESomeError
spelling again, which is objectively ugly and unpleasing to type compared to re.error
or re.Error
. infinite decision loop.So I expect many will continue to type re.error
regardless of the new name... My gut feeling is just to close this issue as "not planned". It's a historical oddity of the language from before a consistent style was well realized.
I think for me when looking at this issue, the benefit of introducing this would be the additional clarity for beginners, as an error like this
RESyntaxError: nothing to repeat at position 0
would be more clear than what we have currently
error: nothing to repeat at position 0
I expect code to always access the exception from the module itself
Currently that would be correct as from re import error
would seem like a poor choice, however I've quite often seen from json import JSONDecodeError
, and I'd imagine similarly people can make that decision with re
.
which is objectively ugly and unpleasing to type
I can understand that, but as you mentioned for those folks re.error
will always be available, so this should have a very low impact for them while providing a more informative exception name.
We have re.Pattern
, why not re.PatternError
? :)
We have
re.Pattern
, why notre.PatternError
? :)
I think PatternError
would work well as well :). Both SyntaxError
/PatternError
would be good proxies to imply that there is an issue with the regular expression syntax (the most likely reason to see an error). IIRC when I was looking at other regex libraries in/outside of Python, some form of SyntaxError
seemed to be more popular. So, that combined with if we're suggesting there's an issue with the regular expression syntax, I leaned towards SyntaxError
.
What I do like about PatternError
is the possibility it adds to not have a prefix of RE
, which makes it more pleasant to type and read. Although I'm leaning towards keeping the prefix, as I find it adds more clarity and seems to be in line with the popular exception format of <library><error_type>
.
So what I'm thinking is this:
RE
, then I'd suggest going with RESyntaxError
.PatternError
.Thoughts?
I prefer PatternError over RESyntaxError over RECompileError
I merged the near-minimal PR with re.error 'updated' only in idlelib. I will do adjusted backports of the idlelib changes to keep the main bodies of its files in sync and then close.
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields: ```python assignee = None closed_at = None created_at =
labels = ['expert-regex', 'easy', 'type-feature', '3.9']
title = 'better name for re.error Exception class.'
updated_at =
user = 'https://github.com/Carreau'
```
bugs.python.org fields:
```python
activity =
actor = 'serhiy.storchaka'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Regular Expressions']
creation =
creator = 'mbussonn'
dependencies = []
files = []
hgrepos = []
issue_num = 38981
keywords = ['patch', 'easy']
message_count = 11.0
messages = ['357867', '357878', '357882', '357883', '357884', '357886', '357962', '357982', '357986', '358059', '358124']
nosy_count = 6.0
nosy_names = ['gregory.p.smith', 'ezio.melotti', 'mrabarnett', 'serhiy.storchaka', 'mbussonn', 'sourabh025']
pr_nums = ['17501']
priority = 'normal'
resolution = None
stage = 'patch review'
status = 'open'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue38981'
versions = ['Python 3.9']
```
Linked PRs