Open 0ffa5ce0-07b2-47b7-90fe-79403f0626b8 opened 9 years ago
Currently, the regex in TimeRE enforces the numeric ranges. For example: 'm': r"(?P\<m>1[0-2]|0[1-9]|[1-9])",
As a result, an invalid month will cause an generic regex error: ValueError: time data '2015/16/5' does not match format '%Y/%m/%d'
However, if we relax the regex to not check the range and allow datetime to handle it: 'm': r"(?P\<m>\d{1,2})"
The error will be handle in datetime instead and the error will be much more helpful: ValueError: month must be in 1..12
Please consider relaxing the regex for numeric ranges in _strptime.TimeRE.
Do realize that the strptime code is shared with time.strptime() and so this change would have to make sense in both contexts.
This change can also only happen in Python 3.6 because it is backwards-incompatible due to people potentially already catching the previous exception.
And it does not make sense for time, since time.strptime does no additional validation, it just returns the result returned by _strptime.
I'm not sure what format I'm supposed to provide the test in. I attached a file that has the diff of the changes I made, and how the error message is changed (and improved!) in both datetime and time.
A unit test in test_strptime.
I also commented on python/issues-test-cpython#26215 ( https://github.com/python/cpython/pull/26215 ), but for posterity, I'll note a few things:
It seems that (and this may have changed since 2015), _strptime._strptime
now has a stage that (unconditionally?) constructs a temporary datetime_date
, which means it does do this particular validation in both time.strptime
and datetime.strptime
. That said, both flavors of strptime
are way slower than I'd like them to be, and constructing an unnecessary date
/datetime
is a pretty good way to slow down your function, so if we ever go around optimizing this function, that may be one of the first bits on the chopping block.
The logic for strptime
is very complicated and it's very hard to test the full input space of the function (particularly since we're not using property tests (yet)...). This makes me somewhat uneasy about moving the validation stage from the beginning of the function (in parsing the regular expression) to the very end of the function (in the datetime constructor).
It's probably safe to do so, but it may also be worth exploring the possibility of validating this directly in _strptime
(possibly immediately after the string is parsed by the regex), and raising a cleaner error message on failure.
Probably not worth spending a ton of time on that compared to improving the testing around this so that we can feel confident making changes under the hood. .strptime
is really quite slow, and I wouldn't be surprised if we pulled out its guts and replaced most of the regex stuff with a fast C parser at some point in the future. Having good tests will both give us confidence to make this change (and that making this change won't lead to regressions in the future) and help with any future project to replace _strptime._strptime
with a faster version, so I'd say that's the most important thing to do here.
Can we close the ticket? It doesn't sound like we're going to go with anything like this approach.
If there's a good place to record the general ambitions (better strptime tests, more clear strptime errors), that would be good, but I don't think we use tickets for that.
Thinking about this a little more...
I suggest that slowing down execution shouldn't be a worry in this case, because trying to parse a garbage string into a date shouldn't be something code is doing zillions of times, and if it is, being slow isn't a terrible thing. It should be the exception (so to speak) and not the rule, and thus not something we should worry about optimizing for.
That said, I can see not wanting to rely on throwing low-level errors that are outside Python's direct control. (In fact, the PR I wrote didn't account for the OS-dependent nature of those messages, so if we decide to go ahead after all I'll need to make my PR account for that.) If that is enough reason to reject the change, I'm content with that.
One way or the other, I would love to get this out of "Open" status - if we want to just close it as-is, I'll be satisfied.
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields: ```python assignee = None closed_at = None created_at =
labels = ['type-bug', 'library']
title = '_strptime.TimeRE should not enforce range in regex'
updated_at =
user = 'https://bugs.python.org/SteveYeung'
```
bugs.python.org fields:
```python
activity =
actor = 'catherinedevlin'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)']
creation =
creator = 'Steve Yeung'
dependencies = []
files = ['40319']
hgrepos = []
issue_num = 24929
keywords = ['patch']
message_count = 8.0
messages = ['249074', '249133', '249141', '249518', '249541', '393904', '393948', '394146']
nosy_count = 5.0
nosy_names = ['catherinedevlin', 'r.david.murray', 'Catherine.Devlin', 'Steve Yeung', 'p-ganssle']
pr_nums = ['26215']
priority = 'low'
resolution = None
stage = 'patch review'
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue24929'
versions = ['Python 3.6']
```