Open pganssle opened 3 days ago
The year for datetime.datetime must be, and is allowed to be, anything in the range MINYEAR <= year <= MAXYEAR, which is 1 <= year <= 9999. I expect that the format functions should handle any legal date.
Considering these results:
>>> datetime(999, 1, 1).strftime("%c")
'Tue Jan 1 00:00:00 999'
>>> datetime.strptime("Tue Jan 1 00:00:00 999", "%c") # as from strftime() above => the error described above
[snip]
ValueError: time data 'Tue Jan 1 00:00:00 999' does not match format '%c'
>>> datetime.strptime("Tue Jan 1 00:00:00 0999", "%c") # adding 0 before 999 to have 4-digit width year => success
datetime.datetime(999, 1, 1, 0, 0)
...and the following fragment of the docs (https://docs.python.org/3/library/datetime.html#technical-detail):
- The strptime() method can parse years in the full [1, 9999] range, but years < 1000 must be zero-filled to 4-digit width.
...I am not sure if the proviso that years < 1000 must be zero-filled to 4-digit width intentionally covers also this case.
One could argue that it does, and there is nothing to fix here.
Another person, however, could argue that:
What do you think?
[EDIT] The quoted note refers to the %Y format code, not to the %c one. So I believe that the imaginary Another person would be right. :)
PS It seems that time.{strftime,strptime}() behave the same (apparently because time.strptime() uses the same implementation from _strptime):
$ ./python
Python 3.14.0a0 (heads/main:a4d1fdfb15, Sep 26 2024, 22:47:21) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import time
>>> t_tuple = time.strptime("Tue Jan 1 00:00:00 0999", '%c')
>>> t_tuple
time.struct_time(tm_year=999, tm_mon=1, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=1, tm_yday=1, tm_isdst=-1)
>>> time.strftime('%c', t_tuple)
'Tue Jan 1 00:00:00 999'
>>> time.strptime(_, '%c')
Traceback (most recent call last):
  File "<python-input-4>", line 1, in <module>
    time.strptime(_, '%c')
    ~~~~~~~~~~~~~^^^^^^^^^
  File "/home/zuo/cpython/Lib/_strptime.py", line 567, in _strptime_time
    tt = _strptime(data_string, format)[0]
         ~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^
  File "/home/zuo/cpython/Lib/_strptime.py", line 352, in _strptime
    raise ValueError("time data %r does not match format %r" %
                     (data_string, format))
ValueError: time data 'Tue Jan 1 00:00:00 999' does not match format '%c'
It seems that the source of the problem is the following (at least typically: for the C.UTF-8 locale and at least some others, e.g. pl_PL.UTF-8; it seems to hold for any other locale as well):
- datetime.datetime.strftime(), when %c is used to format a date+time, does not use datetime's way of formatting %Y (which would result in a 4-digit year, with leading zeros for year < 1000), but returns a string that contains the year number with the minimum count of digits needed to represent that number (i.e., fewer than 4 for year < 1000),
- whereas datetime.datetime.strptime(), when %c is used to parse a date+time, uses an %Y-specific regex to parse the year fragment (see the _strptime module), which requires that the year number has exactly 4 digits.
I checked that:
(1) When formatting that example year 999, the results are:
Function/Method | For "%c" | For "%Y" |
---|---|---|
time.strftime() | "999" | "999" |
datetime.datetime.strftime() | "999" | "0999" [sic!] |
Conclusion: datetime.datetime.strftime()'s %c formatting behaves like time.strftime(), therefore it is not based on datetime.datetime.strftime()'s formatting of %Y.
(2) When parsing that example year 999 (as well as, e.g., 9), both as a part of a full date (%c) and alone (%Y), only the 4-digit year format is accepted. Fewer digits always cause the same ValueError from _strptime (whose machinery, as noted above, uses the %Y-specific regex even for %c).
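The parsing side of point (2) can be checked through the public API alone. This is only a minimal sketch: the helper name parses() is hypothetical, and it covers just the %Y-alone case, which does not depend on the locale.

```python
from datetime import datetime


def parses(text: str, fmt: str) -> bool:
    """Return True if datetime.strptime() accepts `text` for `fmt`.

    Hypothetical helper for illustration; not part of any library.
    """
    try:
        datetime.strptime(text, fmt)
    except ValueError:
        return False
    return True


# %Y alone: only the zero-filled, 4-digit form is accepted.
for candidate in ("0999", "999", "0009", "9"):
    print(f"{candidate!r:>7} with %Y -> {parses(candidate, '%Y')}")
```

The same helper can be pointed at a %c string, but that result depends on the platform and locale, so it is not shown here.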
In the _strptime module's machinery (which is used by datetime.datetime.strptime() and time.strptime()): decouple %c's parsing regex from %Y's, making the former more liberal (also accepting 1, 2 or 3 digits in the year number).
[The fix would be implemented in the _strptime module, probably somewhere in LocaleTime.__calc_date_time() / TimeRE.__init__() ... TimeRE's __init__() and pattern().]
(Another theoretically possible variant: just make %Y's regex more liberal. However, that seems too disruptive...)
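The decoupling idea can be sketched outside of _strptime with plain regular expressions. Everything here is an assumption made for illustration: STRICT_Y and STRICT_y are stand-ins for the strict year groups, C_PATTERN is a simplified, hypothetical regex for a C-locale %c string, and relax_year() is not an existing function.

```python
import re

# Assumption: stand-ins for the strict year groups used when parsing.
STRICT_Y = r"(?P<Y>\d\d\d\d)"
STRICT_y = r"(?P<y>\d\d)"

# Hypothetical, simplified pattern for a C-locale "%c" string.
C_PATTERN = (
    r"(?P<a>\w{3})\s+(?P<b>\w{3})\s+(?P<d>\d{1,2})\s+"
    r"(?P<H>\d\d):(?P<M>\d\d):(?P<S>\d\d)\s+" + STRICT_Y
)


def relax_year(pattern: str) -> str:
    """Return `pattern` with the year groups accepting fewer digits."""
    pattern = pattern.replace(STRICT_y, r"(?P<y>\d{1,2})")
    pattern = pattern.replace(STRICT_Y, r"(?P<Y>\d{1,4})")
    return pattern


text = "Tue Jan 1 00:00:00 999"
print(re.fullmatch(C_PATTERN, text))                         # strict: no match
print(re.fullmatch(relax_year(C_PATTERN), text).group("Y"))  # relaxed: year found
```

Only the composite pattern is relaxed; the standalone strict groups stay untouched, which is the point of preferring this variant over loosening %Y itself.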
@pganssle @terryjreedy
I'd be happy to implement the fix, if you decide that this should be fixed.
No issue on my Macbook laptop
Python 3.14.0a0 (heads/main:162d152146a, Sep 25 2024, 10:45:28) [Clang 15.0.0 (clang-1500.3.9.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from datetime import datetime
>>> datetime.strptime(datetime(1000, 1, 1).strftime("%c"), "%c")
datetime.datetime(1000, 1, 1, 0, 0)
>>> datetime.strptime(datetime(999, 1, 1).strftime("%c"), "%c")
datetime.datetime(999, 1, 1, 0, 0)
>>>
@Mariatta
Could you please check what string is returned on your system from the following call?
>>> datetime(999, 1, 1).strftime("%c")
Thanx :)
PS My guess is that, for your locale, a %c-formatted date+time includes a 2-digit year variant (instead of the 4-digit one).
@zuo I tried it just now
Python 3.14.0a0 (heads/main:162d152146a, Sep 25 2024, 10:45:28) [Clang 15.0.0 (clang-1500.3.9.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from datetime import datetime
>>> datetime(999, 1, 1).strftime("%c")
'Tue Jan 1 00:00:00 0999'
@Mariatta
Thank you!
Yeah, that leading zero your platform/locale provides makes strftime's %c format digestible by strptime on your system. Apparently, that's not the case on Linux. :-/
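For anyone who wants to compare platforms, a small script along these lines reports what %c produces locally. It is a sketch under stated assumptions: describe_c_format() is a hypothetical helper, and the zero-padding check is a plain substring test, so it says nothing useful for locales that print a 2-digit year.

```python
import locale
import platform
from datetime import datetime


def describe_c_format(year: int = 999) -> dict:
    """Collect what strftime('%c') yields for `year` on this system.

    Hypothetical helper for illustration only.
    """
    text = datetime(year, 1, 1).strftime("%c")
    return {
        "system": platform.system(),
        # Calling setlocale() without a second argument only queries.
        "lc_time": locale.setlocale(locale.LC_TIME),
        "formatted": text,
        # True if the year shows up zero-filled to 4 digits.
        "zero_padded_year": str(year).zfill(4) in text,
    }


for key, value in describe_c_format().items():
    print(f"{key}: {value!r}")
```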
Anyway, now it's quite clear to me what the fix should be.
Proof of concept:
diff --git a/Lib/_strptime.py b/Lib/_strptime.py
index a3f8bb544d..6a2527b75c 100644
--- a/Lib/_strptime.py
+++ b/Lib/_strptime.py
@@ -213,8 +213,10 @@ def __init__(self, locale_time=None):
                                 'Z'),
             '%': '%'})
         base.__setitem__('W', base.__getitem__('U').replace('U', 'W'))
-        base.__setitem__('c', self.pattern(self.locale_time.LC_date_time))
-        base.__setitem__('x', self.pattern(self.locale_time.LC_date))
+        base.__setitem__(
+            'c', self.__pattern_with_lax_year(self.locale_time.LC_date_time))
+        base.__setitem__(
+            'x', self.__pattern_with_lax_year(self.locale_time.LC_date))
         base.__setitem__('X', self.pattern(self.locale_time.LC_time))
 
     def __seqToRE(self, to_convert, directive):
@@ -236,6 +238,21 @@ def __seqToRE(self, to_convert, directive):
         regex = '(?P<%s>%s' % (directive, regex)
         return '%s)' % regex
 
+    def __pattern_with_lax_year(self, format):
+        """Like pattern(), but making %y and %Y accept also fewer digits.
+
+        Necessary to ensure that strptime() is able to parse strftime()'s
+        output when the %c or %x format code is used -- considering that
+        for some locales/platforms (e.g., 'C.UTF-8' on Linux), formatting
+        with either %c or %x may cause year numbers, if a number is small,
+        to have fewer digits than usual (e.g., '999' instead of '0999', or
+        '9' instead of '0009' or '09').
+        """
+        pattern = self.pattern(format)
+        pattern = pattern.replace(self['y'], r"(?P<y>\d{1,2})")
+        pattern = pattern.replace(self['Y'], r"(?P<Y>\d{1,4})")
+        return pattern
+
     def pattern(self, format):
         """Return regex pattern for the format string.
 
[EDIT] After applying the above patch, the error does not occur anymore:
>>> import time
>>> t_tuple = time.strptime("Tue Jan 1 00:00:00 0999", '%c')
>>> t_tuple
time.struct_time(tm_year=999, tm_mon=1, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=1, tm_yday=1, tm_isdst=-1)
>>> time.strftime('%c', t_tuple)
'Tue Jan 1 00:00:00 999'
>>> time.strptime(_, '%c')
time.struct_time(tm_year=999, tm_mon=1, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=1, tm_yday=1, tm_isdst=-1)
>>>
>>> from datetime import datetime
>>> datetime(999, 1, 1).strftime('%c')
'Tue Jan 1 00:00:00 999'
>>> datetime.strptime(_, '%c')
datetime.datetime(999, 1, 1, 0, 0)
Bug report
Bug description:
Discovered this when adding some hypothesis tests for strptime/strftime. I doubt this is a real problem anyone is going to have in the real world, but maybe. I do not know if this is locale-specific or OS-specific.
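To help answer the locale/OS question, a sweep over a few boundary years shows which ones fail to round-trip on a given system. This is a stdlib-only sketch rather than the hypothesis test mentioned above; failing_round_trips() and its default list of years are assumptions chosen for illustration.

```python
from datetime import datetime


def failing_round_trips(fmt: str, years=(1, 9, 99, 999, 1000, 9999)) -> list:
    """Return the years for which strftime() -> strptime() loses the date.

    Hypothetical helper: a year counts as failing if either call raises
    ValueError or if the parsed value differs from the original.
    """
    failures = []
    for year in years:
        original = datetime(year, 1, 1)
        try:
            parsed = datetime.strptime(original.strftime(fmt), fmt)
        except ValueError:
            failures.append(year)
            continue
        if parsed != original:
            failures.append(year)
    return failures


for fmt in ("%c", "%x", "%Y"):
    print(fmt, "->", failing_round_trips(fmt))
```

The output is intentionally not listed here, because it varies with the Python version, platform and locale.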
CPython versions tested on:
CPython main branch
Operating systems tested on:
Linux