Closed 926c91cc-ad75-44d8-8753-2e15a016bea4 closed 9 years ago
Python is via datetime.isocalendar() able to produce the ISO week number and ISO weekday from a given date. But it is not possible to do the reverse; calculate the date from a year, ISO week number and weekday.
libc strptime()/strftime() mentions a %V and %u directive which does this, i.e. Monday in ISO week 22 of the year 2011:
datetime.strptime('2011221', '%Y%V%u')
returning 2011-05-30 and
datetime.strptime('2011227', '%Y%V%u')
returning 2011-06-05
libc (on FreeBSD) has this to say:
%V is replaced by the week number of the year (Monday as the first day of the week) as a decimal number (01-53). If the week containing January 1 has four or more days in the new year, then it is week 1; otherwise it is the last week of the previous year, and the next week is week 1.
%u is replaced by the weekday (Monday as the first day of the week) as a decimal number (1-7).
This is a reasonable request and easy to implement. I am not sure how often this functionality is needed, so I am only +0 on this feature.
At least in Denmark, week numbers are used regularly in everyday communication and planning, and the numbering follows the ISO rules. Also, the week starts on Monday.
I've recently joined the python-mentors mailing list because I love Python and want to get involved. I found this bug in the list of "Easy issues" and thought I'd try my hand. Anyway, this is my first patch, so please forgive me if I am breaking protocol or stepping on anyone's toes here. I also hope my code isn't embarrassing.
This adds a constructor to the date class that allows construction based on an ISO-8601 YYYYWWD string. Does this address the issue in a logical way?
Thanks for wanting to work on this.
It is a little hard to parse from the OP's initial post, but he is asking that %V and %u be supported by datetime.strptime, which they currently are not. strptime is the generalized constructor for turning a string into a datetime, so no, a new constructor is not the right approach.
The code that implements the strptime method of datetime is in the file _strptime.py in Lib. Your patch should be against that code. (Note that this is pretty un-obvious, I had to dig a bit to figure it out!)
Thanks, I think I understand the original post now. Upon reading the docs and code, however, it seems this is possible using the %W and %w directives. Is the issue just to support the different letters (%V and %u) specifically, or that they are not quite the same format as the corresponding ISO numbers?
Getting a date from ISO week /weekday is not possible with the %W and %w directives. ISO week numbers and normal week numbers are calculated differently (see my libc qoute in the original post). Also, using %w instead of %u would be ambiguous, since %w weeks start on Sunday, while %u start with Monday.
I agree that this should be implemented in strptime(), to follow libc strptime(). datetime() has a clear constructor while strptime() has all the whacky cases.
OK, here is my second attempt. I think it functions as desired, but a code review may reveal flaws in my implementation. I'm sure there are associated tests and documentation to write, but I have basically no experience with that yet. If this looks like the right direction, I can take it to the python-mentors list for help with the test/docs.
I'm not familiar with this module, so at the moment I just gave the patch a quick scan. The approach looks OK to me. Hopefully Alexander can review it. In the meantime I think you should go ahead and work on tests and doc. It is easier to do a detailed review when tests are included.
I'll try to give this a more thorough review by the end of the week. For now, just a nit-pick, but _calc_julian_from_V jumped at me as a really odd name. Either "iso_to_julian" or "julian_from_iso" would be much clearer. The patch also needs tests and documentation.
Thanks everyone, please take your time if there are more pressing issues; I'll get to work on tests and documentation in the mean time. I agree that '_calc_julian_from_V' is a bit strange. I was mimicking a similar helper function's name ('_calc_julian_from_U_or_W'), but perhaps that is no defense.
Also, I know the functionality is there with 'toisocalendar' and 'toisoweekday', but maybe %V and %u should be implemented for 'strftime' for completeness. Any thoughts?
On Wed, May 25, 2011 at 6:28 PM, Ashley Anderson \report@bugs.python.org\ wrote:
.. I agree that '_calc_julian_from_V' is a bit strange. I was mimicking a similar helper function's name ('_calc_julian_from_U_or_W'), but perhaps that is no defense.
This is a perfect defense. Local consistency usually trumps global conventions. I only looked at the patch and did not see that other function. Please don't change the name.
It looks like strftime already support %V and %u (presumably if and only if the platform strftime does).
Uploaded patch adding unit tests for %V and %u. Patch is against python2.7
Patch by aganders3 doesn't compile. Added comments in review of his patch. Adding a patch against python2.7 with my changes, which pass all unit tests posted previously.
Documentation (I have no idea how to create a patch for this):
%V ISO week number of the year (Monday as the first day of the week) as a decimal number (01-53). The week containing January 4 is week 1; the previous week is the last week of the previous year.
%u ISO weekday (Monday as the first day of the week) as a decimal number (1-7).
Since this is a new feature it can only go into 3.3.
Documentation is in the Doc subdirectory of a checkout, and is in the form of *.rst files. Modify the appropriate .rst file, and the doc update will be included in the patch you generate. If you wish to test your doc patch, execute 'make html' in the Doc subdirectory, which will pull in the appropriate software and build the docs in a Doc/build/html subdirectory.
Ashely's patch worked for me on 3.3 in a minimal test; I haven't looked it it further at this point.
Thanks for reviewing and for working on the unit tests and docs.
On Thu, May 26, 2011 at 9:44 AM, R. David Murray \report@bugs.python.org\ wrote: ..
Documentation is in the Doc subdirectory of a checkout, and is in the form of *.rst files. Modify the appropriate .rst file,
More specifically, strftime/strptime directives are listed in a table inside the the Doc/library/datetime.rst file. Note that the narrative before the table says that only C89 standard codes are listed and I don't think %V and %u are standard. Check if there is an applicable standard for them - C99 and POSIX define more codes that C89.
When trying to add cases for %V and %u in the tests, I ran into an issue of year ambiguity. The problem comes when the ISO year does not match the Gregorian year for a given date. I think this should be fixed by implementing the %G directive (ISO year, which is present in strftime) as well.
On Thu, May 26, 2011 at 12:07 PM, Ashley Anderson \report@bugs.python.org\ wrote:
I think this should be fixed by implementing the %G directive (ISO year, which is present in strftime) as well.
Correct.
Ashley, can you elaborate on wherein the ambiguity lies? Which combination of year, ISO week number and ISO weekday would lead to ambiguity?
Regarding documentation, the %G, %V and %u directives are listed in IEEE Std 1003.1, 2004 Edition for strftime(): http://pubs.opengroup.org/onlinepubs/009695399/functions/strftime.html
For some reason, these directives are not mentioned in the related strptime(): http://pubs.opengroup.org/onlinepubs/009695399/functions/strptime.html
The example that triggered the issue in testing was January 1, 1905. The ISO date for this day is 1904 52 7. This is reported correctly if you use datetime.isocalendar() or datetime.strftime('%G'), but you get 1905 if you use datetime.strftime('%Y'). When it's read back in it causes the test to fail because ISO '1904 52 7' is different from ISO '1905 52 7'.
Likewise if you consider a year starting on a Tuesday, Wednesday, or Thursday, there will be several days at the end of the previous year that will have an ISO year ahead of their Gregorian representation.
Oh, I see, you're talking about strftime(). You're correct that you would need %G to print the year to which the ISO week containing the specific date belongs. I see no ambiguity or need for %G in strptime().
I disagree, I think %G is necessary in strptime(). Take Monday, December 31, 2001 as an example. The ISO date is 2002 01 1.
Now, given only the Gregorian year (2001) for this date, and the ISO week and weekday (01 1), there is an ambiguity with Monday, January 1, 2001, which has an ISO date of 2001 01 1. The ISO week/weekday combination of (01 1) occurs twice in the year 2001, and can only be differentiated by the corresponding ISO year.
We can, of course, debate on what the behavior in this case should be. The way I see it, we can: 1) Assume the year taken in by %Y is equivalent to the ISO year, which it often is. Assuming %Y is the ISO year IFF %V is used accomplishes the same result. 2) Default the year to some value, currently 1900 is used when %Y is absent. This is how it is handled now. 3) Report an error/exception that insufficient data was provided, and maybe mention %G should be used instead of %Y for this case.
I'm attaching a patch now that includes some minor changes, includes %G and adds some tests. I am also working on the documentation but it's not quite ready.
I respectfully disagree. I take strptime('2002 01 1', '%Y %V %u') as mening "first day of first week in the year 2002"
There is only one date that corresponds to the first day of the first week of 2002, i.e. Dec. 31, 2001. If you specify the first day of the first week of 2001 instead, then that's another date (Jan. 1, 2001). The last and the first week in a year may span dates belonging to two years. That's just the way it is.
Are you suggesting strptime('2002 01 1', '%Y %V %u') to mean "first day of first week of 2002, except if that would change the year, in which case it means ???"
If you want to strftime(strptime(date, fmt), fmt) and arrive at the original date then yes, you need %G (but only in strftime).
Reading you comment again, I see the ambiguity now, if %Y is interpreted as "The resulting date MUST be in 2001".
I think the safest way would be to implement %G and fail if %Y is used in combination with %V. Maybe even fail if %V and %u aren't used in combination (since using %w could mean either the Sunday in the end of ISO week X or the Sunday preceeding ISO week X).
Attaching a patch for the documentation just in time for the weekend!
Ping. Is anyone willing to take this? I'm not a committer nor know anyone with commit access.
Ashley,
I would like to include your patch in 3.5. Can you combine your doc and code changes in one diff and make sure it is up to date with the tip.
Thanks.
Alexander,
http://bugs.python.org/file36417/12006_3.5_complete.patch updates the previous patches and is ready for review. Unit tests pass as of today.
Regards, Alex W.
I find this footnote somewhat confusing:
""" (8) Similar to %U and %W, %V is only used in calculations when the day of the week and the ISO year (%G) are specified when used with the strptime method. """
The existing footnote (7) is much clearer:
""" (7) When used with the strptime() method, %U and %W are only used in calculations when the day of the week and the year are specified. """
Why not use the same language in (8)?
""" (7) When used with the strptime() method, %V is only used in calculations when the day of the week and the ISO year (%G) are specified.
"""
How was "%Y %V" issue resolved? I don't see any tests for this case.
Well, it's an ambiguous situation, as established earlier. I'm not sure what the policy is wrt. foot-shooting, but I'd opt to fail if %G, %V and %u are not used together.
I don't have the repo handy to make a patch against 3.5, but would an addition like this do?
in Lib/_strptime.py:
+ elif iso_week != -1 and iso_year == -1: + raise ValueError("'%Y' directive is ambiguous in combination with '%V'. Use '%G' instead.") + elif julian == -1 and iso_week != -1 and iso_weekday != -1: + year, julian = _calc_julian_from_V(iso_year, iso_week, iso_weekday)
def test_ValueError(self):
+ # Make sure ValueError is raised when match fails or format is bad
+ self.assertRaises(ValueError, _strptime._strptime, data_string="1905 52",
+ format="%Y %V")
For future reference, here is the example showing %Y %V ambiguity:
>>> date(2013,12,31).strftime('%Y %V %u')
'2013 01 2'
>>> date(2013,1,1).strftime('%Y %V %u')
'2013 01 2'
which is resolved by using %G
>>> date(2013,12,31).strftime('%G %V %u')
'2014 01 2'
>>> date(2013,1,1).strftime('%G %V %u')
'2013 01 2'
In other words, '2013 01 2' cannot be unambiguously parsed using '%Y %V %u' format.
I think we need more tests showing that new directives don't violate strftime - strptime round-trip invariants.
Documentation should say "new in 3.5".
It would be nice to get this in for 3.5. Does anyone have time/interest to address the last two comments?
The patch is synchronized with the tip, added the versionadded directives and whatsnews entry, addressed sasha's comment on Rietveld. Fixed a bug: %a and %A now are interchangeable with %u as well as with %w.
I don't understand what more test are needed. Existing tests cover %Y %V ambiguity:
>>> date(1906, 12, 31).strftime('%G %V %u')
'1907 01 1'
>>> date(1917, 12, 31).strftime('%G %V %u')
'1918 01 1'
Addressed Berker's comments.
There are other issues.
strptime with single %Y returns a date of the first day of the year. What should return single %G?
>>> time.strptime('2015', '%Y')
time.struct_time(tm_year=2015, tm_mon=1, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=3, tm_yday=1, tm_isdst=-1)
>>> time.strptime('2015', '%G')
time.struct_time(tm_year=1900, tm_mon=1, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=0, tm_yday=1, tm_isdst=-1)
See also bpo-23717 and bpo-23718.
I think POLA would be to raise ValueError on time.strptime('2015', '%G') and other situations that don't make sense or are ambiguous.
That, or reproduce the bugs that the native OS strptime() implementation has. I got the %G, %V, and %u directives accepted for the next POSIX spec (http://austingroupbugs.net/view.php?id=879). But it will be years before operating systems catch up, so I would opt for the ValueError approach instead.
Reverting to "needs patch" stage because there are still issues to be ironed out.
Wow, I can't believe this issue is so old now! I'm motivated to finally come back and address the remaining issues to get this merged. Sorry for seemingly abandoning this; I'm embarrassed I didn't push to finish it earlier.
It sounds like the consensus is to raise a ValueError in cases where ambiguity may arise, with specific suggestions for resolving the ambiguity in each case. This is in contrast to certain other cases where strptime uses some default value for missing data (e.g. month/day = 1, year = 1900).
Ambiguous or incomplete cases I have identified are:
Hopefully that covers it...let me know if I need to add any cases or change behavior in any cases. I can have a patch (at least an initial attempt) ready for this in the next few days.
Here is an updated patch with implementation as outlined in msg247525.
Thanks for the review and the good suggestions. Hopefully this new patch is an improvement.
I didn't know about the context manager for assertRaises - I was just following the format for another ValueError test a few lines above.
The frozenset and re-wrapped comment were left from playing around with another way to do the checks, and I've corrected them.
I think the conditionals around calculating the julian and year are clearer now as well.
Please (obviously) let me know if there are further changes. Also please let me know if this is not the proper way to respond to the code review!
The going's a bit tough here. I've spent at least 10 times as long on this bug than it took me to work around the fact that Python doesn't support ISO week number round-trip. Python puts a smile on my face every day and I enjoy giving back where I can. But after four years, the smile is getting a bit stiff when I look at this enhancement request.
I'm hereby offering a scrumptious, mouth-watering, virtual, blackberry mousse gateau with a honey-roasted almond meringue cake bottom to the person who gets this enhancement committed. In good open-source spirit, everyone who contributed along the way is entitled to a healthy serving and eternal gratitude from me.
Bon appetit!
Haha, thanks Erik. It seems you know my tastes enough to not offer a chocolate-based reward. =)
I was actually set to "ping" to request a patch review today, as it's been one month since my last submission. Please let me know if I need to update the patch against the current tip
.
Another *ping* for a patch review since it's been almost a month since the last one.
Ashley,
You don't have to be a committer to review a patch. Given that it is unlikely that any of the committers is an expert on ISO calendar, an external review will be most welcome. Would you complete one?
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields: ```python assignee = 'https://github.com/abalkin' closed_at =
created_at =
labels = ['easy', 'type-feature', 'library']
title = 'strptime should implement %G, %V and %u directives'
updated_at =
user = 'https://bugs.python.org/ErikCederstrand'
```
bugs.python.org fields:
```python
activity =
actor = 'belopolsky'
assignee = 'belopolsky'
closed = True
closed_date =
closer = 'belopolsky'
components = ['Library (Lib)']
creation =
creator = 'Erik.Cederstrand'
dependencies = []
files = ['22113', '22125', '22126', '22137', '22240', '36417', '38585', '38587', '40085', '40113', '40660']
hgrepos = []
issue_num = 12006
keywords = ['patch', 'easy']
message_count = 56.0
messages = ['135177', '135208', '135237', '136821', '136822', '136825', '136828', '136886', '136896', '136899', '136914', '136923', '136924', '136954', '136955', '136956', '136960', '136965', '136976', '136977', '136990', '136995', '137002', '137012', '137030', '137038', '137597', '179529', '221927', '227259', '227274', '227275', '227350', '227409', '227804', '227805', '227806', '238549', '238605', '238612', '238649', '241709', '247384', '247525', '247755', '247917', '249529', '249543', '251983', '251986', '251988', '252139', '252146', '252383', '252413', '252414']
nosy_count = 12.0
nosy_names = ['tim.peters', 'belopolsky', 'r.david.murray', 'BreamoreBoy', 'Erik.Cederstrand', 'python-dev', 'AaronR', 'aganders3', 'berker.peksag', 'serhiy.storchaka', 'Erik Cederstrand', 'Alex.Willmer']
pr_nums = []
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue12006'
versions = ['Python 3.6']
```