python / cpython

The Python programming language
https://www.python.org
Other
62.35k stars 29.94k forks source link

datetime lacks concrete tzinfo implementation for UTC #49344

Closed brettcannon closed 14 years ago

brettcannon commented 15 years ago
BPO 5094
Nosy @tim-one, @doerwalter, @brettcannon, @mdickinson, @abalkin, @pitrou, @devdanzin, @ezio-melotti, @merwok, @bitdancer, @durban, @4kir4
Files
  • next-patch.txt: against 2.7
  • issue5094.diff
  • issue5094a.diff
  • localtime.py: aware local time implementation
  • datetimeex.py: Python prototype for 3-argument timezone
  • issue5094b.diff
  • issue5094c.diff
  • issue5094d.diff
  • issue5094d1.diff: Fixed utcnow() documentation. No code change from issue5094d.diff.
  • issue5094e.diff
  • issue5094f.diff: 'UTC±HH:MM' and more unit tests
  • issue5094g.diff
  • issue5094h.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields: ```python assignee = 'https://github.com/abalkin' closed_at = created_at = labels = ['extension-modules', 'type-feature'] title = 'datetime lacks concrete tzinfo implementation for UTC' updated_at = user = 'https://github.com/brettcannon' ``` bugs.python.org fields: ```python activity = actor = 'eric.araujo' assignee = 'belopolsky' closed = True closed_date = closer = 'belopolsky' components = ['Extension Modules'] creation = creator = 'brett.cannon' dependencies = [] files = ['17455', '17525', '17533', '17541', '17544', '17569', '17570', '17585', '17586', '17631', '17648', '17657', '17680'] hgrepos = [] issue_num = 5094 keywords = ['patch'] message_count = 91.0 messages = ['80735', '80740', '80745', '80807', '80857', '80866', '81043', '81086', '81119', '81641', '81649', '82818', '106403', '106411', '106412', '106413', '106414', '106415', '106422', '106445', '106476', '106478', '106483', '106484', '106485', '106487', '106490', '106493', '106494', '106496', '106498', '106518', '106565', '106911', '106914', '106920', '106923', '106971', '106973', '106974', '106976', '106977', '106980', '106997', '106998', '107006', '107008', '107059', '107060', '107072', '107092', '107105', '107107', '107158', '107186', '107189', '107190', '107212', '107279', '107290', '107291', '107545', '107546', '107547', '107548', '107552', '107554', '107569', '107608', '107628', '107668', '107670', '107676', '107683', '107733', '107737', '107738', '107742', '107743', '107786', '107787', '107819', '107874', '107886', '107889', '107890', '107891', '107892', '107893', '107894', '107901'] nosy_count = 17.0 nosy_names = ['tim.peters', 'doerwalter', 'brett.cannon', 'mark.dickinson', 'belopolsky', 'ggenellina', 'pitrou', 'techtonik', 'ajaksu2', 'kawai', 'ezio.melotti', 'eric.araujo', 'r.david.murray', 'rafe', 'daniel.urban', 'l0nwlf', 'akira'] pr_nums = [] priority = 'high' resolution = 'fixed' stage = 'resolved' status = 'closed' superseder = None type = 'enhancement' url = 'https://bugs.python.org/issue5094' versions = ['Python 3.2'] ```

    brettcannon commented 14 years ago

    For the allowable range, follow the datetime docs as someone might be relying on that specification already.

    As for the ongoing DST debate, it seems we either need to say that since we cannot properly support all possible datetimes properly we should simply not even try, fixed offset or not, or we provide a class that gives the proper UTC offset, but in no way adjusts itself as people might expect or want. I'm arguing for the former, Alexander wants the latter.

    I still stand by my argument that it is not needed for the two use cases that we concretely have in the stdlib for a timezone class: a UTC instance and %z directive in strptime. Unless there is some way for the %z directive to specify that it is actually DST, I still think the functionality of the class should be kept to a functional minimum for our needs and let people needing more support, including fixed offset DST, decide how they want to handle it. I can still see a naive user thinking that DST is the same around the world and being taken by surprise when thing don't adjust accordingly by the timezone when he does timezone("PDT", 7, dst=True). Plus giving people any semblance of a DST-supporting timezone class is just going to lead for more calls of the stdlib to include concrete timezone instances that do manage DST.

    Unless other people step forward to debate this we probably are not going to reach consensus without going to python-dev to see what others think.

    abalkin commented 14 years ago

    On Fri, Jun 4, 2010 at 4:09 PM, Brett Cannon \report@bugs.python.org\ wrote: ..

    For the allowable range, follow the datetime docs as someone might be relying on that specification already.

    Will do. I think it is as simple simple s/12/24/ in two places of the current patch, but I will check and make sure it is tested by unittests.

    As for the ongoing DST debate, it seems we either need to say that since we cannot properly support all possible datetimes properly we should simply not even try, fixed offset or not, or we provide a class that gives the proper UTC offset, but in no way adjusts itself as people might expect or want. I'm arguing for the former, Alexander wants the latter.

    I disagree with your premise that "we cannot properly support all possible datetimes." If bpo-1647654 patch is applied, I can rewrite datetimeex.py so that print(localtime()) is consistent with the system date utility on Linux/MacOS/BSD for all times. This however can be achieved without DST offset in timezone object.

    The only issue is what time.strftime("%Z", localtime().timetuple()) will print. Recall that tm_isdst (the last entry in timetuple) is set according to what dst() returns. If dst() is None, tm_isdst = -1, if dst() is timedelta(0), tm_isdst = 0 and if dst() is non-zero timedelta, tm_isdst = 1. In these three cases, %Z will be formated as empty string, time.tzname[0] ('EST' in New York) and time.tzname[1] ('EDT' in New York) respectively.

    I would really like to get localtime() in datetime library at the next step. In the distant future, datetime.now() can be redefined to return aware local time. If this happens, we need to decide what localtime().dst() should return. With the current patch, it will return timedelta(0) which is wrong 50% of the time. A correct alternative is to return None, but this will be rather odd for timezone.utc.

    It appears that in order to implement dst() correctly to the letter of tzinfo specification there are really two choices:

    1. Have separate classes: UTC and FixedOffsetTimezone with different dst(). UTC().dst(dt) -> timedelta(0) and FixedOffsetTimezone(offset).dst(dt) -> None.

    2. Have a single class with dst(dt) returning fixed stored value.

    I think #2 clearly wins on simplicity.

    I still stand by my argument that it is not needed for the two use cases that we concretely have in the stdlib for a timezone class: a UTC instance and %z directive in strptime.

    I agree. The third member is needed to support aware local time, which is the third use case.

    .. I can still see a naive user thinking that DST is the same around the world and being taken by surprise when thing don't adjust accordingly by the timezone when he does timezone("PDT", 7, dst=True).

    The actual syntax would be PDT = timezone(timedelta(hours=7), "PDT", dst=timedelta(hours=1)) (or -1 - I have to check). Is the surprise you have in mind that PDT.utcoffset(dt) is the same for dt in June and January? I agree this is not ideal and I would much rather not allow timezone.utcoffset() to take an argument, but this is not possible within the tzinfo design. Best we can do is to make the argument to utcoffset() optional and naive users will not even know that they can pass time value to the timezone methods.

    I think this is still less evil than to have an aware datetime t with t.tzname() -> 'PDT' and time.strftime('%Z', t.timetuple()) -> 'PST'.

    Plus giving people any semblance of a DST-supporting timezone class is just going to lead for more calls of the stdlib to include concrete timezone instances that do manage DST.

    Again, managing DST is to use different timezone instances for different real times. It is not hard to do that. Implementing time-dependent utcoffset(dt) in a concrete tzinfo subclass is impossible unambiguously. In the long term we should be looking to deprecate the time argument to tzinfo methods.

    Unless other people step forward to debate this we probably are not going to reach consensus without going to python-dev to see what others think.

    Given the history of such discussions, I think we should reach consensus here first. I don't mind starting with a two-component timezone class and dst() returning None for all instances including timezone.utc. I still believe the cost of supporting DST is really small and adding it now is easier than in the future.

    brettcannon commented 14 years ago

    On Fri, Jun 4, 2010 at 15:17, Alexander Belopolsky \report@bugs.python.org\ wrote:

    Alexander Belopolsky \belopolsky@users.sourceforge.net\ added the comment:

    On Fri, Jun 4, 2010 at 4:09 PM, Brett Cannon \report@bugs.python.org\ wrote: .. > For the allowable range, follow the datetime docs as someone might be relying on that specification already. > Will do.  I think it is as simple simple s/12/24/ in two places of the current patch, but I will check and make sure it is tested by unittests.

    Great!

    > As for the ongoing DST debate, it seems we either need to say that since we cannot properly support all possible > datetimes properly we should simply not even try, fixed offset or not, or we provide a class that gives the proper UTC > offset, but in no way adjusts itself as people might expect or want. I'm arguing for the former, Alexander wants the latter. >

    I disagree with your premise that "we cannot properly support all possible datetimes." If bpo-1647654 patch is applied, I can rewrite datetimeex.py so that print(localtime()) is consistent with the system date utility on Linux/MacOS/BSD for all times.  This however can be achieved without DST offset in timezone object.

    Well, I think the time module needs to be slowly phased out in preference for datetime and we stop monkey-patching the time tuple over and over again. The datetime module provides a much nicer interface and TOOWTDI, but that's a long term goal.

    The only issue is what time.strftime("%Z", localtime().timetuple()) will print.  Recall that tm_isdst (the last entry in timetuple) is set according to what dst() returns.  If dst() is None,  tm_isdst = -1, if dst() is timedelta(0),  tm_isdst = 0 and if dst() is non-zero timedelta, tm_isdst = 1.  In these three cases, %Z will be formated as empty string, time.tzname[0] ('EST' in New York) and time.tzname[1] ('EDT' in New York) respectively.

    I would really like to get localtime() in datetime library at the next step. In the distant future, datetime.now() can be redefined to return aware local time.  If this happens, we need to decide what localtime().dst() should return.  With the current patch, it will return timedelta(0) which is wrong 50% of the time.  A correct alternative is to return None, but this will be rather odd for timezone.utc.

    True.

    It appears that in order to implement dst() correctly to the letter of tzinfo specification there are really two choices:

    1.  Have separate classes: UTC and FixedOffsetTimezone with different dst().  UTC().dst(dt) -> timedelta(0) and FixedOffsetTimezone(offset).dst(dt) -> None.

    2. Have a single class with dst(dt) returning fixed stored value.

    I think #2 clearly wins on simplicity.

    Or #3: We have a UTC class as originally proposed way back when. We can even make all the methods staticmethods so that I get my desire to not require instantiating an instance while you do get your pseudo-singleton you originally wanted.

    > I still stand by my argument that it is not needed for the two use cases that we concretely have in the stdlib for a > timezone class: a UTC instance and %z directive in strptime.

    I agree.  The third member is needed to support aware local time, which is the third use case.

    Sure, but we are not even there yet to consider that use case are we?

    .. I can still see a naive user thinking that DST is the same around the world and being taken by surprise when thing don't adjust accordingly by the timezone when he does timezone("PDT", 7, dst=True).

    The actual syntax would be PDT = timezone(timedelta(hours=7), "PDT", dst=timedelta(hours=1)) (or -1 - I have to check).

    Yes, I was typing from the cuff.

     Is the surprise you have in mind that PDT.utcoffset(dt) is the same for dt in June and January?

    Exactly.

    I agree this is not ideal and I would much rather not allow timezone.utcoffset() to take an argument, but this is not possible within the tzinfo design.  Best we can do is to make the argument to utcoffset() optional and naive users will not even know that they can pass time value to the timezone methods.

    This is all starting to feel like the entire tzinfo ABC needs to be re-thought and either expanded upon or replaced.

    I think this is still less evil than to have an aware datetime t with t.tzname() -> 'PDT' and time.strftime('%Z', t.timetuple()) -> 'PST'.

    > Plus giving people any semblance of a DST-supporting timezone class is just going to lead for more calls of the > stdlib to include concrete timezone instances that do manage DST. >

    Again, managing DST is to use different timezone instances for different real times.  It is not hard to do that.  Implementing time-dependent utcoffset(dt) in a concrete tzinfo subclass is impossible unambiguously.  In the long term we should be looking to deprecate the time argument to tzinfo methods.

    This is why I hate datetime stuff. =)

    > Unless other people step forward to debate this we probably are not going to reach consensus > without going to python-dev to see what others think.

    Given the history of such discussions, I think we should reach consensus here first.  I don't mind starting with a two-component timezone class and dst() returning None for all instances including timezone.utc.  I still believe the cost of supporting DST is really small and adding it now is easier than in the future.

    So how about this: we implement a UTC class now which contains only staticmethods to get you your singleton and me not having to instantiate the class. Then you take your time in possibly expanding the tzinfo class or coming up with a new ABC which replaces tzinfo that fixes all of these crazy problems. I have been arguing from a code maintainability and naive user POV, you from an experienced datetime POV. Lets solve my POV now and just start the ball rolling in solving your POV properly long-term instead of trying to shoe-horn known issues into pre-existing stuff. People complain about datetime enough, so let's take the time to fix it. The UTC class can be a stop-gap until datetime gets fixed (which could be in 3.2 if stuff happened really rapidly and the UTC solution can be ripped out; it's just code).

    9432cd65-9f88-4bc6-8ad6-72e5ab1c730d commented 14 years ago

    On Fri, Jun 4, 2010 at 11:09 PM, Brett Cannon \report@bugs.python.org\ wrote:

    Unless other people step forward to debate this we probably are not going to reach consensus without going to python-dev to see what others think.

    I would really like to see:

    1. a good definition of the problem
    2. summary of research
    3. possible solutions

    The amount of text generated in bugtracker is already overwhelming for others to plug in. I remember that my thread about datetime TZ was also hijacked into the debate that we can not ship 3rd party TZ DB with Python, so I can't even reference it.

    abalkin commented 14 years ago

    I am attaching the next version of my patch and changing priority to high because it appears that several issues depend on resolution of this one.

    Here is the summary of changes since issue5094a.diff:

    1. The constructor now accepts only whole number of minutes in [-23:59, 23:59] range.

    2. Removed tz_aware option to utcnow().

    3. Moved struct PyDateTime_TimeZone definition to .c file. This effectively makes the actual definition private so that it can be changed in the future if desired without breaking C API.

    4. Removed checks that the argument passed to tzinfo methods is a timedelta. The tzinfo spec requires that None is accepted. Since the argument is ignored, it is wasteful to check its type.

    5. Make dst() return None as recommended by tzinfo documentation when DST information is not provided.

    Brett, is this close to a compromise that we can agree to or is it time to start writing a PEP? :-)

    PS: The latest patch does not include doc changes.

    7ad756fc-130f-4966-b479-145798a9f250 commented 14 years ago

    Isn't it possible, that in the issue5094b.diff patch, in the new_timezone_ex function, in this part:

    self = (PyDateTime_TimeZone *)type->tp_alloc(type, 0);
    if (self == NULL)
        return NULL;

    should be a Py_DECREF(offset) call?

    I mean like:

    self = (PyDateTime_TimeZone *)type->tp_alloc(type, 0);
    if (self == NULL) {
        Py_DECREF(offset);
        return NULL;
    }

    I think the refcount of offset has always been incremented, so in case of an error, it should be decremented (as in the other error handling parts of the function).

    abalkin commented 14 years ago

    Good catch, Daniel!

    Fixed in issue5094c.diff.

    brettcannon commented 14 years ago

    Put yourself down as the maintainer of datetime in Misc/maintainers.rst and you have a deal, Alexander! =)

    abalkin commented 14 years ago

    Readying the patch for commit. Added documentation, tests and implementation of fromutc() method that does not need a dst().

    7ad756fc-130f-4966-b479-145798a9f250 commented 14 years ago

    In Doc/library/datetime.rst, in the documentation of datetime.utcnow this part: ".. versionchanged:: 2.7" probably should be ".. versionchanged:: 3.2".

    abalkin commented 14 years ago

    Thanks, Daniel. This was a leftover from the previous patch. The latest patch does not actually change utcnow(). Instead the users should call datetime.now(timezone.utc). Fixed in issue5094d1.diff.

    mdickinson commented 14 years ago

    I'm not sure if I missed this in the earlier discussion: is there a reason to prevent subclassing of timezone?

    mdickinson commented 14 years ago

    Some comments from playing with this patch (without having looked at the implementation):

    mdickinson commented 14 years ago

    One more: there's a footnote marked in the docs (line 36 of datetime.rst), but no corresponding footnote as far as I can see.

    mdickinson commented 14 years ago

    And another minor doc issue: the docs still say:

    "The datetime module does not supply any concrete subclasses of tzinfo."

    abalkin commented 14 years ago

    On Fri, Jun 11, 2010 at 10:23 AM, Mark Dickinson \report@bugs.python.org\ wrote: ..

    Some comments from playing with this patch (without having looked at the implementation):

    thank you very much for your comments. As we are fine-tuning the timezone class, do you think it would be helpful to go back to Brett's suggestion and add datetime.py with just from _datetime import * and keep parallel Python and C implementations of timezone class only until full datetime.py is available?

    I was deliberate in a sense that I thought about it, but there was no substantial discussion. Original patch did not allow subclassing of the UTC class and I thought it was right. (See msg106415 and msg106422.) subclassing FixedOffsetTimeZone makes more sense, but I was hoping to keep timezone complete and eventually stop supporting tzinfo API (not allow passing datetime in the methods.) More on this below. I would very much prefer not to open the subclassing debate. If you support subclassing, I will add a flag. My only objection is that it is much easier to add this functionality in the future than to remove it if it gets abused. Can we postpone this decision until real life use case comes in? (Subclass to change dst() in itself is not a use case. Why do you need legacy timetuple interface to work is the first question.)

    • If I try to do  timezone(timedelta(hours=24)), I get an error message: "ValueError: offset must be a timedelta between timedelta(1) and -timedelta(1)." and I have to think for a bit to remember that 'timedelta(1)' means 'timedelta(days=1)'.  Any chance of making this more explicit:  e.g. "between > timedelta(hours=-24) and timedelta(hours=24)"?

    Please make a call and I'll implement it. The competing spellings are:

    1. timedelta(hours=24)
    2. 24 hours (I rejected this because it may be confused with an int)
    3. timedelta(days=1)
    4. in the range [-timedelta(hours=23, minutes=59), timedelta(hours=23, minutes=59)]

    Yes, there was a reason, but I don't think it is a good one. The default implementation of fromutc() does not work if utcoffset() returns timedelta but dst() returns None. The dst() value used to detect DST ambiguities. Remember, fromutc() despite the name is defined to operate on local times and some local times are ambiguous and some are invalid. The default implementation attempts to detect that. This is the issue that I am trying to avoid in my design.

    The test is easy to add. My longer term plan is to disallow passing argument to constant methods, so adding check is somewhat wasteful. Will do.

    • And it also seems clunky that an argument *has* to be supplied for utcoffset, tzname and dst, only to be i gnored.  Would it be possible to make the argument optional?

    Yes. this was my transition plan. Make it optional. Modify datetime methods to first try to get offset/name/dst without passing time argument and fall back to passing self. Remove optional argument from timezone methods.

    Yes. I want parseable __repr__ as well. I've been torn between 'datetime.timezone(datetime.timedelta(..)[, ..])' and 'timezone(timedelta(..)[, ..])'. Opinions welcome. Similarly, should str(timezone.utc) be '+0000' or 'UTC' or '+00:00'?

    You are reading my mind. I planned to tear out conditional logic that supports plain 'UTC', but apparently forgot. Will do.

    • I'm very confused about utcoffset:  why can't I supply a UTC datetime (i.e. an aware datetime with tzinfo = timezone.utc) to this?  I suspect I'm misunderstanding something here...

    You are not alone! :-) My understanding is that utcoffset was designed around the notion that if you add a day to 12 noon the day before DST change, you should still get 12 noon the next day. Supporting this requires deriving local timezone from local time which is impossible for some datetime values. Most business applications can ignore this because nobody schedules meetings between 1 and 2 am and those who do are smart enough to catch an exception and ask the user to clarify what time he/she means.

    Good catch. Thanks for the review.

    abalkin commented 14 years ago

    I have to stop replying to emails. There is no reason behind roundup remove ">" comments logic. Reposting my message:

    """ On Fri, Jun 11, 2010 at 10:23 AM, Mark Dickinson \report@bugs.python.org\ wrote: ..

    Some comments from playing with this patch (without having looked at the implementation):

    thank you very much for your comments. As we are fine-tuning the timezone class, do you think it would be helpful to go back to Brett's suggestion and add datetime.py with just from _datetime import * and keep parallel Python and C implementations of timezone class only until full datetime.py is available?

    • As noted above, the 'timezone' class can't be subclassed. Is this deliberate? I notice that Brett said "let users subclass as needed to add DST support" in msg107008.

    I was deliberate in a sense that I thought about it, but there was no substantial discussion. Original patch did not allow subclassing of the UTC class and I thought it was right. (See msg106415 and msg106422.) subclassing FixedOffsetTimeZone makes more sense, but I was hoping to keep timezone complete and eventually stop supporting tzinfo API (not allow passing datetime in the methods.) More on this below. I would very much prefer not to open the subclassing debate. If you support subclassing, I will add a flag. My only objection is that it is much easier to add this functionality in the future than to remove it if it gets abused. Can we postpone this decision until real life use case comes in? (Subclass to change dst() in itself is not a use case. Why do you need legacy timetuple interface to work is the first question.)

    • If I try to do timezone(timedelta(hours=24)), I get an error message: "ValueError: offset must be a timedelta between timedelta(1) and -timedelta(1)." and I have to think for a bit to remember that 'timedelta(1)' means 'timedelta(days=1)'. Any chance of making this more explicit: e.g. "between > timedelta(hours=-24) and timedelta(hours=24)"?

    Please make a call and I'll implement it. The competing spellings are:

    1. timedelta(hours=24)
    2. 24 hours (I rejected this because it may be confused with an int)
    3. timedelta(days=1)
    4. in the range [-timedelta(hours=23, minutes=59), timedelta(hours=23, minutes=59)]
    • The existing docs say, at one point: "if utcoffset does not return None, dst() should not return None either." And yet it seems that this is exactly what happens for timezone instances: > utcoffset doesn't return None, but dst does. Was there a reason for the explicit restriction in the docs, and are we sure that that reason is no longer valid?

    Yes, there was a reason, but I don't think it is a good one. The default implementation of fromutc() does not work if utcoffset() returns timedelta but dst() returns None. The dst() value used to detect DST ambiguities. Remember, fromutc() despite the name is defined to operate on local times and some local times are ambiguous and some are invalid. The default implementation attempts to detect that. This is the issue that I am trying to avoid in my design.

    • I find it strange that mytimezone.utcoffset(1+3j) works; similarly for tzname and dst. Perhaps it should be checked at least that the argument is a datetime. Similarly for tzname and dst.

    The test is easy to add. My longer term plan is to disallow passing argument to constant methods, so adding check is somewhat wasteful. Will do.

    • And it also seems clunky that an argument *has* to be supplied for utcoffset, tzname and dst, only to be ignored. Would it be possible to make the argument optional?

    Yes. this was my transition plan. Make it optional. Modify datetime methods to first try to get offset/name/dst without passing time argument and fall back to passing self. Remove optional argument from timezone methods.

    I can make the arg optional now.

    • Any chance of a nice __str implementation for timezone instances? (And/or possibly a nice __repr as well)?

    Yes. I want parseable __repr__ as well. I've been torn between 'datetime.timezone(datetime.timedelta(..)[, ..])' and 'timezone(timedelta(..)[, ..])'. Opinions welcome. Similarly, should str(timezone.utc) be '+0000' or 'UTC' or '+00:00'?

    • The docs for tzname are misleading: they claim that the default name has the form "UTCsHHMM". This isn't true for UTC+0, whose name seems to be just "UTC". It actually wouldn't seem unreasonable to have this print as "UTC+0000", just for consistency (and for ease of parsing for anyone on the receiving end of such a string). Or the docs could be fixed.

    You are reading my mind. I planned to tear out conditional logic that supports plain 'UTC', but apparently forgot. Will do.

    • I'm very confused about utcoffset: why can't I supply a UTC datetime (i.e. an aware datetime with tzinfo = timezone.utc) to this? I suspect I'm misunderstanding something here...

    You are not alone! :-) My understanding is that utcoffset was designed around the notion that if you add a day to 12 noon the day before DST change, you should still get 12 noon the next day. Supporting this requires deriving local timezone from local time which is impossible for some datetime values. Most business applications can ignore this because nobody schedules meetings between 1 and 2 am and those who do are smart enough to catch an exception and ask the user to clarify what time he/she means.

    • In the docs, replace "timezeone" with "timezone"

    Good catch. Thanks for the review.

    abalkin commented 14 years ago

    I am attaching a new patch, issue5094e.diff which addresses most of Mark's comments. I left out repr() because two opinions were voiced on IRC with respect to datetime. prefix. I would like to give it some more thought even though I am leaning towards compatibility with existing reprs.

    I did not make td argument optional and did not allow timezone o be subclassed because these seem to be mutually exclusive options. (If td is optional in base class, it must be optional in subclasses per Liskov's principle severely limiting utility of subclasses.) Let's address this separately.

    7fe5d93b-2a2c-46a0-b5cd-5602c591856a commented 14 years ago

    Minor notes:

    msg107186:

    1. The constructor now accepts only whole number of minutes in [-23:59, 23:59] range.

    rfc 3339 provides the following example:

    1937-01-01T12:00:27.87+00:20

    This represents the same instant of time as noon, January 1, 1937, Netherlands time. Standard time in the Netherlands was exactly 19 minutes and 32.13 seconds ahead of UTC by law from 1909-05-01 through 1937-06-30. This time zone cannot be represented exactly using the HH:MM format, and this timestamp uses the closest representable UTC offset.

    The presence of fractions of seconds in time zone is an exception so it might not be worth to support it but it exists.

    msg107552: Similarly, should str(timezone.utc) be '+0000' or 'UTC' or '+00:00'?

    Excerpts in favor for '+00:00' from rfc 3339:

    Attempts to label local offsets with alphabetic strings have resulted in poor interoperability in the past [IMAIL], [HOST-REQ]. As a result, RFC2822 [IMAIL-UPDATE] has made numeric offsets mandatory.

    If the time in UTC is known, but the offset to local time is unknown, this can be represented with an offset of "-00:00". This differs semantically from an offset of "Z" or "+00:00", which imply that UTC is the preferred reference point for the specified time.

    abalkin commented 14 years ago

    There is a separate issue bpo-5288 asking to support sub-minute offsets. This is not hard, but the C code still has a few interfaces left from the time when offset was an integer # of minutes. I am +1 to fix that, but not as a part of this issue.

    On str(tz), I definitely want an invariant str(tz) == tz.tzname(None). I am open to changes to tzname(), but we are very close to bikesheding here. Let's settle for 'UTC±HH:MM'. This seems to be the most common spelling for numeric timezones in various tables on the web.

    abalkin commented 14 years ago

    I have made my mind on subclassing timezone issue. I believe subclassing should not be allowed and here is the reason:

    The new datetime.timezone class is a very specific implementation of tzinfo interface. It guarantees that utcoffset(dt) and friends return the same value for all times. Applications should be able to rely on this and a simple isinstance(tz, timezone) should be enough to deduce that tz is a fixed offset tzinfo instance. (Of course careful applications can check type(tz) == timezone instead, but making isinstance(tz, timezone) subtly wrong is not a good idea.)

    There is also very little to be gained from subclassing timezone. It defines only 4 methods and each is entirely trivial. Any non-trivial tzinfo implementation will need to override all 4 of them anyways. It is much safer to derive from tzinfo directly and possible reuse timezone through containment (for example to format offsets in a standard way.)

    The only immediate application of subclassing is to add dst() method returning a fixed offset rather than None. This is trivial to implement through containment and if such class becomes popular, I would prefer to add this functionality to timezone class itself.

    mdickinson commented 14 years ago

    I agree it seems fine to disallow subclassing of timezone. (And if we decide, in the light of new information, that that was wrong, then it's easy to allow subclassing later; harder to disallow it.)

    +1 for 'UTC±HH:MM' for both tzname and __str__, too.

    I'll take a look at the newest patch.

    abalkin commented 14 years ago

    +1 for 'UTC±HH:MM' for both tzname and __str__, too.

    It looks like I subconsciently allocated enough characters for this in the string buffer, so now this is a single character change. I'll do it with other changes (if any) before commit.

    abalkin commented 14 years ago

    Attaching issue5094f.diff which implements 'UTC±HH:MM' and adds unit tests for more error cases.

    mdickinson commented 14 years ago

    The latest patch looks good to me. I only have minor comments, in random order:

        self = (PyDateTime_TimeZone *)type->tp_alloc(type, 0);

    to make it:

        self = (PyDateTime_TimeZone *)(type->tp_alloc(type, 0));
    abalkin commented 14 years ago

    On Sun, Jun 13, 2010 at 10:09 AM, Mark Dickinson \report@bugs.python.org\ wrote: ..

    • Should the PyDateTime_TimeZone struct definition go into datetime.h, so that it's avaiable if you want to export any C-API functions later on?

    The original patch had this in the header file, but I moved it down to .c during dst debate. (See msg107186, point 3.) In the future we may add accessor functions to datetime.h, but this can be done without exposing the C struct.

    • If you're not allowing subclassing, then presumably you don't need  the new_timezone / new_timezone_ex dance?

    I agree, but if you don't mind, I will not change it at this point. There is a small chance that there will be an outcry for supporting subclassing and we will put that back in. Also, issue bpo-2267 debate may result in removing the dance from other factory functions. (If the camp saying that the base class has no idea how to construct subclass instances wins.) Not a big deal either way, though.

    • For clarity, please consider adding parentheses in:

       self = (PyDateTime_TimeZone *)type->tp_alloc(type, 0);

    will do. PEP-7 does not mention that, but probably should.

    • Whitespace issues: there are a couple of tabs in the source (at around lines 810, 3388, 3390), and an overly long line (>79 characters) at around line 3365.

    Thanks. I thought I checked those. Maybe I should take over bpo-8912 as a community service. :-)

    • Please add a brief comment before the added C functions (like new_timezone_ex) explaining their purpose.

    Will do.

    • I wonder whether __ne__ should return the correct result (rather than returning NotImplemented) for timezone instances.

    Will do.

     OTOH, I agree with the decision not to allow timezone order comparisons (though I bet they get requested at some point).

    I will support such request. It is very easy to implement (just remove the check that disallows it) and makes perfect sense. I left them out only because it is easier to add features than to remove in the future and I don't want to debate this now.

    • There doesn't seem to be any mention of timezone.min or timezone.max in the docs.  Is this deliberate?

    Yes. Without ordering, having min and max is rather strange. I wanted to expose these for testing and will definitely document them when (and if) order comparisons are implemented. For now, let's keep them as easter eggs.

    abalkin commented 14 years ago

    Just to add a little bit of historical perspective, this proposal is really more than seven years old:

    """ s.keim (Dec 20, 2002 3:33 am; Comment #13)

    .. do we really need methods like utcnow. I believe the following could be a little more easy to learn:

    Apparently, this proposal have been forgotten and reinvented again in msg106411, point 2.

    abalkin commented 14 years ago

    issue5094g.diff addresses all Mark's suggestions except making struct definition public. I also made a few other changes:

    1. Constructor now raises TypeError when offset is not a timedelta instead of ValueError in previous version.

    2. NEWS entry no longer says that argument is ignored by tzinfo methods implemented in timezone. (Since these methods now check the type of the argument, it would be misleading to say that arg is ignored.)

    3. Added some more tests for error cases.

    4. Added ACK for Rafe Kaplan as the author of the original patch.

    abalkin commented 14 years ago

    Also I changed Py_DECREFs in destructor to Py_CLEAR. I understand that while not strictly required in this case, it is a good practice.

    mdickinson commented 14 years ago

    issue5904g.diff looks good to me. A very nice piece of work!

    abalkin commented 14 years ago

    Committed in r81981. I'll open a separate documentation issue to update Doc/includes/tzinfo-examples.py and improve TZ related documentation.

    brettcannon commented 14 years ago

    Thanks for sticking with this, Alexander. I realized I was a slight pain to deal with on this one. =)

    abalkin commented 14 years ago

    Reopening to add some minor fixes to tests and documentation. See issue5094h.diff. Ezio, thanks for finding these issues.

    merwok commented 14 years ago

    +1 for replacing math range notation with English. Much easier for non-math people that do Python :)

    One more nit: Your docstrings use verb forms like “Returns” where PEP 257 advises to use “Return”: “[the docstring] prescribes the function or method's effect as a command ("Do this", "Return that"), not as a description; e.g. don't write "Returns the pathname ..."”

    abalkin commented 14 years ago

    What about

    """ .. method:: datetime.utcoffset()

    If :attr:`tzinfo` is ``None``, returns ``None``, else returns ... """

    Should this use "return" too?

    merwok commented 14 years ago

    .. method:: datetime.utcoffset()

    Return ``None`` if :attr:`tzinfo` is ``None``, else else return ...

    That said, to keep diffs readable, perhaps stick with the existing convention know, and later a PEP-257 compliance diff can be made.

    abalkin commented 14 years ago

    How to specify offset range horse got its beating above. See msg107554. The current wording is the best compromise between verbosity and precision.

    Replacing the patch with one that fixes other nits.

    merwok commented 14 years ago

    Suggestion for one line: “The dt argument must be aware with tzinfo set to self.” “The dt argument must be an aware datetime, with tzinfo set to self.”

    If there is a reST construct for aware datetime, use it. Since “self” is not special, perhaps an alternate wording like “the timezone instance from which the method is called”, but less ugly.

    Some PyDoc_STR calls still use “Returns.”

    By the way, you could have committed the unittest changes directly.

    9432cd65-9f88-4bc6-8ad6-72e5ab1c730d commented 14 years ago

    Good job. Thanks for working on this.

    It is possible to backport this to future of Python 2.7?

    pitrou commented 14 years ago

    It is possible to backport this to future of Python 2.7?

    No.

    abalkin commented 14 years ago

    On this happy note, I am closing this issue. Doc changes have been committed in r82004, test changes in r82003. For repr(timezone(..)) development, please follow issue bpo-9000.