python / cpython

The Python programming language
https://www.python.org

datetime module has no support for nanoseconds #59648

Open 78adca90-caba-4b15-9e0c-04ae6d71ab27 opened 12 years ago

78adca90-caba-4b15-9e0c-04ae6d71ab27 commented 12 years ago
BPO 15443
Nosy @malemburg, @tim-one, @mdickinson, @abalkin, @giampaolo, @bitdancer, @andyclegg, @gareth-rees, @eli-b, @serhiy-storchaka, @pganssle, @shlomoa
PRs
  • python/cpython#21987
  • Files
  • datetime.nanosecond.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

```python
assignee = 'https://github.com/abalkin'
closed_at = None
created_at =
labels = ['type-feature', 'library', '3.10']
title = 'datetime module has no support for nanoseconds'
updated_at =
user = 'https://bugs.python.org/goshawk'
```

    bugs.python.org fields:

```python
activity =
actor = 'gdr@garethrees.org'
assignee = 'belopolsky'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)']
creation =
creator = 'goshawk'
dependencies = []
files = ['37509', '37512']
hgrepos = []
issue_num = 15443
keywords = ['patch']
message_count = 61.0
messages = ['166326', '166331', '166333', '166335', '166336', '166338', '166340', '166345', '166361', '166364', '166383', '166385', '166386', '166387', '166414', '180125', '223039', '223042', '223066', '223068', '223071', '223073', '223074', '223075', '223077', '223078', '223080', '223082', '223083', '223106', '224360', '232952', '232962', '237338', '237807', '237809', '237819', '240243', '240244', '240290', '240291', '240292', '240294', '240299', '240398', '270266', '270535', '270885', '270886', '270887', '270888', '276748', '276749', '390474', '390479', '390483', '390486', '390491', '392382', '392418', '408859']
nosy_count = 21.0
nosy_names = ['lemburg', 'tim.peters', 'mark.dickinson', 'belopolsky', 'giampaolo.rodola', 'pythonhacker', 'Arfrever', 'r.david.murray', 'andrewclegg', 'python-dev', 'gdr@garethrees.org', 'Ramchandra Apte', 'Eli_B', 'serhiy.storchaka', 'goshawk', 'Niklas.Claesson', 'mdcb808@gmail.com', 'scoobydoo', 'tomikyos', 'p-ganssle', 'anglister']
pr_nums = ['21987']
priority = 'normal'
resolution = None
stage = 'patch review'
status = 'open'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue15443'
versions = ['Python 3.10']
```

    abalkin commented 8 years ago

    I've seen a similar glitch. Reloading the page usually fixes the problem.

    On Jul 20, 2016, at 11:37 AM, Steve Holden <report@bugs.python.org> wrote:

    Steve Holden added the comment:

    BTW, I presume it's a bug in the issue tracker that my view of this message ends after a few lines of msg166386? Makes it rather difficult to track the issue!



    holdenweb commented 7 years ago

    I agree on reflection that a single nanoseconds integral value makes more sense. This then requires refactoring of the existing code so that existing tests continue to pass using a microsecond property.

    Code using ONLY nanoseconds is a disjoint case, for which new tests will be required. It clearly cannot be expected to be backwards compatible with pre-implementation versions.

    Does it make sense to define behaviour for cases where the user attempts to MIX microseconds and nanoseconds? If so, one validation I would suggest is that, in the presence of a microseconds specification, a constraint of 0 <= nanoseconds < 1000 must be imposed.
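A hedged sketch of the validation proposed above (the helper name and shape are mine, not from any patch):

```python
def combine_fractions(microseconds: int = 0, nanoseconds: int = 0) -> int:
    """Combine a microsecond field with a sub-microsecond nanosecond field
    into total nanoseconds, enforcing 0 <= nanoseconds < 1000 whenever
    microseconds are specified alongside them."""
    if not 0 <= microseconds < 1_000_000:
        raise ValueError("microseconds must be in 0..999999")
    if not 0 <= nanoseconds < 1000:
        raise ValueError("nanoseconds must be in 0..999 when mixed with microseconds")
    return microseconds * 1000 + nanoseconds
```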

    abalkin commented 7 years ago

    Another advantage of a single nanoseconds field is that currently microseconds are packed in 3 bytes, and nanoseconds would fit in 4, a one-byte increase; but to add a separate 0-999 field, one would need at least 2 more bytes.
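The arithmetic behind that packing claim can be checked quickly (a sketch of my own, not code from the patch):

```python
def bytes_needed(max_value: int) -> int:
    """Minimum whole bytes required to store values in 0..max_value."""
    return (max_value.bit_length() + 7) // 8

# Microseconds (0..999999) fit in 3 bytes; full nanoseconds (0..999999999)
# fit in 4; a separate 0..999 field would need 2 bytes on its own.
micro_bytes = bytes_needed(999_999)
nano_bytes = bytes_needed(999_999_999)
extra_field_bytes = bytes_needed(999)
```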

    abalkin commented 3 years ago

    @pganssle - let's keep the substantive discussions in the tracker so that they are not lost on github. You wrote:

    """ what is still blocking / needs to be done on this? Beta freeze for Python 3.10 is coming up at the beginning of May and I think we may have enough time to get this in before then. Probably would have been better to get it into an alpha release, but if we miss beta freeze it'll get pushed to 3.11, and I do think that nanosecond support is a desirable feature for a lot of people.

    It might be good for us to get an explicit "to-do" list of concerns to be addressed before this can be merged. """

    I don't think full nanosecond support is feasible to complete in the remaining weeks, but we can try to add nanoseconds to timedelta only. The mixed datetime + timedelta ops will still truncate, but many time-related operations will be enabled.

    I would even argue that when nanoseconds precision is required, it is more often intervals no longer than a few days and rarely a specific point in time.

    pganssle commented 3 years ago

    > I don't think full nanosecond support is feasible to complete in the remaining weeks

    This may be so, but I think the important part of that question is "what work needs to be done and what questions need to be answered?" If the answer is that we need to make 3 decisions and do the C implementation, that seems feasible to do in under a month. If the answer is that we've got 10 contentious UI issues and we probably want to go through the PEP process, I agree with your assessment of the timing. Regardless, we'll need to know what work needs to be done before we do it...

    > but we can try to add nanoseconds to timedelta only. The mixed datetime + timedelta ops will still truncate, but many time-related operations will be enabled. I would even argue that when nanoseconds precision is required, it is more often intervals no longer than a few days and rarely a specific point in time.

    To be honest, I don't find this very compelling and I think it will only confuse people. I think most people use timedelta to represent something you add or subtract to a datetime. Having the nanoseconds part of it truncate seems like it would be frustrating and counter-intuitive.

    From the use cases in this thread:

    So I don't think there's high enough demand for nanosecond-timedelta on its own that we need to rush it out there before datetime gets it.

    abalkin commented 3 years ago

    Is there high enough demand for nanoseconds in datetime and time instances?

    How often do nanosecond timestamps contain anything other than zeros or garbage in the last three digits?

    In my experience, all people want to do with such timestamps is to convert them to something expressed in hours, minutes and seconds, rather than just a huge number of seconds, and back, without losing the value.

    A timedelta is almost always a decent replacement for either datetime or time in those cases, and sometimes it is even preferable because arithmetically it is closer to numbers.

    9f49d929-5e8e-47c9-8670-dddc19df11d6 commented 3 years ago

    This brings me back some years. Sorry if I am not up to date; as I recall from back then, the issue was that there weren't even microseconds. In telemetry, you can often have these kinds of time-stamped measurements; it's not insignificant noise that nobody cares about.

    abalkin commented 3 years ago

    > In telemetry,

    a nanosecond often translates to about a foot, and 5 hours gets you to Pluto. Telemetry is exactly an application where absolute timestamps rarely make any sense.

    9f49d929-5e8e-47c9-8670-dddc19df11d6 commented 3 years ago

    In the confines of PTP / IEEE 1588, it's actually quite common and can be useful. It's not so much the ns, but the <1 µs that is missing.

    mdickinson commented 3 years ago

    [Alexander]

    > Is there high enough demand for nanoseconds in datetime and time instances?

    One need that we've encountered in real code is simply for compatibility. We have Python code that interacts with a logging web service whose timestamps include nanosecond information. Whether or not nanosecond resolution makes sense for those timestamps is a moot point: that's out of our control.

    When representing information retrieved from that web service in Python-land, we have a problem. If datetime.datetime had nanosecond precision, then using datetime.datetime to represent the retrieved values would be a no-brainer. As it is, we face a choice between:

    None of those choices are terrible, but none of them are particularly palatable compared with using a standard library solution. (FWIW, we went with option 2, returning nanoseconds since the Unix epoch as an int.)
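Option 2 (integer nanoseconds since the Unix epoch) can be sketched roughly as follows; the helper names are hypothetical, not from the comment:

```python
from datetime import datetime, timedelta, timezone

def to_datetime_truncated(ns: int) -> datetime:
    """Convert an integer nanoseconds-since-epoch timestamp to a datetime,
    truncating the sub-microsecond digits -- the lossy step that makes
    exact round-tripping impossible without stdlib nanosecond support."""
    secs, rem = divmod(ns, 1_000_000_000)
    return datetime.fromtimestamp(secs, tz=timezone.utc) + timedelta(microseconds=rem // 1000)
```

Going the other way (datetime to int) is exact, which is one reason to keep the int as the source of truth and convert to datetime only for display.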

    faa20c90-fcf0-43f4-810b-00286077d549 commented 2 years ago

    I also have a use case that would benefit from nanosecond resolution in Python's datetime objects, that is, representing and querying the results of clock_gettime() in a program trace.

    On modern Linuxes with a vDSO, clock_gettime() does not require a system call and completes within a few nanoseconds. So Python's datetime objects do not have sufficient resolution to distinguish between adjacent calls to clock_gettime().

    This means that, like Mark Dickinson above, I have to choose between using datetime for queries (which would be convenient) and accepting that nearby events in the trace may be indistinguishable, or implementing my own datetime-like data structure.
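To illustrate the resolution gap deterministically (the fixed values below stand in for two adjacent clock_gettime_ns() readings; the conversion helper is my own sketch):

```python
from datetime import datetime, timezone

def dt_from_ns(ns: int) -> datetime:
    """Map a nanosecond timestamp onto datetime's microsecond grid."""
    secs, rem = divmod(ns, 1_000_000_000)
    return datetime.fromtimestamp(secs, tz=timezone.utc).replace(microsecond=rem // 1000)

# Two readings 888 ns apart -- distinguishable as nanosecond integers ...
t1 = 1_600_000_000_123_456_111
t2 = 1_600_000_000_123_456_999

# ... but identical once truncated to datetime's microsecond resolution.
```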

    fenchu commented 1 year ago

    This has been open for more than 10 years. :-)

    strptime should be able to parse it; please harmonize Python's strptime with strftime from other implementations. My major problem is that porting data between Go, Java, .NET and Python may change the timestamp output slightly (6-digit rounding).

    Most of our REST APIs (500+) are Java Spring Boot with 7 digits, or Go (9 digits on Linux and 7 digits on Windows); .NET also has 7 to 9 digits, and SQL Server, Oracle and DB2 dump 9 digits if needed.

    I do this conversion almost daily:

    >>> t = '2023-01-05T09:45:41.0877981+01:00' # this example taken from K6 json output: https://k6.io/docs/results-output/real-time/json/ a Go commercial program
    >>> datetime.datetime.strptime(t, "%Y-%m-%dT%H:%M:%S.%f%z")                                   
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "C:\Program Files\Python310\lib\_strptime.py", line 568, in _strptime_datetime
        tt, fraction, gmtoff_fraction = _strptime(data_string, format)
      File "C:\Program Files\Python310\lib\_strptime.py", line 349, in _strptime
        raise ValueError("time data %r does not match format %r" %
    ValueError: time data '2023-01-05T09:45:41.0877981+01:00' does not match format '%Y-%m-%dT%H:%M:%S.%f%z'
    >>> t1 = t[:-7] + t[-6:]
    >>> datetime.datetime.strptime(t1, "%Y-%m-%dT%H:%M:%S.%f%z") 
    datetime.datetime(2023, 1, 5, 9, 45, 41, 87798, tzinfo=datetime.timezone(datetime.timedelta(seconds=3600)))

    bfsoares commented 1 year ago

    > (quoting @fenchu's comment above)

    I have this same problem. I changed the _strptime.py file around lines 188 and 424 to include a new format code n for nanoseconds (1 to 9 digits).

    Approximately line 188, in the table of format regexes:

    'f': r"(?P<f>[0-9]{1,6})",
    'n': r"(?P<n>[0-9]{1,9})",  # new format code
    'H': r"(?P<H>2[0-3]|[0-1]\d|\d)",

    Approximately line 424, in the group handling:

    elif group_key == 'f':
        s = found_dict['f']
        # Pad to always return microseconds.
        s += "0" * (6 - len(s))
        fraction = int(s)
    elif group_key == 'n':  # new format code
        s = found_dict['n']
        # Pad to nanoseconds, then round to microseconds.
        s += "0" * (9 - len(s))
        fraction = int(round(int(s) / 1000, 0))
    elif group_key == 'A':

    I'd appreciate it if someone could update _strptime.py with these improvements.
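As a stopgap that avoids patching the stdlib, one can normalize the fractional-seconds field to the 6 digits that %f accepts before calling strptime (a sketch; the function name is mine):

```python
import re
from datetime import datetime

def parse_iso_nanos(s: str) -> datetime:
    """Truncate a 7-9 digit fractional-seconds field to 6 digits so %f
    can parse it. Sub-microsecond digits are discarded, which is exactly
    the precision loss this issue is about."""
    s = re.sub(r"(\.\d{6})\d+", r"\1", s)
    return datetime.strptime(s, "%Y-%m-%dT%H:%M:%S.%f%z")
```

This parses the K6 example above directly, without the manual string slicing.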

    adivekar-utexas commented 1 year ago

    I also have the same problem. My use-case is a generic Timer I am implementing for profiling data-processing code for high-performance Machine Learning modeling.

    This is purely in Python.

    time.perf_counter_ns() gives me a relative time in nanoseconds, and I can get the time diff as follows:

    start: int = time.perf_counter_ns()
    ...  ## Do something 
    end: int = time.perf_counter_ns()

    For certain use-cases a piece of code runs very fast (a few hundred nanoseconds) but needs to run hundreds of billions of times, for example a dict lookup when deduplicating Amazon product embeddings.

    So, timing it effectively is useful.

    It's annoying that I can't just use timedelta(nanoseconds=end-start) to get the difference in nanoseconds, since this gets rounded to either timedelta(microseconds=0) or timedelta(microseconds=1).

    I feel this could be implemented quite easily in timedelta; that would make it compatible with time.perf_counter_ns().
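A small sketch of the precision loss described above, using fractional microseconds (released CPython's timedelta constructor accepts no nanoseconds keyword, so this is the closest one can get today):

```python
from datetime import timedelta

# A 750 ns interval, as might come from two time.perf_counter_ns() calls.
start_ns, end_ns = 1_000, 1_750

# timedelta stores whole microseconds, so 750 ns rounds up to 1 µs ...
delta = timedelta(microseconds=(end_ns - start_ns) / 1000)

# ... while a 400 ns interval rounds down to zero.
delta_zero = timedelta(microseconds=400 / 1000)
```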

    douglas-raillard-arm commented 6 months ago

    Another use case: polars currently uses datetime.timedelta() objects to represent values of the Duration('ns') dtype. This avoids re-inventing the wheel in all the Python processing that consumes the values, but unfortunately the lack of nanoseconds in timedelta() makes the conversion destructive: https://github.com/pola-rs/polars/issues/14695

    This situation seems similar to @mdickinson's: using the std lib would be a no-brainer if it did the job, but unfortunately it does not, and there isn't any great solution, especially if this is part of the public API of a library.

    nineteendo commented 2 months ago

    Would this be a potential solution?

    import datetime
    from datetime import time
    from operator import index
    from typing import overload
    
    class Time(time):
        @overload
        def __new__(cls, hour=0, minute=0, second=0, microsecond=0, tzinfo=None, *, fold=0): ...
        @overload
        def __new__(cls, hour=0, minute=0, second=0, microsecond=0, nanosecond=0, tzinfo=None, *, fold=0): ...
    
        def __new__(cls, hour=0, minute=0, second=0, microsecond=0, nanosecond=0, tzinfo=None, *, fold=0):
            if (nanosecond is None or isinstance(nanosecond, datetime.tzinfo)) and tzinfo is None:
                # Make constructor backwards compatible
                nanosecond, tzinfo = 0, nanosecond
    
            nanosecond = index(nanosecond)
            if not 0 <= nanosecond <= 999:
                raise ValueError('nanosecond must be in 0..999', nanosecond)
    
            self = super().__new__(cls, hour, minute, second, microsecond, tzinfo, fold=fold)
            self._nanosecond = nanosecond
            return self
    
        @property
        def nanosecond(self):
            """nanosecond (0-999)"""
            return self._nanosecond

    > [!NOTE]
    > This currently doesn't raise an error:

        t = Time(nanosecond=None)

    Is it worth it to do the parsing manually, or is a warning of the type checker good enough?

    nineteendo commented 2 months ago

    Sadly passing the extra arguments of timedelta() positionally would have to be deprecated if we want the intuitive format:

    import sys

    class timedelta:
        if sys.version_info >= (3, 16):
            def __new__(cls, days=0, seconds=0, microseconds=0, nanoseconds=0, *, weeks=0, hours=0, minutes=0, milliseconds=0): ...
        elif sys.version_info >= (3, 14):
            def __new__(cls, days=0, seconds=0, microseconds=0, milliseconds=0, minutes=0, hours=0, weeks=0, *, nanoseconds=0): ...
        else:
            def __new__(cls, days=0, seconds=0, microseconds=0, milliseconds=0, minutes=0, hours=0, weeks=0): ...

    nineteendo commented 2 months ago

    @vstinner, do you think this approach is reasonable?

    vstinner commented 2 months ago

    > @vstinner, do you think this approach is reasonable?

    Changing the signature in Python 3.16 to put nanoseconds instead of milliseconds is a bad idea. I don't think that positional arguments can ever change in the datetime API.

    nineteendo commented 2 months ago

    Yeah, that would break more than 1.1k files without raising an error (code search: `timedelta\(\w+, \w+, \w+, \w+`).

    Let's only do it for datetime.datetime and datetime.time then, as we can make those fully backwards compatible. We could decide to deprecate passing tzinfo positionally without nanosecond, though, as it's a bit ugly to maintain.