Open 78adca90-caba-4b15-9e0c-04ae6d71ab27 opened 12 years ago
As computers evolve, time management becomes more precise and more granular. Unfortunately the standard datetime module is not able to deal with nanoseconds, even though OSes are. For example if I do:
print "%.9f" % time.time()
1343158163.471209049
I get the actual timestamp from the epoch with nanosecond granularity.
Thus support for nanoseconds in datetime would really be appreciated.
Vincenzo Ampolo wrote:
> Thus support for nanoseconds in datetime would really be appreciated.
I would be interested in an actual use case for this.
On 07/24/2012 01:28 PM, Marc-Andre Lemburg wrote:
I would be interested in an actual use case for this.
Alice has a dataset with nanosecond granularity. She wants to write a Python library to let Bob access the dataset. Nowadays Alice has to implement her own time class, losing all the flexibility of the datetime module. With this enhancement she can provide a library that just uses the standard Python datetime module. Her library will get the needed time format, including nanoseconds.
Many Python SQL libraries, like the ones in Django and web2py, rely on datetime objects for time representation. Bob has a web2py website that has some data with nanosecond granularity. Nowadays Bob has to store this data as a string or a long number, without the ability to use the powerful datetime module. With this enhancement Bob doesn't need to build or learn another interface; he can just use the datetime module with microseconds or nanoseconds as needed.
A Google search for "python datetime nanoseconds" shows more than 141k results: https://www.google.com/search?sourceid=chrome&ie=UTF-8&q=python+time#hl=en&biw=1615&bih=938&sclient=psy-ab&q=python+datetime+nanoseconds&oq=python+datetime+nanoseconds
So this is definitely a requested feature, and as technology evolves more people will ask for it.
I imagine something like:
import datetime
nano_time = datetime.datetime(year=2012, month=7, day=24, hour=14,
                              minute=35, second=3, microsecond=53, nanosecond=27)
in case you need nanosecond granularity. If you don't need it, just skip the nanosecond part and the module works as it does now. Of course the strftime format codes should be updated to support nanoseconds.
I can write a patch if some dev is willing to review it.
Before someone takes the datetime source code and starts a third-party module that supports nanoseconds, I think this enhancement has almost no impact on existing code and makes the datetime module even more powerful. It's up to the CPython maintainers to decide between keeping the datetime module up to date with new needs or letting third-party modules fill those gaps.
Best Regards, -- Vincenzo Ampolo http://vincenzo-ampolo.net http://goshawknest.wordpress.com
I believe Marc-Andre was looking for an actual real-world use case rather than a hypothetical one. We discussed this briefly on the irc channel and we think Guido vetoed it on a YAGNI basis (we haven't checked the archives though...) so a real world use case is probably required.
This is a real use case I'm working with that needs nanosecond precision and led me to submit this request:
Most OSes let users capture network packets (using tools like tcpdump or wireshark) and store them using file formats like pcap or pcap-ng. These formats include a timestamp for each of the captured packets, and this timestamp usually has nanosecond precision. The reason is that on gigabit and 10 gigabit networks the frame rate is so high that microsecond precision is not enough to tell two frames apart. pcap (and now pcap-ng) are extremely popular file formats, with millions of files stored around the world. Support for nanoseconds in datetime would make it possible to properly parse these files inside Python to compute precise statistics, for example network delays or round trip times.
Another case is stock markets. In that field, information is timestamped in nanoseconds, and the ability to deal with this kind of representation natively in datetime would make the standard module even more powerful.
The company I work for is in the data networking field, and we use Python extensively. Currently we rely on custom code to process timestamps; a nanosecond datetime would let us avoid that and use the standard Python datetime module.
Best Regards,
--- Vincenzo Ampolo http://vincenzo-ampolo.net http://goshawknest.wordpress.com
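A minimal sketch of the precision loss described in the pcap use case, assuming a captured seconds/nanoseconds pair (the values below are made up):
from datetime import datetime, timezone

secs, nsecs = 1343158163, 471209049            # hypothetical packet timestamp
dt = datetime.fromtimestamp(secs, tz=timezone.utc)
dt = dt.replace(microsecond=nsecs // 1000)     # the best datetime can hold today
print(dt.isoformat())                          # 2012-07-24T19:29:23.471209+00:00
print(nsecs - dt.microsecond * 1000)           # 49 nanoseconds silently dropped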
Are the nanosecond timestamps actual timestamps or strings? If they are timestamps it's not immediately obvious why you want to convert them to datetime objects, so motivating that would probably help. On the other hand, the fact that you have an application that does so is certainly an argument for real-world applicability.
Even if accepted this can't get fixed in 2.7, so removing that from versions.
On 07/24/2012 04:20 PM, R. David Murray wrote:
Are the nanosecond timestamps actual timestamps or strings? If they are timestamps it's not immediately obvious why you want to convert them to datetime objects, so motivating that would probably help. On the other hand, the fact that you have an application that does so is certainly an argument for real-world applicability.
It depends. When they are exported, for example as CSV (which can be the case for stock market data) or JSON (which is close to my case), they are strings, so having a datetime object may be very helpful for doing datetime additions, subtractions, comparisons and deltas, and for changing the representation to a human-readable format thanks to strftime(), without losing precision and while maintaining readability.
Think about a web application. The user selects the year, month, day, hour, minute, second, microsecond and nanosecond of an event, and the JavaScript makes an Ajax call with a time in this format (a variant of ISO 8601): YYYY-MM-DDTHH:MM:SS.mmmmmmnnn (where nnn is the nanosecond part). The Python server takes that string, converts it to a datetime, does all the math with its data and gives the output back, labeling the data with int(nano_datetime.strftime('MMSSmmmmmmnnn')) so I have a sequence number that JavaScript can sort and handle easily.
It's basically the same thing you already do nowadays at the microsecond level, but this time you have to deal with nanosecond data.
I agree with the YAGNI principle, but I think we have clear evidence of a real use case here.
Best Regards
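A minimal sketch of handling such a string today, assuming the YYYY-MM-DDTHH:MM:SS.mmmmmmnnn variant described above; parse_nano is a hypothetical helper, not an existing API:
from datetime import datetime

def parse_nano(s):
    # Keep what datetime can hold (microseconds) and return the leftover nanoseconds.
    base, frac = s.split(".")
    frac = frac.ljust(9, "0")                  # "mmmmmmnnn"
    dt = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S")
    return dt.replace(microsecond=int(frac[:6])), int(frac[6:])

dt, leftover_ns = parse_nano("2012-07-24T14:35:03.471209049")
print(dt, leftover_ns)                         # 2012-07-24 14:35:03.471209 49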
See PEP 410.
Vincenzo Ampolo wrote:
> This is a real use case I'm working with that needs nanosecond precision and led me to submit this request: [...]
Thanks for the two use cases.
You might want to look at mxDateTime and use that for your timestamps. It does provide full C double precision for the time part of a timestamp, which covers nanoseconds just fine.
On Wed, Jul 25, 2012 at 4:17 AM, Marc-Andre Lemburg <report@bugs.python.org> wrote:
... full C double precision for the time part of a timestamp, which covers nanoseconds just fine.
No, it does not:
>>> import time
>>> t = time.time()
>>> t + 5e-9 == t
True
In fact, C double precision is barely enough to cover microseconds:
>>> t + 1e-6 == t
False
>>> t + 1e-7 == t
True
Alexander Belopolsky wrote:
> No, it does not:
> >>> t + 5e-9 == t
> True
> In fact, C double precision is barely enough to cover microseconds.
I was referring to the use of a C double to store the time part in mxDateTime. mxDateTime uses the C double to store the number of seconds since midnight, so you don't run into the Unix ticks value range problem you showcased above.
Marc-Andre Lemburg wrote:
> I was referring to the use of a C double to store the time part
> in mxDateTime. mxDateTime uses the C double to store the number of
> seconds since midnight, so you don't run into the Unix ticks value
> range problem you showcased above.
There's enough room to even store 1/100th of a nanosecond, which may be needed for some physics experiments :-)
>>> x = 86400.0
>>> x == x + 1e-9
False
>>> x == x + 1e-10
False
>>> x == x + 1e-11
False
>>> x == x + 1e-12
True
Have a look at this python-dev mailing list thread too:
http://mail.python.org/pipermail/python-dev/2012-July/121123.html
I would like to add a real-world use case I have for nanosecond-precision support. I deal with data loggers that are controlled by GPS clocks, and I am writing some processing software in Python that requires the input of high-precision timestamps for calculating clock drifts and offsets. The addition of nanosecond-precision support in datetime would allow me to use this rather than a homebrew solution.
I would like to add a use case. Control systems for particle accelerators. We have ns, sometimes ps precision on timestamped data acquisitions and we would like to use Python to do calculations.
Given that struct timespec, defined as
struct timespec {
time_t tv_sec; /* seconds */
long tv_nsec; /* nanoseconds */
};
is slowly becoming the prevailing standard for representing time in system interfaces, Python's inability to faithfully store it in a high-level object will increasingly become a handicap.
People are starting to put nanoseconds in their databases not because they really need such precision, but because this is what they get from their devices, and at collection time they cannot do anything "smart" with it.
The program that collects the events may simply not have time to do anything other than store raw data, or may not have the higher-level knowledge of what the proper rounding is.
The proper rounding is best done at analysis time, by a program written in a higher-level language such as Python.
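As an illustration, a timespec value already surfaces in Python as an integer nanosecond count, and forcing it through a float into datetime blurs the low digits; the file path below is just an example:
import os
from datetime import datetime, timezone

ns = os.stat("/etc/hosts").st_mtime_ns         # the full timespec as integer nanoseconds
dt = datetime.fromtimestamp(ns / 1e9, tz=timezone.utc)
print(ns)                                      # e.g. 1418218864160935821
print(dt.isoformat())                          # the float division already blurs the low digits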
For the record, numpy's datetime and timedelta types have theoretical support for attoseconds.
numpy's datetime64 and timedelta64 types are so utterly broken that I would only recommend studying them as a negative example of how not to design a date-time library.
A note from Guido, from about 2 years ago:
https://mail.python.org/pipermail/python-dev/2012-July/121127.html
""" TBH, I think that adding nanosecond precision to the datetime type is not unthinkable. You'll have to come up with some clever backward compatibility in the API though, and that will probably be a bit ugly (you'd have a microsecond parameter with a range of 0-1000000 and a nanosecond parameter with a range of 0-1000). Also the space it takes in memory would probably increase (there's no room for an extra 10 bits in the carefully arranged 8-byte internal representation). """
Add pickle, etc.
(there's no room for an extra 10 bits in the carefully arranged 8-byte internal representation)
According to a comment on top of Include/datetime.h, the internal representation of datetime is 10, not 8 bytes.
/* Fields are packed into successive bytes, each viewed as unsigned and
(if you don't trust the comments check the definitions a few lines below)
#define _PyDateTime_DATETIME_DATASIZE 10
AFAIK, Python objects are allocated with at least 32-bit alignment, so we have at least 2 unused bytes at the end of each datetime object.
Furthermore, out of 24 bits allocated for microseconds, only 20 are used, so nanoseconds can be accommodated by adding a single byte to DATETIME_DATASIZE.
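A quick arithmetic check of that claim (illustrative only):
print(10**6 <= 2**20)    # True: microseconds (0-999999) fit in 20 of the 24 allocated bits
print(10**9 <= 2**30)    # True: nanoseconds (0-999999999) fit in 30 bits, i.e. one more byte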
On 14/07/2014 21:37, Alexander Belopolsky wrote:
AFAIK, Python objects are allocated with at least 32-bit alignment,
64 bits, actually, when using obmalloc.c.
Yup, it's definitely more than 8 bytes. In addition to the comments you quoted, an in-memory datetime object also has a full Python object header, a member to cache the hash code, and a byte devoted to saying whether or not a tzinfo member is present.
Guessing Guido was actually thinking about the pickle size - but that's 10 bytes (for a "naive" datetime object).
Guessing Guido was actually thinking about the pickle size
No, pickle also comes with an overhead
>>> from datetime import *
>>> import pickle
>>> t = datetime.now()
>>> len(pickle.dumps(t))
70
For the present discussion, DATETIME_DATASIZE is the only relevant number because we are not going to change anything other than the payload layout in the datetime object or its pickle serialization.
Of course pickles come with overheads too - don't be tedious ;-) The point is that the guts of the datetime pickling is this:
basestate = PyBytes_FromStringAndSize((char *)self->data,
_PyDateTime_DATETIME_DATASIZE);
That consumes exactly 10 bytes today. Add nanoseconds, and it will take at least 11 (if 4 bits are insanely squashed into the bytes currently devoted to microseconds), and more likely 12 (if nanoseconds are sanely given their own 2 bytes). I suppose another possibility is to get rid of microseconds internally, and work with a single 4-byte nanosecond member.
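A quick way to see that 10-byte payload from Python, using the reduce protocol rather than the C internals (illustrative):
from datetime import datetime

t = datetime(2012, 7, 24, 14, 35, 3, 471209)
payload = t.__reduce__()[1][0]                 # the packed byte string used for pickling
print(len(payload))                            # 10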
The following code demonstrates that we can pack year through second fields in 40 bits:
#include <stdio.h>
struct dt {
unsigned year :14; /* 1-9999 of 0-16,383 */
unsigned month :4; /* 1-12 of 0-16 */
unsigned day :5; /* 1-31 of 0-31 */
unsigned hour :5; /* 0-23 of 0-31 */
unsigned minute:6; /* 0-59 of 0-63 */
unsigned second:6; /* 0-59 of 0-63 */
/* total : 40 bits */
};
int main() {
struct dt t;
t.year = 9999;
t.month = 12;
t.day = 31;
t.hour = 24;
t.minute = 59;
t.second = 59;
printf("%d-%d-%dT%d:%d:%d in %d bytes\n",
t.year, t.month, t.day,
t.hour, t.minute, t.second,
(int)sizeof(t));
}
$ ./a.out
9999-12-31T24:59:59 in 8 bytes
Assuming 64-bit alignment, this leaves 64*2 - 40 = 88 bits for the sub-second field.
In 88 bits you can pack yoctoseconds (yocto = 1e-24) without breaking a sweat:
>>> 2**88
309485009821345068724781056
>>> 10**24
1000000000000000000000000
The practical choice is between 32-bit nanoseconds (nano = 1e-9) and 64-bit attoseconds (atto = 1e-18).
Given that the current world record for the shortest controllable time is 12 attoseconds [1], it may not be an absurd choice to go to 64 bits of sub-second precision.
On the other hand, if we manage to go from micro- to nano-seconds now, I don't think it will be hard to go to higher resolution when users start asking for it some 40 years into the future.
I suppose another possibility is to get rid of microseconds internally, and work with a single 4-byte nanosecond member.
Yes, that is what I think we should do. I would also split it from the packed fields to give it a 32-bit alignment which should improve performance.
If alignment is not required (as is the case in pickle), we can pack year through second + nanosecond fields in 9 bytes. For backwards compatibility we should continue accepting 10-byte pickles, but we can write in a new 9-byte format.
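The 9-byte figure follows from the bit counts above; a back-of-the-envelope check:
bits = 40 + 30            # year..second fields (see the C sketch above) + a 30-bit nanosecond field
print((bits + 7) // 8)    # 9 bytes, versus 10 in the current payload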
On 14/07/2014 22:53, Tim Peters wrote:
That consumes exactly 10 bytes today. Add nanoseconds, and it will take at least 11 (if 4 bits are insanely squashed into the bytes currently devoted to microseconds), and more likely 12 (if nanoseconds are sanely given their own 2 bytes).
That doesn't have to be the case. For example, you could use the MSB of the microseconds field to store a "datetime pickle flags" byte, which could tell the unpickler whether a nanoseconds (or attoseconds :-)) field follows or not.
Remember that existing pickles must remain readable, so there must be some kind of version field anyway.
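A rough sketch of that idea, assuming byte 7 of the existing 10-byte payload is the most significant microsecond byte (its top bit is currently always zero); pack_state and unpack_state are hypothetical helpers, not the real pickle code:
def pack_state(base10, ns_remainder=None):
    # base10: the current 10-byte payload; ns_remainder: the extra 0-999 nanoseconds.
    state = bytearray(base10)
    if ns_remainder is not None:
        state[7] |= 0x80                       # flag bit: "nanoseconds follow"
        state += ns_remainder.to_bytes(2, "big")
    return bytes(state)

def unpack_state(state):
    base = bytearray(state[:10])
    ns = 0
    if base[7] & 0x80:                         # new-format pickle
        base[7] &= 0x7F
        ns = int.from_bytes(state[10:12], "big")
    return bytes(base), ns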
My patch for the issue bpo-22043 adds nanosecond precision to get the system clock.
The patch in the attachment is an attempt to provide nanosecond support for the datetime type; it handles pickle versioning and exposes a new class method, datetime.fromnanoseconds.
Minor bug fixes and improvements are in the new attachment.
Maybe instead of adding a new method datetime.fromnanoseconds(), make datetime.fromtimestamp() support Decimal (and other high-precision numerical types)?
The intention of the patch was to keep it simple and limited to nanoseconds (per the report).
Throwing in Decimal would work (and possibly bring further precision) but consider:
datetime.fromnanoseconds(ns)
vs
datetime.fromtimestamp(Decimal(ts))
I find the former cleaner - sure, it adds a new class method.
Pickling is not backward compatible. I.e., older versions of Python couldn't unpickle a datetime pickled in the new Python.
Backward compatibility is implemented as "new Python can load old pickles". Isn't that what backward compatible means?
The payload changed from 10 to 12 bytes to accommodate the nanoseconds; I don't know how to handle the reverse (old Python reading new pickles) or whether it's really needed.
Hi, this issue is causing my organization problems. We are using Python 2.7.9 with pyodbc 3.0.7. The application DB is SQL Server, and they have started using datetime2 (see: https://msdn.microsoft.com/en-us/library/bb677335.aspx?f=255&MSPPError=-2147217396).
They did this to ensure that transaction timestamps are more unique, especially when data is bulk uploaded into the DB.
datetime2 supports up to seven decimal places. Our application is now getting timestamps that are truncated to 6 places, making them useless when comparing a timestamp >= to others in the DB.
This is a real-world issue and we really need a fix. We are not able to migrate to Python 3 at the moment due to other constraints. Any chance someone can take Matthieu's patch and retrofit it to 2.7.9 (if that makes sense)?
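For illustration, this is the kind of truncation involved, assuming the datetime2 value arrives as a string with seven fractional digits (a sketch, not the actual pyodbc code path):
from datetime import datetime

s = "2015-04-08 12:34:56.1234567"              # hypothetical datetime2 value, 7 fractional digits
dt = datetime.strptime(s[:-1], "%Y-%m-%d %H:%M:%S.%f")   # the 7th digit has to be dropped first
print(dt.microsecond)                          # 123456 -- the trailing 7 is gone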
Unfortunately no, that would be a new feature and so can't go into 2.7. Maybe someone could backport the work that has been done in this area so people could patch locally, but I don't think it is a small job and I'm pretty sure no one on the core team is interested.
Although I don't know what I am doing (patching python), if someone could point me to the relevant files in 2.7.9 that need to be patched, I'm willing to see if I can do it.
The Python 2 line is closed for new features, but you can start with the main line Lib/datetime.py, which will probably work with Python 2.7.9 after some minor tweaks. You should be able to publish the result on PyPI. Note that many modules new in 3.x are provided for 2.x this way.
Matthieu,
I don't see you adding nanoseconds to timedelta in your patch. Doesn't this mean that you would lose nanoseconds when you subtract one datetime from another?
To anyone who wants to contribute to this effort, I would recommend starting with pure python implementation in Lib/datetime.py .
Alexander,
The initial patch is indeed minimal. If it gains momentum and some level of acceptance, I'd be happy to make more amendments and fixes as needed and recommended.
As for 2.7.9, I'm not sure it makes much sense to publish a PyPI patch if it's not going to happen in 3.x?
I have no doubt this will get into 3.x once we have a working patch and backward compatibility issues are addressed. Given the amount of effort Victor has recently put in bpo-22117 to rework CPython internal time handling to support nanoseconds, it will be odd not to support nanoseconds in datetime.
On the substance, in your patch you have chosen to add nanoseconds as a separate field instead of changing microseconds to nanoseconds. I don't think this is the right approach. See msg223082. Once you get to implementing timedelta arithmetic, you will see that carrying two sub-second fields will unnecessarily complicate the code.
My patch for the issue bpo-22043 adds nanosecond precision to get the system clock.
In fact, nanosecond resolution was added by the issue bpo-22117. In Python 3.5, datetime.datetime.now() calls _PyTime_GetSystemClock() which has now a resolution of 1 nanosecond.
The C type has a resolution of 1 nanosecond; the effective resolution depends on the platform. For example, Windows provides GetSystemTimeAsFileTime(), which has a resolution of 100 ns (and the effective accuracy is closer to 15 ms: see issue bpo-13845).
Just wanted to add a couple of comments here in case there's any interest. In our mission to make the world's market data available, we deal with financial exchanges, many of which are already recording event data at nanosecond resolution.
Further, I believe the decision to use a separate nanoseconds field to be essentially correct. While it may well introduce some arithmetical complexity, its value for backwards compatibility should be regarded as paramount. If I understand it correctly, the new nanosecond-resolution times would continue to be handled correctly (modulo loss of nanosecond resolution) when treated as current microsecond datetimes.
Note that the patches attached so far to this issue are nowhere close to a complete implementation. I don't think a python-only prototype (a patch to datetime.py) would be hard to implement, but implementation would be easier if nanoseconds replaced microseconds in datetime and timedelta objects with new microsecond(s) becoming a computed property.
BTW, I presume it's a bug in the issue tracker that my view of this message ends after a few lines of msg166386? Makes it rather difficult to track the issue!
I doubt it is a bug in the tracker. I've seen that kind of thing when I am having network issues...the browser renders what it gets, and if it doesn't get it all it looks like the page ends early.
FYI, I'm seeing the same kind of odd truncation Steve sees - but it goes away if I refresh the page.
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
GitHub fields:
```python
assignee = 'https://github.com/abalkin'
closed_at = None
created_at =
labels = ['type-feature', 'library', '3.10']
title = 'datetime module has no support for nanoseconds'
updated_at =
user = 'https://bugs.python.org/goshawk'
```
bugs.python.org fields:
```python
activity =
actor = 'gdr@garethrees.org'
assignee = 'belopolsky'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)']
creation =
creator = 'goshawk'
dependencies = []
files = ['37509', '37512']
hgrepos = []
issue_num = 15443
keywords = ['patch']
message_count = 61.0
messages = ['166326', '166331', '166333', '166335', '166336', '166338', '166340', '166345', '166361', '166364', '166383', '166385', '166386', '166387', '166414', '180125', '223039', '223042', '223066', '223068', '223071', '223073', '223074', '223075', '223077', '223078', '223080', '223082', '223083', '223106', '224360', '232952', '232962', '237338', '237807', '237809', '237819', '240243', '240244', '240290', '240291', '240292', '240294', '240299', '240398', '270266', '270535', '270885', '270886', '270887', '270888', '276748', '276749', '390474', '390479', '390483', '390486', '390491', '392382', '392418', '408859']
nosy_count = 21.0
nosy_names = ['lemburg', 'tim.peters', 'mark.dickinson', 'belopolsky', 'giampaolo.rodola', 'pythonhacker', 'Arfrever', 'r.david.murray', 'andrewclegg', 'python-dev', 'gdr@garethrees.org', 'Ramchandra Apte', 'Eli_B', 'serhiy.storchaka', 'goshawk', 'Niklas.Claesson', 'mdcb808@gmail.com', 'scoobydoo', 'tomikyos', 'p-ganssle', 'anglister']
pr_nums = ['21987']
priority = 'normal'
resolution = None
stage = 'patch review'
status = 'open'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue15443'
versions = ['Python 3.10']
```