python / cpython

The Python programming language
https://www.python.org
Other
63.1k stars 30.22k forks source link

PEP 410: Use decimal.Decimal type for timestamps #58090

Closed vstinner closed 12 years ago

vstinner commented 12 years ago
BPO 13882
Nosy @loewis, @mdickinson, @abalkin, @pitrou, @vstinner, @ericvsmith, @skrah
Files
  • time_decimal-18.patch
  • timestamp_datetime-2.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields: ```python assignee = None closed_at = created_at = labels = ['library'] title = 'PEP 410: Use decimal.Decimal type for timestamps' updated_at = user = 'https://github.com/vstinner' ``` bugs.python.org fields: ```python activity = actor = 'vstinner' assignee = 'none' closed = True closed_date = closer = 'vstinner' components = ['Library (Lib)'] creation = creator = 'vstinner' dependencies = [] files = ['24538', '24539'] hgrepos = [] issue_num = 13882 keywords = ['patch'] message_count = 38.0 messages = ['152031', '152032', '152035', '152129', '152131', '152133', '152134', '152290', '152303', '152304', '152353', '152356', '152358', '152359', '152374', '152417', '152456', '152489', '152490', '152491', '152502', '152571', '152795', '152796', '152818', '152823', '152826', '152835', '152838', '152916', '153007', '153197', '153199', '153201', '153240', '153520', '153539', '154801'] nosy_count = 9.0 nosy_names = ['loewis', 'mark.dickinson', 'belopolsky', 'pitrou', 'vstinner', 'eric.smith', 'Arfrever', 'skrah', 'Alexander.Belopolsky'] pr_nums = [] priority = 'normal' resolution = 'wont fix' stage = None status = 'closed' superseder = None type = None url = 'https://bugs.python.org/issue13882' versions = ['Python 3.3'] ```

    vstinner commented 12 years ago

    Attached patch adds an optional format argument to time.time(), time.clock(), time.wallclock(), time.clock_gettime() and time.clock_getres() to get the timestamp as a different format. By default, the float type is still used, but it will be possible to pass "format" to get the timestamp as a decimal.Decimal object.

    Some advantages of using decimal.Decimal instead of float:

    Using Decimal is also motivated by the fact than Python 3.3 has now access to clock having a resolution of 1 nanosecond: clock_gettime(CLOCK_REALTIME). Well, it doesn't mean that the clock is accurate, but Python should not loose precision just because it uses floatting point instead of integers.

    About the API: I chose to add a string argument to allow maybe later to support user defined formats, or at least add new builtin formats. For example, someone proposed 128 bits float in the issue bpo-11457.

    --

    The internal format is:

    typedef struct {
        time_t seconds;
        /* floatpart can be zero */
        size_t floatpart;
        /* divisor cannot be zero */
        size_t divisor;
        /* log10 of the clock resoltion, the real resolution is 10^resolution,
           resolution is in range [-9; 0] */
        int resolution;
    } _PyTime_t;

    I don't know if size_t is big enough to store any "floatpart" value: time.clock() uses a LARGE_INTEGER internally. I tested my patch on Linux 32 and 64 bits, but not yet on Windows.

    The internal function encoding a timestamp to Decimal caches the resolution objects (10^resolution, common clock resolutions: 10^-6, 10^-9) in a dict, and a Context object.

    I computed that 22 decimal digits should be enough to compute a timestamp of 10,000 years. Extract of my patch:

    "Use 12 decimal digits to store 10,000 years in seconds + 9 decimal digits for the floating part in nanoseconds + 1 decimal digit to round correctly"

    >>> str(int(3600*24*365.25*10000))
    '315576000000'
    >>> len(str(int(3600*24*365.25*10000)))
    12

    --

    See also the issue bpo-11457 which is linked to this topic, but not exactly the same because it concerns a low level function (os.stat()).

    vstinner commented 12 years ago

    Some examples of the API:

    $ ./python 
    Python 3.3.0a0 (default:52f68c95e025+, Jan 26 2012, 21:54:31) 
    >>> import time
    >>> time.time()
    1327611705.948446
    >>> time.time('decimal')
    Decimal('1327611708.988419')
    >>> t1=time.time('decimal'); t2=time.time('decimal'); t2-t1
    Decimal('0.000550')
    >>> t1=time.time('float'); t2=time.time('float'); t2-t1
    5.9604644775390625e-06
    >>> time.clock_gettime(time.CLOCK_MONOTONIC, 'decimal')
    Decimal('1211833.389740312')
    >>> time.clock_getres(time.CLOCK_MONOTONIC, 'decimal')
    Decimal('1E-9')
    >>> time.clock()
    0.12
    >>> time.clock('decimal')
    Decimal('0.120000')
    vstinner commented 12 years ago

    Windows code (win32_clock) was wrong in time_decimal-2.patch: it is fixed in patch version 3.

    Some tests on Windows made me realize that time.time() has a resolution of 1 millisecond (10^-3) and not of a microsecond (10^-6) on Windows! It is time to use GetSystemTimeAsFileTime! => see issue bpo-13845.

    abalkin commented 12 years ago

    Can we pick an API for this functionality that does not follow the worst of design anti-patterns? Constant arguments, varying return type, hidden import, and the list can go on.

    What is wrong with simply creating a new module, say "hirestime" with functions called decimal_time(), float_time(), datetime_time() and whatever else you would like. Let's keep the good old 'time' module simple.

    pitrou commented 12 years ago

    Well, creating a separate module is an anti-pattern in itself. calendar vs. time vs. datetime, anyone? I would instead propose separate functions: decimal_time, decimal_clock... or, if you prefer, time_decimal and so on.

    5579dc13-9f48-42d1-bb17-9c003ef6fa70 commented 12 years ago

    On Fri, Jan 27, 2012 at 5:17 PM, Antoine Pitrou \report@bugs.python.org\ wrote:

    Well, creating a separate module is an anti-pattern in itself. calendar vs. time vs. datetime, anyone?

    Are you serious? Since the invention of structural programming, creating a separate module for distinct functionality has been one of the most powerful design techniques. If I recall correctly, most of the original GoF patterns were about separating functionality into a separate module or a separate class. The calendar module is indeed a historical odd-ball, but what is wrong with time and datetime?

    pitrou commented 12 years ago

    Are you serious? Since the invention of structural programming, creating a separate module for distinct functionality has been one of the most powerful design techniques.

    Yes, I'm serious, and I don't see what structural programming or design patterns have to do with it.

    And we're not even talking about "distinct functionality", we're talking about the exact same functionality, except that the return type has slightly different implementation characteristics. Doing a module split is foolish.

    vstinner commented 12 years ago

    Constant arguments

    What do you call a constant argument? "float" and "decimal"? You would prefer a constant like time.FLOAT_FORMAT? Or maybe a boolean (decimal=True)?

    I chose a string because my first idea was to add a registry to support other format, maybe user defined formats, like the one used by Unicode codecs.

    If we choose to not support other formats, but only float and decimal, a simpler API can be designed.

    Another possible format would be "tuple": (intpart: int, floatpart: int, divisor: int), a low level type used to "implement" other user-defined types. Using such tuple, you have all information (clock value and clock resolution) without losing information.

    varying return type

    I agree that it is something uncommon in Python. I know os.listdir(bytes)->bytes and os.listdir(str)->str. I suppose that there are other functions with a different result type depending on the input.

    I am not attached to my API, it was just a proposition.

    hidden import

    Ah? I wouldn't call it hidden because I don't see how a function can return a decimal.Decimal object without importing it. If you consider that it is surprising (unexepected), it can be documented.

    and the list can go on.

    What else?

    What is wrong with simply creating a new module, say "hirestime" with functions called decimal_time(), float_time(), datetime_time() and whatever else you would like.

    Hum, adding a new module would need to duplicate code. The idea of adding an argument is also to simplify the implementation: most code is shared. We can still share a lot of code if we choose to add a new function in th time module instead of adding a new argument to existing functions.

    Let's keep the good old 'time' module simple.

    What is complex in my patch? It doesn't break backward compatibility and should have a low (or null) overhead in runtime speed if the format is not set.

    --

    I notified something surprising in my patch: "t1=time.time("decimal"); t2=time.time("decimal"); t2-t1" returns something bigger than 20 ms... That's because the "import decimal" is done after reading the first clock value, and not before.

    vstinner commented 12 years ago

    Patch version 4, minor update:

    vstinner commented 12 years ago

    Another possible format would be "tuple"

    Or, I forgot an obvious format: "datetime"!

    vstinner commented 12 years ago

    Version 5:

    I am not really conviced by the usefulness of "timespec" format, but it was just an example for bpo-11457.

    The "datetime" format is surprising for time.clock() and time.wallclock(), these timestamps use an arbitrary start. I suppose that time.clock(format="datetime") and time.wallclock(format="datetime") should raise a ValueError.

    5579dc13-9f48-42d1-bb17-9c003ef6fa70 commented 12 years ago

    On Sun, Jan 29, 2012 at 6:42 PM, STINNER Victor \report@bugs.python.org\ wrote: ..

    What do you call a constant argument? "float" and "decimal"? You would prefer a constant like time.FLOAT_FORMAT? Or maybe a boolean (decimal=True)?

    Yes. This was explained on python-dev not so long ago:

    http://mail.python.org/pipermail/python-dev/2010-July/102100.html

    The problem is not with the type of the argument (although boolean or enum argument type is often a symptom pointing to the issue.) The issue is that an argument that is never given a variable value at the call site is usually a sign of an awkward API. For example, what would you prefer:

    math('log', x)

    or

    math.log(x)

    ?

    I chose a string because my first idea was to add a registry to support other format, maybe user defined formats, like the one used by Unicode codecs.

    With all my respect for MAL, codecs are not my favorite part of the python library.

    One possibility (still awkward IMO) would be to use the return type as the format specifier. This would at least require the user to import datetime or decimal before calling time() with corresponding format. Few users would tolerate I/O delay when they want to get time with nanosecond precision.

    vstinner commented 12 years ago

    One possibility (still awkward IMO) would be to use the return type as the format specifier.

    Yeah, I already thaught to this idea. The API would be:

    I have to write a function checking that obj is decimal.Decimal or datetime.datetime without importing the module. I suppose that it is possible by checking obj type (it must be a class) and then obj.__module__.

    This would at least require the user to import datetime or decimal before calling time() with corresponding format.

    Another possibility is what I proposed before in the issue bpo-11457: take a callback argument. http://bugs.python.org/issue11457#msg143738

    The callback prototype would be:

    def myformat(seconds, floatpart, divisor):
        return ...

    Each module can implements its own converter and time can provide some builtin converts (because I don't want to add something related to time in the decimal module for example).

    But I don't really like this idea because it requires to decide the API of the low level structure of a timestamp (which may change later), and it doesn't really solve the issue of "import decimal" if the converter is in the time module.

    5579dc13-9f48-42d1-bb17-9c003ef6fa70 commented 12 years ago

    On Mon, Jan 30, 2012 at 6:15 PM, STINNER Victor \report@bugs.python.org\ wrote:

    Another possibility is what I proposed before in the issue bpo-11457: take a callback argument. http://bugs.python.org/issue11457#msg143738

    I think you are over-engineering a complicated solution to a simple problem. If you want to add functionality to time module, add a function with a new name that would mimic appropriate POSIX API and return a (named) tuple of integers where C function would populate a struct. If you write a new high resolution time module - import decimal at the top of the module and make all your clocks return seconds as Decimal objects.

    vstinner commented 12 years ago

    Patch version 6:

    There are now 5 timestamp formats:

    I consider the patch as ready to be commited, or at least ready for a review ;-) There is no more FIXME or known limitation. Well, now the most important part is to decide the API and the list of timestamp formats.

    The patch should be tested on Linux, FreeBSD and Windows, 32 and 64 bits to check assertions on type sizes:

    assert(sizeof(clock_t) \<= sizeof(size_t)); assert(sizeof(LONGLONG) \<= sizeof(size_t)); assert(sizeof(time_t) \<= sizeof(PY_LONG_LONG));

    vstinner commented 12 years ago

    Hum, it looks like _PyTime_AsDecimal() is wrong is ts->divisor is not a power of 10. The exponent must be computed with Context(1), not Context(26). Something simpler can maybe be used, I don't know even the decimal API.

    vstinner commented 12 years ago

    Patch version 7:

    vstinner commented 12 years ago

    As expected, size_t is too small on Windows 32 bits.

    Patch version 8: _PyTime_t uses Py_LONG_LONG if available, instead of size_t, for numerator and denominator.

    vstinner commented 12 years ago

    (Resend patch version 8 without the git diff format to support review on Rietveld.)

    vstinner commented 12 years ago

    Oops, win32_pyclock() was disabled (for tests) in patch version 8. Fixed in version 9.

    vstinner commented 12 years ago

    Patch version 10:

    vstinner commented 12 years ago

    Even if some people dislike the idea of adding datetime.datetime type, here is a patch implementing it (it requires time_decimal-XX.patch). The patch is at least a proof-of-concept that it is possible to change the internal structure without touching the public API.

    Example:

    $ ./python
    >>> import datetime, os, time
    >>> open("x", "wb").close(); print(datetime.datetime.now())
    2012-02-04 01:17:27.593834                                                                                                                  
    >>> print(os.stat("x", timestamp=datetime.datetime).st_ctime)                                                                        
    2012-02-04 00:17:27.592284+00:00                                                                                                            
    >>> print(time.time(timestamp=datetime.datetime))                                                                                           
    2012-02-04 00:18:21.329012+00:00
    >>> time.clock(timestamp=datetime.datetime)
    ValueError: clock has an unspecified starting point
    >>> print(time.clock_gettime(time.CLOCK_REALTIME, timestamp=datetime.datetime))
    2012-02-04 00:21:37.815663+00:00
    >>> print(time.clock_gettime(time.CLOCK_MONOTONIC, timestamp=datetime.datetime))
    ValueError: clock has an unspecified starting point

    As you can see: conversion to datetime.datetime fails with ValueError('clock has an unspecified starting point') for some functions, sometimes depending on the function argument (clock_gettime).

    vstinner commented 12 years ago

    Hum, time_decimal-10.patch contains a debug message:

    + print("la")

    vstinner commented 12 years ago

    The second format string should also be updated to "U|O:stat".

    vstinner commented 12 years ago

    Updated patch (version 11).

    vstinner commented 12 years ago

    os.stat().st_birthtime should depend on the timestamp argument.

    A timestamp optional argument should also be added to os.wait3() and os.wait4() for the utime and stime fields of the rusage tuple.

    vstinner commented 12 years ago

    I created the issue bpo-13964 to cleanup the API of os.*utime*() functions.

    vstinner commented 12 years ago

    fill_time() should use denominator=1 if the OS doesn't support timestamp with a subsecond resolution. See also issue bpo-13964.

    vstinner commented 12 years ago

    Patch version 12:

    I realized that resource.getrusage() should also be modified. I will maybe do that in another version of the patch, or maybe change resource usage in another patch.

    vstinner commented 12 years ago

    Patch version 13:

    So more functions (including os.*utime*()) "accept" Decimal, but using an implicit conversion to float.

    vstinner commented 12 years ago

    Patch version 14:

    This patch gives you an overview of the whole PEP-410 implementation, but it should not be applied in one shot. It would be better to commit it step by step:

    We can start by adding the API and use it in the time module, and then rediscuss changes on other modules.

    vstinner commented 12 years ago

    Patch version 15:

    vstinner commented 12 years ago

    (Oops, I attached the wrong patch.)

    vstinner commented 12 years ago

    New try, set the version to 16 to avoid the confusion.

    test_time is failing on Windows.

    vstinner commented 12 years ago

    Here is the version 17 of my patch. This version is mostly complete and so can be reviewed. Summary of the patch:

    The patch is huge, but as I wrote before, I will split it into smaller parts:

    Changes in the version 17 of my patch:

    vstinner commented 12 years ago

    Patch version 18:

    I also updated the patch adding datetime.datetime support because some people are interested by the type, even I don't think that it is interesting to add it. datetime.datetime is only usable with time.time(), os.*stat() and time.clock_gettime(), whereas it is incompatible with all other functions.

    vstinner commented 12 years ago

    TODO:

    vstinner commented 12 years ago

    The PEP has been rejected, so I close the issue.