python / cpython

The Python programming language
https://www.python.org

os.stat(): add new fields to get timestamps as Decimal objects with nanosecond resolution #55666

Closed 19e35e4c-f8b0-41ed-9bf1-d6341832c748 closed 12 years ago

19e35e4c-f8b0-41ed-9bf1-d6341832c748 commented 13 years ago
BPO 11457
Nosy @loewis, @rhettinger, @jcea, @mdickinson, @abalkin, @gustaebel, @vstinner, @larryhastings, @bitdancer, @skrah
Dependencies
  • bpo-11941: Support st_atim, st_mtim and st_ctim attributes in os.stat_result
Files
  • larry.decimal.utime.patch.1.txt: First revision
  • time_integer.patch
  • time_decimal.patch
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:
    ```python
    assignee = 'https://github.com/rhettinger'
    closed_at =
    created_at =
    labels = ['type-feature', 'library']
    title = 'os.stat(): add new fields to get timestamps as Decimal objects with nanosecond resolution'
    updated_at =
    user = 'https://bugs.python.org/khenriksson'
    ```
    bugs.python.org fields:
    ```python
    activity =
    actor = 'larry'
    assignee = 'rhettinger'
    closed = True
    closed_date =
    closer = 'larry'
    components = ['Library (Lib)']
    creation =
    creator = 'khenriksson'
    dependencies = ['11941']
    files = ['23246', '24309', '24321']
    hgrepos = []
    issue_num = 11457
    keywords = ['patch']
    message_count = 65.0
    messages = ['130478', '130479', '130596', '134642', '137558', '137578', '137580', '137593', '137599', '137600', '137606', '137608', '137877', '137888', '138978', '138979', '138980', '138984', '138987', '139169', '139321', '143573', '143644', '143738', '143739', '143801', '143802', '143803', '143805', '143807', '143811', '143812', '143819', '143820', '143837', '143866', '143867', '143868', '143873', '143881', '143885', '143898', '144543', '144607', '145256', '145262', '145288', '151872', '151873', '151912', '151943', '151987', '151992', '152003', '152004', '152305', '152306', '152314', '152317', '152320', '152322', '152323', '152350', '152355', '154405']
    nosy_count = 16.0
    nosy_names = ['loewis', 'rhettinger', 'jcea', 'mark.dickinson', 'belopolsky', 'lars.gustaebel', 'vstinner', 'larry', 'nadeem.vawda', 'Arfrever', 'r.david.murray', 'skrah', 'Alexander.Belopolsky', 'rosslagerwall', 'khenriksson', 'ericography']
    pr_nums = []
    priority = 'normal'
    resolution = 'wont fix'
    stage = 'test needed'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue11457'
    versions = ['Python 3.3']
    ```

    vstinner commented 12 years ago

    The attached patch adds an optional format argument to time.time(): time.time("float") is the same as time.time(), but time.time("decimal") returns a Decimal object. The Decimal object stores the resolution of the clock and doesn't lose the lower bits for big numbers. I configured the Decimal context to be able to store 10,000 years in seconds with a resolution of 1 nanosecond and the ROUND_HALF_EVEN rounding method.

    Example: time.time("decimal") returns Decimal('1327495951.346024').
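To illustrate the point about lost low-order bits, here is a minimal sketch (not part of the attached patch) that builds a float and a Decimal from the same nanosecond-resolution timestamp. The timestamp value is made up for the example.

```python
from decimal import Decimal

# Hypothetical timestamp with nanosecond digits (illustrative value only).
text = "1327495951.346024123"

as_float = float(text)      # binary double: about 15-17 significant digits
as_decimal = Decimal(text)  # keeps every digit that was written

# Converting the float back to Decimal exposes the rounding error.
float_error = abs(Decimal(as_float) - as_decimal)

print(as_float)
print(as_decimal)
print(float_error > Decimal("1e-9"))  # the float is off by more than 1 ns
```

For timestamps of this magnitude, adjacent doubles are roughly 0.24 microseconds apart, so the nanosecond digits cannot survive the conversion to float.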

    It is really cool to have the resolution directly in the result, because the resolution may change at each call: time.time() has 3 different implementations (with fallbacks), each with a different resolution. time.clock() also has 2 implementations (one is used as a fallback) with different resolutions.

    The internal time_to_format() takes integer arguments: the floating part is written as (floatpart, divisor).
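The (floatpart, divisor) idea can be sketched roughly as follows. This is not the code from the patch: `to_decimal` and `TIMESTAMP_CONTEXT` are hypothetical names, and the precision of 22 digits is an assumption (about 12 digits for 10,000 years of seconds plus 9 fractional digits, with one digit to spare).

```python
from decimal import Decimal, Context, ROUND_HALF_EVEN

# Assumption: 22 significant digits is enough for ~10,000 years of seconds
# (12 digits) plus 9 fractional digits for nanoseconds.
TIMESTAMP_CONTEXT = Context(prec=22, rounding=ROUND_HALF_EVEN)


def to_decimal(seconds, floatpart, divisor):
    """Hypothetical helper: combine integer parts into a Decimal timestamp."""
    fraction = TIMESTAMP_CONTEXT.divide(Decimal(floatpart), Decimal(divisor))
    return TIMESTAMP_CONTEXT.add(Decimal(seconds), fraction)


# A microsecond clock uses divisor 10**6, a nanosecond clock 10**9.
print(to_decimal(1327495951, 346024, 10**6))
print(to_decimal(1323458640, 702327236, 10**9))
```

Because every input is an integer, no float is ever created, and the number of fractional digits in the result reflects the divisor, that is, the clock resolution.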

    If you like the idea, I will also write a patch for time.clock(), time.wallclock() and time.clock_gettime(), and also maybe for time.clock_getres().

    We may use a registry to allow adding user-defined formats, but I prefer to keep the patch simple (only allow "float" and "decimal" right now).

    larryhastings commented 12 years ago

    Victor: I think your patch merits its own tracker issue; it's only tangentially related to the proposed changes to os.stat.

    That said, please do add me to the nosy list if you create one.

    One more thing: I haven't given it a lot of thought, so there might be an even better API out there. But given your proposed API, wouldn't it be slightly better if it took the type object rather than the string? time.time(float) or time.time(Decimal) as examples.
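A rough sketch of what the type-object variant could look like, written against today's Python rather than the 2012 patch. The wrapper name `now` is hypothetical, and time.time_ns() only exists in Python 3.7 and later.

```python
import time
from decimal import Decimal


def now(kind=float):
    """Hypothetical wrapper: return the current time as `kind`."""
    ns = time.time_ns()  # integer nanoseconds since the epoch (Python 3.7+)
    if kind is float:
        return ns / 10**9
    if kind is Decimal:
        # Shift the decimal point by 9 places; no rounding at this size.
        return Decimal(ns).scaleb(-9)
    raise ValueError(f"unsupported timestamp type: {kind!r}")


print(now())         # same kind of value as time.time()
print(now(Decimal))  # Decimal with 9 fractional digits
```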

    rhettinger commented 12 years ago

    Have you researched how other languages plan to expose sub-millisecond times? This isn't an API that will get points for originality. Also, it needs to be an API that is time efficient (many scripts use os.stat() frequently to scan files for changes and that check needs to be fast).

    Please do not "just check it in".

    vstinner commented 12 years ago

    > Have you researched how other languages plan to expose sub-millisecond times? This isn't an API that will get points for originality. Also, it needs to be an API that is time efficient (many scripts use os.stat() frequently to scan files for changes and that check needs to be fast).

    Using decimal timestamps should be an option, float timestamps must remain the default.

    larryhastings commented 12 years ago

    Victor: I *think* Raymond's comments were directed at my patch, not yours.

    vstinner commented 12 years ago

    I don't like the idea of adding new fields to os.stat() *by default* because it may break backward compatibility. And if the new fields are decimal.Decimal objects, the module has to be imported and it means that any call to os.stat() would be slower just to provide timestamps with a finer resolution: this is not acceptable if you just want to check if a file exists.

    In issue bpo-13882, I propose adding a format argument to the functions that get the time (time.time(), time.clock(), etc.). My patch doesn't change the default type, but adds a "decimal" format to get the time as a decimal.Decimal object.

    The option (format) value is a string to be able to add other formats without having to change the API later:

    For os.stat(), the optional argument can be called "timestamp".

    So if you want timestamps in the best available resolution, use timestamp="decimal". If you prefer the datetime API, use timestamp="datetime". If you don't care about timestamps, just call os.stat() without setting the timestamp option ;-)

    We might add a registry to add user-defined types, but the "tuple" format should be enough. (I don't know if we need to expose the low level "tuple" format.)
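The string-keyed format idea can be approximated today on top of the integer st_mtime_ns field. The sketch below is only an illustration: `FORMATS`, `_to_datetime` and `mtime` are hypothetical names, and os.stat() has no timestamp argument in any released Python.

```python
import os
from datetime import datetime, timezone
from decimal import Decimal


def _to_datetime(ns):
    # datetime stores microseconds only, so sub-microsecond digits are dropped.
    seconds, nanos = divmod(ns, 10**9)
    base = datetime.fromtimestamp(seconds, timezone.utc)
    return base.replace(microsecond=nanos // 1000)


# Hypothetical registry mapping a format name to a converter that takes
# integer nanoseconds; the keys mirror the strings proposed above.
FORMATS = {
    "float": lambda ns: ns / 10**9,
    "decimal": lambda ns: Decimal(ns).scaleb(-9),
    "timespec": lambda ns: divmod(ns, 10**9),
    "datetime": _to_datetime,
}


def mtime(path, timestamp="float"):
    """Hypothetical helper: a file's mtime in the requested format."""
    convert = FORMATS[timestamp]
    return convert(os.stat(path).st_mtime_ns)


print(mtime(".", "decimal"))
print(mtime(".", "timespec"))
```

Adding a user-defined format in this sketch is just another dictionary entry, which is roughly what a registry would amount to.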

    80036ac5-bb84-4d39-8416-02cd8e51707d commented 12 years ago

    I think that one of the available types of time values returned by os.stat() should allow these values to be passed directly to os.futimens() and os.utimensat(), which expect (time_sec, time_nsec) tuples.

    vstinner commented 12 years ago

    > I think that one of the available types of time values returned by os.stat() should allow these values to be passed directly to os.futimens() and os.utimensat(), which expect (time_sec, time_nsec) tuples.

    If we choose to make it possible to get decimal.Decimal objects, we should also patch some functions to support Decimal objects, like datetime.datetime.fromtimestamp() (today, it "just works", because a Decimal object can be converted to float). We may accept Decimal in os.futimens() and os.utimensat(), but I am not sure about this particular case.

    To come back to my format solution, we can also support classical C structures to interact with C functions (in Python or more directly using ctypes):

    Or we may introduce conversion functions from other types like float, Decimal or another type.

    vstinner commented 12 years ago

    > I think that one of the available types of time values returned by os.stat() should allow these values to be passed directly to os.futimens() and os.utimensat(), which expect (time_sec, time_nsec) tuples.

    Oh, I realized that these two functions were added in Python 3.3, so it is not too late to change their API. I would prefer to limit the number of timestamp formats: Python 3.2 has float and datetime; I (and Martin) propose adding Decimal to Python 3.3 (to get nanosecond resolution). (sec, nsec) is a new format, unless Python 3.2 already has functions expecting such a tuple?

    I know that the underlying C function expects a timespec structure, but Python can try to use a higher-level API, can't it?

    Decimal is more practical than a tuple because you can just write t2 - t1 to compute a time delta. Decimal has other advantages (read the issue for the full list ;-)).
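A small sketch of the difference in ergonomics. The timestamps are illustrative values, and `timespec_delta` is a hypothetical helper showing the normalization that a (sec, nsec) tuple needs by hand.

```python
from decimal import Decimal

# Two illustrative timestamps as Decimal objects.
t1 = Decimal("1323458640.702327236")
t2 = Decimal("1323458642.000000100")
print(t2 - t1)  # plain subtraction, exact to the nanosecond


def timespec_delta(a, b):
    """Hypothetical helper: b - a for (sec, nsec) tuples, normalized."""
    total_ns = (b[0] - a[0]) * 10**9 + (b[1] - a[1])
    return divmod(total_ns, 10**9)


# The same two instants as (sec, nsec) tuples.
print(timespec_delta((1323458640, 702327236), (1323458642, 100)))
```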

    80036ac5-bb84-4d39-8416-02cd8e51707d commented 12 years ago

    (secs, nsecs) tuples are more practical in performance-critical applications (e.g. synchronization of timestamps between 2 trees with large number of files).

    vstinner commented 12 years ago

    > (secs, nsecs) tuples are more practical in performance-critical applications (e.g. synchronization of timestamps between 2 trees with large number of files).

    This is also why I propose an argument to choose the format: everyone has a different use case, and the use cases are incompatible. Tuples are quick to create but don't have a practical API; datetime has a nice API but also has the issues listed earlier by Martin (and doesn't currently support nanosecond resolution); Decimal has a nice API but is expensive to create; etc.

    bitdancer commented 12 years ago

    There is also the fact that we have traditionally exposed thin wrappers around POSIX functions (and then, where practical, provided Windows emulations). We aren't 100% consistent about this, but we are pretty consistent about it.

    61337411-43fc-4a9c-b8d5-4060aede66d0 commented 12 years ago

    > I know that the underlying C function expects a timespec structure, but Python can try to use a higher-level API, can't it?

    I agree entirely.

    vstinner commented 12 years ago

    I attached a more complete patch to the issue bpo-13882: it adds an optional timestamp format to os.stat(), os.lstat(), os.fstat(), os.fstatat().

    Examples:

    $ ./python 
    Python 3.3.0a0 (default:2914ce82bf89+, Jan 30 2012, 23:07:24) 
    >>> import os
    >>> s=os.stat("setup.py", timestamp="datetime")
    >>> s.st_mtime - s.st_ctime
    datetime.timedelta(0)
    >>> print(s.st_atime - s.st_ctime)
    52 days, 1:44:06.191293
    >>> os.stat("setup.py", timestamp="timespec").st_ctime
    (1323458640, 702327236)
    >>> os.stat("setup.py", timestamp="decimal").st_ctime
    Decimal('1323458640.702327236')

    larryhastings commented 12 years ago

    Given Guido's rejection of PEP 410, this won't happen, so I'm closing this bug. Our BDFL has specifically rejected any of the complicated representations; he ruled that all we need are new _ns fields representing the time in nanoseconds, and to accept a "ns=" argument for os.utime and its ilk. Please see bug bpo-14127 for discussion of that change.
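For reference, a minimal sketch of how the integer-nanosecond approach looks in Python 3.3 and later, using the st_atime_ns / st_mtime_ns fields of os.stat_result and the ns= argument of os.utime. The file names and the temporary directory are just scaffolding for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "src.txt")
    dst = os.path.join(tmp, "dst.txt")
    for path in (src, dst):
        with open(path, "w") as f:
            f.write("example")

    st = os.stat(src)
    # Integer nanoseconds avoid float rounding entirely.
    os.utime(dst, ns=(st.st_atime_ns, st.st_mtime_ns))

    # The modification time of dst now matches src to the nanosecond.
    copied = os.stat(dst).st_mtime_ns == st.st_mtime_ns
    print(copied)
```

This also covers the tree-synchronization use case raised earlier in the thread: the values read from one file can be applied to another without passing through a float.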