toml-lang / toml

Tom's Obvious, Minimal Language
https://toml.io
MIT License

Feature request: Add a duration/timedelta type #514

Open JelteF opened 6 years ago

JelteF commented 6 years ago

I think it would be very useful to have a duration type natively in toml. It's a thing I use a lot in my web service configs, for cache TTL or timeouts. Right now I resort to using integers and making the key include the resolution (e.g. timeout_ms, ttl_hours). This has a couple of disadvantages:

  1. Requires extra code every time to convert the value to the language-specific duration type.
  2. If the resolution chosen was wrong you have at least one of these problems:
    1. You have to change the key, resulting in backwards incompatibility
    2. You have to add zeros, which decreases readability
    3. You have to do calculations, e.g. if you have ttl_hours and want 9 days, you have to enter 216, which makes it (at least to me) not obvious when quickly looking at the config.

I would propose the following basic and IMHO natural syntax (inspired by go duration parsing/formatting):

day = 1d
hour = 1h
minute = 1m
second = 1s
milli = 1ms
micro1 = 1µs # U+00B5 = micro symbol
micro2 = 1μs # U+03BC = Greek letter mu
nano = 1ns

# allows floats
micro3 = 0.1ms

# allows combining
two_and_a_half_hours = 2h30m
# not advised but possible
five_seconds = 2s3s

# can be negative
minus_one_seconds = -1s

# allows underscores
hundred_thousand_hours = 100_000h

This notably doesn't include months and years because they can differ in duration and are quite easily approximated in days. I'm also fine with the following changes:

  1. Taking out/changing the µ for micro seconds. I think it's fine to use 0.1ms in most cases, so it's not strictly needed. I mainly put it in because Go duration parsing and formatting allows/uses it as well.
  2. I'm fine with adding a prefix to make differentiation with numbers easier. For instance D which would result in D2h30m.
  3. Removing the duplication possibility for 2s3s. Again I mainly put this in because the Go duration parsing allows it.
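
To make the proposed syntax concrete, here is a minimal Python sketch of a parser for it. It is only an illustration, not part of any TOML library: parse_duration and UNIT_NS are hypothetical names, the result is returned as integer nanoseconds (an assumption), and no prefix such as D is handled.

```python
import re
from decimal import Decimal

# Nanoseconds per unit; both micro spellings from the proposal are accepted.
UNIT_NS = {
    "d": 86_400 * 10**9,
    "h": 3_600 * 10**9,
    "m": 60 * 10**9,
    "s": 10**9,
    "ms": 10**6,
    "µs": 10**3,  # U+00B5 = micro symbol
    "μs": 10**3,  # U+03BC = Greek letter mu
    "ns": 1,
}

# Two-letter suffixes come first so "ms" is not read as "m" followed by "s".
_PART = re.compile(r"(\d+(?:_\d+)*(?:\.\d+)?)(ms|µs|μs|ns|d|h|m|s)")


def parse_duration(text: str) -> int:
    """Parse e.g. '2h30m' or '-0.1ms' into integer nanoseconds (hypothetical helper)."""
    sign = -1 if text.startswith("-") else 1
    body = text[1:] if text.startswith("-") else text
    if not body:
        raise ValueError(f"empty duration: {text!r}")
    pos, total = 0, Decimal(0)
    while pos < len(body):
        match = _PART.match(body, pos)
        if match is None:
            raise ValueError(f"invalid duration: {text!r}")
        number, unit = match.groups()
        # Underscores are digit separators only; Decimal avoids float rounding.
        total += Decimal(number.replace("_", "")) * UNIT_NS[unit]
        pos = match.end()
    # Anything finer than one nanosecond is truncated.
    return sign * int(total)
```

Repeated units such as 2s3s are simply summed here, which mirrors the "not advised but possible" case above; a stricter parser could reject them.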

I really hope this is considered for inclusion as it would be really useful to me and my colleagues. (Much more so than the already supported datetime type, which I've never had an actual use for in a config).

PS. I created a modified fork of https://github.com/pelletier/go-toml that supports this: https://github.com/JelteF/go-toml (see the last couple of commits)

abcdehc commented 2 years ago

I agree with this, whichever way it goes! When I use datetime.timedelta in Python, I have to write something like this: "self.moving_validity = datetime.timedelta(minutes=self.config['param']['moving_validity_m'],seconds=self.config['param']['moving_validity_s'])" That's so inelegant.

eksortso commented 2 years ago

@abcdehc I get where you're coming from. In Python it'd be nice to write self.moving_validity = self.config['param']['moving_validity'] and have a timedelta set straightaway. Short of that, the most elegant way would be to break it down a bit first. (Assuming the same config as before.)

mv_units = self.config['param']
m = mv_units['moving_validity_m']
s = mv_units['moving_validity_s']
self.moving_validity = datetime.timedelta(minutes=m, seconds=s)

And even then, Python will normalize all that to days, seconds, and microseconds anyway. Which points to the fact that TOML durations' fundamental nature has not yet been agreed upon, if it ever will be.

The timedelta documentation in Python explicitly says that seconds are stored internally, but not minutes. A TOML parser could naively lean on Python's own implementation, or it could introduce a standardized duration object that would be at odds with how timedelta works. So if we had a duration value assigned in TOML like param.moving_validity = 2m, would we end up with a timedelta of 120 seconds in our program, or a quantity that would preserve 2 exact minutes no matter what? If we added it to an aware datetime like 2016-12-31 23:59:00Z (i.e. 60 seconds before a leap second was added to the UTC timestream) in our program, would we expect to see it become 2017-01-01 00:01:00Z, or would we take the leap second into account and expect 2017-01-01 00:00:59Z instead? If we did it the Python way, we'd get the former datetime, because Python's datetime arithmetic ignores leap seconds; the latter is the instant that is actually 120 elapsed seconds later in UTC.
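
For reference, a small sketch of what the Python standard library does in this exact case. It relies only on the datetime module; the variable names are just for illustration.

```python
from datetime import datetime, timedelta, timezone

# timedelta normalizes everything to days, seconds, and microseconds,
# so "2 minutes" and "120 seconds" are the same value.
two_minutes = timedelta(minutes=2)
assert two_minutes == timedelta(seconds=120)
print(two_minutes.days, two_minutes.seconds, two_minutes.microseconds)  # 0 120 0

# One minute before the leap second at the end of 2016.
start = datetime(2016, 12, 31, 23, 59, 0, tzinfo=timezone.utc)
result = start + two_minutes

# Python's datetime arithmetic has no notion of leap seconds.
print(result.isoformat())  # 2017-01-01T00:01:00+00:00
```

So a parser that hands back a plain timedelta inherits the "a minute is 60 seconds" behavior without doing anything extra.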

But no proposal so far about durations in TOML has defined what units are preserved in implementation. We haven't even discussed normalization. Too much is left to the parser or the language to scrape together. It's not like how we had RFC 3339 and implementations of it to rely on for dates and times.

This is how deep this subject goes. I haven't even looked into how C++ or Golang represent their typical time duration data types or how they interoperate with time types. Is there any sort of agreed-upon standard? Is there an RFC that we could point to, to smooth this whole thing out? What is so minimal about any of these efforts? Complexity underlies the simplest implementations.

So I regret to admit, short of a well-accepted standard (sorry, ISO 8601) or implementation, that we ought to abandon time durations as being not obvious enough for the TOML standard to embrace.

arp242 commented 2 years ago

I haven't even looked into how C++ or Golang represent their typical time duration data types or how they interoperate with time types.

In Go time.Duration is just an int64 (type Duration int64), representing a number of nanoseconds.

Personally I think that's not really a show-stopper though, as leap seconds can be ignored for many purposes (it is a show-stopper for supporting at least months though), and in practice many applications (including those written in Go, but probably also Python) already ignore leap seconds with durations since they don't contain a database of when leap seconds occurred. Even time-specific applications don't always fully implement leap seconds "the right way"; for example Google's NTP doesn't apply leap seconds, OpenBSD just pretends they don't exist, etc.

What I'm saying is that defining a "minute" to be "60 seconds" will be fine for practically all use cases, and we don't need to worry about leap seconds at all.

eksortso commented 2 years ago

@arp242 A little more comforting! But still no common standard. Seconds are the common standard, and we need millisecond precision guaranteed. If in TOML we fixed minutes to 60 seconds and hours to 3600 seconds, could we confidently assert that common time durations in all languages can handle a sufficiently large number of seconds, positive or negative? And what would that limit be in order to ensure compatibility across platforms?

arp242 commented 2 years ago

For numbers TOML already specifies that "Arbitrary 64-bit signed integers should be accepted and handled losslessly"; for int64 we'd be talking about 2.9 million years, or 292 years if we allow nanoseconds. Using int64 nanoseconds probably makes sense.
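
A quick way to check these ranges is to divide the largest signed 64-bit value by the number of ticks per year. This is a rough sketch, assuming a 365.25-day year; max_years is a hypothetical helper, not something from the TOML spec.

```python
INT64_MAX = 2**63 - 1
SECONDS_PER_YEAR = 365.25 * 86_400  # assumption: average Julian year


def max_years(ticks_per_second: int) -> float:
    """Largest duration, in years, that fits in an int64 count of ticks."""
    return INT64_MAX / ticks_per_second / SECONDS_PER_YEAR


for name, ticks in [("ns", 10**9), ("µs", 10**6), ("ms", 10**3)]:
    print(f"int64 {name}: about {max_years(ticks):,.0f} years")
```

With nanosecond ticks this comes out at roughly 292 years in either direction, and each coarser unit multiplies the range by 1000.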

marzer commented 2 years ago

@eksortso

I haven't even looked into how C++ [represents] typical time duration types

Using chrono::duration<>. Bunch of C++ template soup that ultimately distills down to a single integer or float, depending on what you want it to represent and what precision you need. Typically you'd use nanoseconds (64-bit integer backing) or milliseconds (integer of at least 45 bits, which is almost always a 64-bit integer in practice).

or how they interoperate with time types.

In one of the newer versions of the standard there's new date/time types, with duration interop, but I have absolutely no idea how it works and it seems confusing as hell, tbh. All I can say is that there is some interop.

abelbraaksma commented 2 years ago

we don't need to worry about leap seconds at all.

Indeed. But just to emphasize, durations should be agnostic to leap seconds, minutes, years or even Era or calendar. That's why it's important to separate months + year and day + time. The only moment leap seconds or leap years come into play is when a duration is added to a date-time, which itself already has all the information (i.e., adding 1 month to Feb 1 2004 is 1 March 2004; adding 28 days to Feb 1 2004 is 29 Feb 2004, adding it to Feb 1 2005 is 1 March 2005).
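
The dates in that example can be checked with Python's standard library. The 28-day case uses timedelta directly; Python has no built-in "add a month" operation, so add_months below is a hypothetical helper written just for this sketch (it clamps the day to the length of the target month, which is one possible convention among several).

```python
import calendar
from datetime import date, timedelta


def add_months(d: date, months: int) -> date:
    """Calendar-aware month addition (hypothetical helper, clamps the day)."""
    index = d.year * 12 + (d.month - 1) + months
    year, month = divmod(index, 12)
    month += 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


# A fixed duration: the result depends on the calendar only when it is applied.
print(date(2004, 2, 1) + timedelta(days=28))  # 2004-02-29 (leap year)
print(date(2005, 2, 1) + timedelta(days=28))  # 2005-03-01

# A calendar quantity: "1 month" has no fixed length in days.
print(add_months(date(2004, 2, 1), 1))  # 2004-03-01
```

The duration itself carries no calendar knowledge in either case; all of it lives in the date it gets added to.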

Luckily, TOML doesn't do calculations, so we don't have to worry about that.

By making durations (which is not the same as timedelta!) agnostic of the current time, we bypass any of these potential issues and only need a very simple datatype.

eksortso commented 2 years ago

durations (which is not the same as timedelta!)

I'm used to the timedelta type in Python, which is about the equivalent of the TOML duration type that we are working to propose. I don't know what you're referring to, @abelbraaksma. But we rejected doing intervals between timestamps a long time ago.

Felk commented 2 years ago

I think he just means that if you say "1 month" that it cannot be translated to a fixed duration (say 30.43 days or something), but is only applicable in the context of a calendar (say "February 3rd" + "1 month" = "March 3rd"). And that such calculations are not TOML's responsibility.

abelbraaksma commented 2 years ago

But we rejected doing intervals between timestamps a long time ago.

@eksortso, I may be wrong. What I meant is that timedelta is a delta between two time instances (i.e., the delta between 23:59 and 0:02) and therefore potentially dependent on leap seconds and years, Era, Calendar and the like.

To me, a duration is agnostic to any time instance (i.e., 3 minutes) and therefore not dependent on or influenced by such notions.

What different programming languages use for duration or timedelta or other (i.e. it could be Interval just the same) ultimately shouldn't matter for TOML, but I do think that the notion "a delta between two times" or a "duration of certain amount of seconds" are two similar, yet distinct notions.

And that such calculations are not TOML's responsibility.

Indeed @Felk, that's what I meant ;).

eksortso commented 2 years ago

For numbers TOML already specifies that "Arbitrary 64-bit signed integers should be accepted and handled losslessly";

@arp242 That specifies an expected range for integers. It's not an implementation detail necessarily. That's important to remember because smaller integer ranges might be permitted, against advice, for things like embedded systems. I think we need to keep the specification logic separate from the implementation details that a parser may use.

for int64 we'd be talking about 2.9 million years, or 292 years if we allow nanoseconds. Using int64 nanoseconds probably makes sense.

Again, we're not dictating the implementation details. But your calculations suggest a good expected range for durations. In whichever way a parser may implement a TOML duration, it would guarantee millisecond precision, even though most implementations we've seen allow for an integral number of nanoseconds. How would we state this? 290,000 years in either direction?

I think, though, that we ought to require this one thing of compliant parsers. It should be readily apparent, if not downright obvious, how a duration's value can be added to or subtracted from a timestamp's value, once each value is parsed. For instance, Python's timedelta class lives in the same standard library module as the date/time classes, which permit addition and subtraction of timedeltas from their objects. Surely other libraries have similar connections between their timestamp and duration concepts. I don't know how to codify this requirement, just that it's not a MUST, it's not as strong as a SHOULD, but it's stronger than a MAY, I think, if I may abuse the RFC 2119 terminology.

tintin10q commented 1 year ago

I do think that this syntax proposed by arp242 is quite elegant:

m[illi]s[econds]   milliseconds
s[econds]          seconds
m[inutes]          minutes
h[ours]            hours
d[ays]             days
w[eeks]            weeks
y[ears]            years

But I think it is a bad idea to go anywhere above days. Leap years are already not nice, and there are many ways to represent years; for example, see https://altalang.com/beyond-words/6-calendars-around-the-world/. Time in general is just hard.

Yes it is nice to have something like this in python:

release_date = datetime.date.today() + parsed_toml["time-till-release"]

But is that really that much better than:

release_date = datetime.date.today() + datetime.timedelta(days=parsed_toml["days-till-release"])

I don't think it warrants the extra complexity. As with the file sizes proposal, you can just put the unit in the name.

We should really try to keep https://github.com/prettier/prettier/issues/40 in mind as well. I know it talks about formatting, but still.

arp242 commented 12 months ago

One major issue any implementation will run into with durations (but also sizes, or any other suffix) is compatibility. Consider an existing file with:

timeout = 5000  # Timeout, in ms

You upgrade to a new TOML version with durations, and you want to support:

timeout = 5s  # or 5000ms, or any other duration.

Great, but ... you don't want everyone to update their config files, so timeout = 5000, with ms implied, should still work.

Turns out this is a bit tricky; in e.g. Python I guess you'll end up with:

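import datetime

if isinstance(config.timeout, (int, float)):
    # Bare number: assume ms
    timeout = datetime.timedelta(milliseconds=config.timeout)
else:
    # Already a datetime.timedelta
    timeout = config.timeout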

But in other, more statically typed languages, parsers will have to end up creating their own struct or class or whatever the language has, so you can do:

if config.timeout.DurationUnspecified() {
    // Assume ms
} else {
}

And/or maybe:

var config Config
config.timeout.Default(time.Millisecond)
toml.Decode(..)

But it all pushes some amount of complexity to both the parser and application, at least if you want v = 5 and v = 5s to both work (which you do for existing keys). You typically can't "just" use the stdlib's duration/timedelta type.

Although for new keys it's okay to only support the suffixed variant, you still want to make sure ONLY that variant is allowed. I can foresee subtle confusion with things where people do:

timeout = 5 # which means "5ms" instead of "5s" the user may have expected.

And then the application just does:

app.set_timeout(config.timeout)

And this "duck types" out alright and it "works", except it does something unexpected, which isn't even immediately obvious (low timeout which works fine on your local machine but times out in production ... sounds like a fun time).

I suppose type hints and the like will prevent that, and things have been moving in that direction over the last few years, but still...
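
For new keys, one way an application could guard against the bare-number case is an explicit check at load time. This is just a sketch under assumptions: it presumes the parser hands durations back as datetime.timedelta, and require_duration is a hypothetical helper name.

```python
from datetime import timedelta


def require_duration(key: str, value: object) -> timedelta:
    """Reject anything that is not an explicit duration (hypothetical helper)."""
    if not isinstance(value, timedelta):
        raise TypeError(
            f"{key} must be a duration with a unit (e.g. 5s), "
            f"got {type(value).__name__}: {value!r}"
        )
    return value


# A suffixed value parsed into a timedelta passes through unchanged.
timeout = require_duration("timeout", timedelta(seconds=5))

# A bare 5 fails loudly instead of silently meaning "5ms".
try:
    require_duration("timeout", 5)
except TypeError as exc:
    print(exc)
```

It doesn't remove the complexity, it just moves the failure from production to config loading.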


Long story short, I started prototyping this in my TOML library and writing a concrete proposal, but after encountering these issues I'm less sure if we really want this.

That said, it is commonly implemented in many config files. I did a survey of some common software, based on "what I could think of" and looking at the top 500 packages in https://popcon.debian.org – this is perhaps a bit biased, and some software supports neither format (e.g. ALSA configuration has no use for either durations or sizes).

Overall, I think it's more widely supported than datetimes, which TOML already supports:

Config Time units Fractions Size units Fractions
Apache ms ⹋
Caddy ns, us/µs, ms, s, m, h, d 1.5h 1h30m 1000: kB, mB - 1024: kiB, miB ?
MariaDB For some options; but can't find docs ?
OpenSSH s, m, h, d, w 1h30m ?: K, M, G ?
Postfix s, m, h, d, w no
PostgreSQL us, ms, s, min, h, d no 1024: B, kB, MB, GB, TB (w/ fractions) 1.5M (round to K)
Redis 1000: k, m, g - 1024: kb, mb, gb ?
Varnish ms, s, m, h, d, w, y 1.5h n/a
git 1024: k, m, g no
haproxy us, ms, s, m, h, d no 1024: k, m, g (no fractions) -
nginx ms, s, m, h, d, w, M (30d), y 1h30m ?: k/K, m/M ?
samba
systemd n/a

n/a: Not applicable; there are no settings that could use this unit.
†: Unit is implied.
‡: Unit is in the key name (e.g. podInitialBackoffSeconds, JobTimeoutSec).
⹋: Unit is usually implied, but a few values do allow changing the unit.

In many cases where a unit isn't supported, it would be better if it was. For example (default values):

Some are also inconsistent; e.g. Redis's lfu-decay-time is in minutes, but repl-timeout is in seconds.

Of course, this isn't TOML, but "how many TOML files actually need this?" is a bit harder to answer as it's harder to find projects which support TOML. I suppose I could check package list contents, but I haven't bothered (because that's a bit of work).

eksortso commented 11 months ago

Let's put this feature request on hold until after v1.1.0 is released.