toml-lang / toml

Tom's Obvious, Minimal Language
https://toml.io
MIT License

Feature request: Add a duration/timedelta type #514

Open JelteF opened 6 years ago

JelteF commented 6 years ago

I think it would be very useful to have a duration type natively in toml. It's a thing I use a lot in my web service configs, for cache TTL or timeouts. Right now I resort to using integers and making the key include the resolution (e.g. timeout_ms, ttl_hours). This has a couple of disadvantages:

  1. Requires extra code every time to be converted to actual language specific duration type.
  2. If the resolution chosen was wrong you have at least one of these problems:
    1. You have to change the key, resulting in backwards incompatibility
    2. You have to add zeros, which decreases readability
    3. You have to do calculations, e.g. if you have ttl_hours and want 9 days, you have to enter 216, which makes it (at least to me) not obvious when quickly looking at the config.

I would propose the following basic and IMHO natural syntax (inspired by go duration parsing/formatting):

day = 1d
hour = 1h
minute = 1m
second = 1s
milli = 1ms
micro1 = 1µs # U+00B5 = micro symbol
micro2 = 1μs # U+03BC = Greek letter mu
nano = 1ns

# allows floats
micro3 = 0.1ms

# allows combining
two_and_a_half_hours = 2h30m
# not advised but possible
five_seconds = 2s3s

# can be negative
minus_one_seconds = -1s

# allows underscores
hundred_thousand_hours = 100_000h
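To make the proposed syntax concrete, here is a minimal Python sketch that turns values like the ones above into a datetime.timedelta. It is not the go-toml fork mentioned below; the name parse_go_style is hypothetical, a day is assumed to be exactly 24 hours, and "us" is not accepted because it is not part of this proposal.

```python
import re
from datetime import timedelta
from decimal import Decimal

# Microseconds per unit. Treating "d" as exactly 24 hours is an assumption
# of this sketch, not something the proposal spells out.
_UNIT_US = {
    "d": Decimal(86_400_000_000),
    "h": Decimal(3_600_000_000),
    "m": Decimal(60_000_000),
    "s": Decimal(1_000_000),
    "ms": Decimal(1_000),
    "\u00b5s": Decimal(1),  # U+00B5 MICRO SIGN
    "\u03bcs": Decimal(1),  # U+03BC GREEK SMALL LETTER MU
    "ns": Decimal("0.001"),
}

# One component: digits (underscores allowed between digit groups), an
# optional fraction, then a unit. Two-letter units are listed first so that
# "ms" is not read as "m" followed by a stray "s".
_COMPONENT = re.compile(r"(\d+(?:_\d+)*(?:\.\d+)?)(ms|\u00b5s|\u03bcs|ns|d|h|m|s)")


def parse_go_style(text: str) -> timedelta:
    """Parse values such as '2h30m', '-1s', '100_000h' or '0.1ms'."""
    sign, body = 1, text
    if body and body[0] in "+-":
        sign, body = (-1 if body[0] == "-" else 1), body[1:]
    if not body:
        raise ValueError(f"empty duration: {text!r}")
    total_us = Decimal(0)
    pos = 0
    while pos < len(body):
        match = _COMPONENT.match(body, pos)
        if match is None:
            raise ValueError(f"invalid duration: {text!r}")
        number, unit = match.groups()
        # Repeated units (e.g. "2s3s") are simply summed, as in the proposal.
        total_us += Decimal(number.replace("_", "")) * _UNIT_US[unit]
        pos = match.end()
    # timedelta has microsecond resolution, so nanosecond values are rounded.
    return timedelta(microseconds=sign * int(total_us.to_integral_value()))
```

Because timedelta stops at microseconds, 1ns rounds to zero in this sketch; an implementation targeting Go's time.Duration could keep nanoseconds exactly.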

This notably doesn't include months and years because they can differ in duration and are quite easily approximated in days. I'm also fine with the following changes:

  1. Taking out/changing the µ for microseconds. I think it's fine to use 0.1ms in most cases, so it's not strictly needed. I mainly put it in because Go duration parsing and formatting allows/uses it as well.
  2. I'm fine with adding a prefix to make differentiation with numbers easier. For instance D which would result in D2h30m.
  3. Removing the duplication possibility for 2s3s. Again I mainly put this in because the Go duration parsing allows it.

I really hope this is considered for inclusion as it would be really useful to me and my colleagues. (Much more so than the already supported datetime type, which I've never had an actual use for in a config).

PS. I created a modified fork of https://github.com/pelletier/go-toml that supports this: https://github.com/JelteF/go-toml (see the last couple of commits)

forember commented 6 years ago

A prefix is a good idea, especially if #427 gets accepted. Microseconds could be represented with us. And TOML tends to be pretty strict about formatting (to reduce the chance of confusion when reading a doc), so combining should probably require descending order of units with no duplication.
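The stricter combining rule suggested here (each unit at most once, largest unit first, with us for micro) can be checked separately from the conversion itself. A minimal Python sketch; is_strictly_ordered and the exact unit list are assumptions for illustration.

```python
import re

# Units from largest to smallest; "us" stands in for the micro sign.
# This ordering and unit set are assumptions of the sketch.
_ORDER = ["d", "h", "m", "s", "ms", "us", "ns"]
_TOKEN = re.compile(r"(\d+(?:\.\d+)?)(ms|us|ns|d|h|m|s)")


def is_strictly_ordered(text: str) -> bool:
    """True if every unit appears at most once and in descending order."""
    pos, last_rank = 0, -1
    while pos < len(text):
        match = _TOKEN.match(text, pos)
        if match is None:
            return False
        rank = _ORDER.index(match.group(2))
        if rank <= last_rank:  # a repeated unit, or a larger unit after a smaller one
            return False
        last_rank, pos = rank, match.end()
    return last_rank >= 0  # reject the empty string
```

Under this rule 2h30m passes, while 2s3s and 30m2h are rejected.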

JelteF commented 6 years ago

@NighttimeDriver50000, thanks for the input. The us suffix sounds like a good idea indeed, much easier to type than µs. And the reasoning for the other two points make sense as well.

jongiddy commented 6 years ago

The date-time type is derived from RFC3339, which is a subset of ISO8601. It would be great to define any other time types using similar standards. ISO8601 has a duration representation and RFC5545 defines a subset.

This would make your example:

day = P1D
hour = PT1H
minute = PT1M
second = PT1S
milli = PT0.001S
micro = PT0.000001S
nano = PT0.000000001S

# allows floats
micro3 = PT0.0001S

# allows combining
two_and_a_half_hours = PT2H30M
# not supported
five_seconds = PT2S3S

# can be negative
minus_one_seconds = -PT1S

# allowing underscores would be a non-standard extension
hundred_thousand_hours = PT100_000H

The benefit is the use of a recognised standard, and that the P prefix makes parsing simpler and keeps more space for other types that may one day be added.

The downsides are that sub-second units are only supported as decimals (while ISO8601 supports decimals for any time unit, RFC5545 doesn't allow them at all), and the decimals do not support underscores.
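A minimal Python sketch of how little code the subset used in these examples needs when a regular expression does the work. parse_iso_subset is a hypothetical name; it accepts only days, hours, minutes and seconds, allows a fraction on seconds only, treats a leading minus as negating the whole value, and assumes a day is exactly 24 hours.

```python
import re
from datetime import timedelta
from decimal import Decimal

# Only the subset used in the examples above: an optional sign for the whole
# value, days, and hours/minutes/seconds after "T". Only seconds may have a
# fraction. Years, months and weeks are rejected, and a day is assumed to be
# exactly 24 hours (ISO 8601 itself does not promise that).
_ISO_SUBSET = re.compile(
    r"(?P<sign>[+-]?)P"
    r"(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+(?:\.\d+)?)S)?)?"
)


def parse_iso_subset(text: str) -> timedelta:
    match = _ISO_SUBSET.fullmatch(text)
    if match is None:
        raise ValueError(f"not a supported ISO 8601 duration: {text!r}")
    has_time = any(match[g] for g in ("hours", "minutes", "seconds"))
    # Reject "P", "PT" and "P1DT": something must follow "P", and "T" must be
    # followed by at least one time component.
    if not (match["days"] or has_time) or ("T" in text and not has_time):
        raise ValueError(f"incomplete duration: {text!r}")
    whole_seconds = (
        int(match["days"] or 0) * 86_400
        + int(match["hours"] or 0) * 3_600
        + int(match["minutes"] or 0) * 60
    )
    seconds_us = Decimal(match["seconds"] or 0) * 1_000_000
    total_us = whole_seconds * 1_000_000 + int(seconds_us.to_integral_value())
    if match["sign"] == "-":
        total_us = -total_us
    return timedelta(microseconds=total_us)
```

With this subset, PT2S3S and PT100_000H are rejected, matching the notes in the example, and PT0.000000001S rounds to zero because timedelta has microsecond resolution.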

pnathan commented 6 years ago

I would agree that using the previously baked standard would be a very good idea. Java 8 supports parsing the ISO into Durations, which is a Very Nice Thing, since I don't have to write the parser.

I'm sure other languages & libraries would also tend to the ISO direction to some degree.

And TBH, I also agree this would be a good data type to have. Not sure it's minimal, but certainly useful.


JelteF commented 6 years ago

Good to see some activity on this issue again. Usually I agree that using standards is preferable. However, I don't agree that ISO8601 or RFC5545 would be a better fit for this than something similar to the Go duration parsing, for the following reasons:

  1. Having sub second resolution is really nice for defining short timeouts.
  2. Days and weeks in those standards are defined only relative to the date that you subtract them from. This means P1D is not always 24 hours, so you cannot use the built-in language duration types.
  3. The use of capital letters makes it harder to see the units at a glance.
  4. Requiring a T between the days and time adds extra clutter for no good reason (unless you allow M for months, which is not consistent in number of days)
  5. I've never seen anybody use this standard, which suggests that people don't really like it. (yes this is not scientific of course, feel free to dispute this)

Fixing some of these is of course possible, but that would result in a custom standard. So it would lose the benefit of using the standard.

PS. I'm fine with having the P prefix (or any other one to avoid confusion with the standards). So don't take that as a reason to prefer the standard.

Falkon1313 commented 6 years ago

A suggestion: a simple quantity unit may be better than trying to specify all possibilities, and/or a limited subset with arbitrary exclusions.

This notably doesn't include months and years because they can differ in duration and are quite easily specified in days.

But that is exactly why you can't specify them in days. If something is due every 3 months, how do you express that? Or that something has been owned for 5 years?

And time is not the only measurement, they all have units. Whether it be disk space/RAM, distance, volume, temperature, currency, etc.

A more general quantity unit measurement type would cover every possible combination, without having to specify them all or include all the conversion factors. Actually interpreting them (and converting if necessary) isn't really the job of a configuration or data file parser, that belongs to the application interpreting the configuration or data.

I like the idea of the combining - you could express something like 8 lb 5 oz or 2 months 3 days.

But generally, this sort of thing belongs more in the domain of the consuming application, not the data file format. And all measurements could be expressed as strings and handled by the application as appropriate for the application. What is the benefit of making it more complicated?

JelteF commented 6 years ago

@Falkon1313 I agree that a general quantity unit should be part of the consuming application. However, there's a big difference between durations and the other quantities you mention: there's a standard library type for durations in almost every programming language (at least the ones that also have a datetime type, which is already part of the TOML spec). And like I said before, the main advantage would be to directly generate that type.

But that is exactly why you can't specify them in days. If something is due every 3 months, how do you express that? Or that something has been owned for 5 years?

Yes I messed up there. I meant to say they are quite easily approximated in days. (fixed up that comment now)

Hugo-Trentesaux commented 5 years ago

Maybe we should just add a type which is a pair (number, string), the unit of the number being represented in the string. It would be very useful for any physical quantity.

length = 1 mm
size = 720 px
duration = 1 day

But the question is how it would be loaded in the program so that we cannot add 1 apple to 2 watermelons...
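One way to answer the loading question, sketched in Python: keep the pair as a small value type whose addition refuses mismatched units. The Quantity class is hypothetical and does no unit conversion at all.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantity:
    """A (number, unit) pair; hypothetical, with no unit conversion."""
    value: float
    unit: str

    def __add__(self, other: "Quantity") -> "Quantity":
        if not isinstance(other, Quantity):
            return NotImplemented
        # Refuse to combine different units instead of guessing a conversion.
        if other.unit != self.unit:
            raise TypeError(f"cannot add {other.unit!r} to {self.unit!r}")
        return Quantity(self.value + other.value, self.unit)
```

Converting between compatible units (mm and m, say) would still be up to the application or a dedicated units library.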

willstott101 commented 5 years ago

I personally think generic quantities are best parsed and considered by language and/or domain specific libraries. Any number with a unit may well get coerced to a native type differently depending on that language's native type, and the application's specific understanding of the units of a given domain. For instance a language's Decimal type might be most appropriate when handling quantities (especially currencies).

An example of a quantity library which I think handles them well is Python's Pint, which keeps original strings around as long as possible (extremely useful for any kind of user input, including configs): https://pint.readthedocs.io/en/0.9/

But because durations are so often representable in standard libraries, and so ubiquitous when configuring services, I think it makes a lot of sense to have an intuitive, obvious format in TOML. For the record, the standard mentioned above is far from obvious to me.

I'm not sure (number, string) tuples will offer much benefit over strings if the application has to decide how those strings transform the number anyway: parsing numbers is easy, handling units is fiddly.

workingjubilee commented 4 years ago

The US standard for characters to use in lieu of μ is mc, as in 150mcg = "150 micrograms", which, while not intuitive, is in fact standard. The precise reason for that US standard is that people might confuse the mu symbol with a u and be misled as to its meaning. Accepting both the SI and the US forms transparently would be fine, but us would be undesirable.

Of course, ISO8601 favors the .000001 second style of notation as of the previous issuance... it was recently revised in 2019 and I am not sure how, exactly, yet.

yeongjet commented 4 years ago

Need this feature!

Felk commented 4 years ago

Days and weeks in those [ISO8601] standards are defined only relative to the date that you subtract them from. This means P1D is not always 24 hours, so you cannot use the built-in language duration types.

Strictly separating time and calendar periods tends to be a good thing. Using a calendar unit ("day" and everything above) in the context of time is almost always a bug. Though I don't have any numbers on how common actual use cases of calendar units for time are, work experience shows that whenever people used instant.plus(1, ChronoUnit.DAYS) instead of instant.atZone(...).plusDays(1) in Java, it always caused a timezone bug.

Given that ISO8601 makes a clear distinction between periods and time in the form of P<period>T<time>, I am strongly in favor of using it. And since the RFC 3339 format that TOML already uses for datetimes is ISO8601-compatible, it seems intuitive to stick with it.

eksortso commented 4 years ago

The more I look at ISO 8601's duration standards, the more I like them. The P prefix identifies durations immediately, and T instantly separates the times and dates. And the smallest unit can have a fractional value. They don't have the brevity that @JelteF's original proposal offered. But maybe for the sake of easier configuration writing, we can accommodate a few modest extensions?

Here are some proposals. I tried to cover all the bases touched upon so far, and I hope I didn't stretch things out too far. What do you think?

ChristianSi commented 4 years ago

@eksortso In my view, if we follow the ISO 8601 standard, we should stay close to it. One or two small changes may be fine, but if we're to deviate as far as you suggest, we might as well start from scratch and roll our own solution – maybe as @JelteF suggested or something close to it. Or we look for another fine standard/convention that fits our needs better without requiring as many changes as you suggest.

eksortso commented 4 years ago

Fair enough. No need for additional units. Not now, anyway.

But, and I was trying to address some of @JelteF's concerns: I still recommend allowing the underscores, being case-insensitive, and prepending P if T starts a time duration. TOML would gain readability and brevity, and these somewhat intermediate forms can be converted to ISO 8601-compliant durations with trivial string munging.

JelteF commented 4 years ago

Strictly separating time and calendar periods tends to be a good thing. Using a calendar unit ("day" and everything above) in the context of time is almost always a bug.

@Felk Thinking about this again, you are absolutely right. However, what I'm trying to say though is that this separation is not very useful in practice, if it means you cannot convert the string into the built in duration type of the language. This is the case for Go and Python at least (and I expect more languages).

JelteF commented 4 years ago

@eksortso I agree that if we add a pass that removes all underscores and capitalizes all letters, we can still use parsing libraries for the standard.
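Such a pass is small. Here is a Python sketch of the normalization step (normalize_to_iso is a hypothetical name) that strips underscores, upper-cases letters and, per the earlier suggestion, prepends P when a duration starts with T; its output could then be handed to an existing ISO 8601 duration parser.

```python
def normalize_to_iso(text: str) -> str:
    """Rewrite the relaxed form into plain ISO 8601 duration syntax."""
    cleaned = text.replace("_", "").upper()
    sign = ""
    if cleaned[:1] in ("+", "-"):
        sign, cleaned = cleaned[0], cleaned[1:]
    # A time-only duration may omit the leading "P": "T5M" -> "PT5M".
    if cleaned.startswith("T"):
        cleaned = "P" + cleaned
    return sign + cleaned
```

The pass does not validate the ISO 8601 grammar itself; that is left to whichever parser receives the result.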

eksortso commented 4 years ago

Thanks, @JelteF.

The notion to allow T instead of PT for intervals with no date components wasn't done to create a distinction between date-based and time-based durations. It's just that I would confuse myself sometimes, that P5M is five months and PT5M is five minutes. That "T" makes a difference, but I guess I still need to convince folks that letting it stand without the "P" would be valuable.

eksortso commented 4 years ago

So I know we're trying to get v1.0 out the door, but since there's little along those lines that I can help with, I'd like to move this along, in anticipation of v1.1.

I've sat on a PR for this for a while. Would it cause any problems (e.g. distract from v1.0 release efforts) if it were submitted for future consideration?

salmangano commented 4 years ago

I needed a way to represent durations sooner than later. I also did not want to fork the implementations I use as I rely on both cpptoml and python toml. So for the time being I am using an inline table like so:

delta={count=-15, unit="secs"}

I have a simple C++ utility to convert this into the std::chrono::duration types.

But I'd love to see first class support for this.
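Until something like this lands, the inline-table workaround above can be consumed with very little code. Below is a minimal Python sketch (the C++ utility itself is not shown in the thread); table_to_timedelta is a hypothetical name, and apart from "secs" the unit names in the mapping are assumptions.

```python
from datetime import timedelta

# Only "secs" appears in the example above; the other unit names are assumed.
_TIMEDELTA_ARG = {
    "ms": "milliseconds",
    "secs": "seconds",
    "mins": "minutes",
    "hours": "hours",
    "days": "days",
}


def table_to_timedelta(table: dict) -> timedelta:
    """Convert an inline table such as {count=-15, unit="secs"}."""
    try:
        arg = _TIMEDELTA_ARG[table["unit"]]
        count = table["count"]
    except KeyError as exc:
        raise ValueError(f"unsupported duration table: {table!r}") from exc
    return timedelta(**{arg: count})
```

On Python 3.11 or later, the table itself could come from the standard tomllib module, e.g. tomllib.loads('delta={count=-15, unit="secs"}')["delta"].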

abelbraaksma commented 4 years ago

Copied from #717:

But I'd only allow the last expressed unit to have a fractional part, per ISO 8601.

While 8601 allows it on the last part, whichever unit that is, I propose to only allow it on seconds. This is also the approach that the W3.org standards body adopted.

It's non-trivial what it means to have fractional minutes, hours or even days, months or years. You're better off keeping it simple, and users will quickly come to understand that only the seconds part can be decimal (or double).

Btw, 8601 allows weeks, and an abbreviated format (without the letters). I wouldn't use either of those either (but I think there was already consensus on that in the main thread).

even though ISO 8601 doesn't appear to allow them

They don't disallow them, which in standards parlance usually means that they allow them. My suggestion would be, again, to keep it simple: either the whole duration is negative, or the whole duration is positive. Subtracting parts is complicated, and without timezone information not even reliably possible. What's more, you'll get a different number of days depending on the time of year (daylight saving time) if you allow subtractions of parts.

Ending up with a duration that's either years and months, or days and time means you have ordered types. These types are exact. Once you mix these, they mean something else depending on time of year.

That's OK, and ultimately up to implementers, but doing all that for positive or negative durations is already quite some work. If independent parts can be positive and negative it's that much harder. And likewise, that much harder to explain to end users and in spec prose.

abelbraaksma commented 4 years ago

Note that my point of limiting scope of individual members is not about date, time, duration calculations in toml, but that it can be reasonably expected to be the main use case where these types will be applied.

(though I can sympathize with an opposing argument that we should be inclusive and allow each duration segment to be negative, many existing implementations of such types don't support such flexibility, but also, those that do either chose to support that the whole duration can be negative, or support that individual segments can be negative, but not both)

eksortso commented 4 years ago

Copied from #717:

But I'd only allow the last expressed unit to have a fractional part, per ISO 8601.

While 8601 allows it on the last part, whichever unit that is, I propose to only allow it on seconds. This is also the approach that the W3.org standards body adopted.

The significance of W3.org standards only carries so far. Web technologies operate in second-based time intervals anyway. But not everything does. So I have no problems with using ISO 8601's approach to fractional units, which seems reasonable enough to me.

It's non-trivial what it means to have fractional minutes, hours or even days, months or years. You're better off keeping it simple, and users will quickly come to understand that only the seconds part can be decimal (or double).

I can understand the value of simplicity, but I also want to create a standard that's eminently usable. I wouldn't exclude half-hours for general use when over 8750 hours each year would interpret 0.5 hours the exact same way.

Allowing such niceties creates challenges to devise simple, precise definitions. This is partly done; I already have ABNF code that takes fractions into account. And once I submit a PR (with language that's not on the computer I'm currently typing on), you can assess that for yourself.

My suggestion would be, again, to keep it simple: either the whole duration is negative, or the whole duration is positive.

I agree with you here. Plus or minus the whole duration. That way, we can safely look past the fine points of duration arithmetic.

abelbraaksma commented 4 years ago

The significance of W3.org standards only carries so far. Web technologies operate in second-based time intervals anyway.

Perhaps true for http (but that doesn't support durations, iirc), my work was in the xml, xsd, XPath, xslt area, and those transcend the area of just "web technologies".

(and also, the W3 mention was merely as an illustrating example how "some other standards body" did it, I'm fully aware that their approach has been, and often still is, with its own flaws)

But i understand your points. Besides, most discussion in the W3 groups was wrt date, time, tz, era, calendar and duration arithmetic, which can fill a bookshelf by itself ;). It's dauntingly complex...

I now realize that data manipulation is not something toml concerns itself with. I understand you need to be able to support applications that would want to express fractional time units, while other applications might want to prohibit that. Which is kind of in the same league of an application expecting a numeric value that is in the range 1-10, while toml will allow any 64 bit integer.

In other words, @eksortso, I see now why you'd generally prefer a broader definition over a more limiting one, allowing a wider range of potential scenarios.

abelbraaksma commented 4 years ago

Btw, would we want to differentiate between time span and duration? The first is defined by a start- and end-datetime, the second by a period without reference to, or bearing on, a given datetime. They are semantically equivalent, but serve different scenarios, and are expressed and interpreted differently. (apologies if this has already been decided).

eksortso commented 4 years ago

@abelbraaksma Since time spans really haven't been discussed yet, we can keep them separate. I've not delved into them, though I'm aware that ISO 8601 does have a standard. Not sure how much it's used. But I could say the same about their durations, and note how they differ from existing data type declarations.

abelbraaksma commented 4 years ago

Not sure how much it's used.

Not much, as far as I can tell, though I can see its potential. I think the complexity (a duration bound to a date requires date-time calculations) is what explains the lack of it. It seems that most systems that need to support date, time and duration/timespan choose one over the other with varying degrees of support for calculations. Some systems, notably .NET, use a time interval, with no option to express years or months (more restrictive than an ISO 8601 duration, but parsers targeting .NET can use NodaTime, which recently improved duration support).

It's probably best to stick to one option first anyway. I'm curious how easy or hard it will prove to be for implementers, esp since each framework uses a different definition.

septatrix commented 2 years ago

I feel like using such a loose syntax opens up many potential problems which TOML has managed to avoid so far. Many languages have no built-in duration type, and in such cases the behaviour may become very unexpected. Would 1h be represented as a number for the amount of hours, minutes, seconds or even millis? Or maybe even as a string instead, to prevent this confusion? And what about readers expecting 10h to be hexadecimal? Or what about 1h70m? I strongly dislike YAML because it has similar problems, where 15:30 is not a string but suddenly an integer, while 15:70 is a string. Or use more than 4 colons and it is also a string. TOML has managed to avoid this as well as possible for dates by enforcing a rather strict syntax, so basing this syntax on ISO 8601 as well would be advantageous. It would also reduce unintended clashes with data which only represents a duration by accident.

Apart from that, I am not sure if TOML really needs duration support. In most cases the expected values lie within one or two orders of magnitude, so you could simply call the attribute TIMEOUT_DURATION_SECONDS or similar. The author already acknowledges this, albeit raising some concerns. However, should some projects really need duration support, they can easily implement it on top of string values. This would even allow them to use libraries which understand 1 week or similar, more "human" formats.

pradyunsg commented 2 years ago

Hi folks! It's taken me a while to come around to this, and I certainly appreciate the discussion and patience here.

I'm not particularly convinced that any of 1h 30m, 1 hour 30 minutes, PT2H30M (pick your whitespace) are particularly different experiences to type inside or outside of a string. This is different from dates and datetimes, where you can (and often do) have long multi-part sequences. Plus, date/datetimes/times are a lot more common than duration.

The operative questions I have here are:

abelbraaksma commented 2 years ago

How is this an improvement over letting applications parse this information out of a string, table or integers?

@pradyunsg If you don't standardize it, each TOML file will likely have a different way of defining a duration. And each implementer can either choose to support its own extension to TOML, or leave it out. In the latter case, this means that a programmer is required to create an implementation for the parsing.

What's the benefit of moving that complexity to all the parsers of TOML, instead of the individual applications that need this functionality?

If you don't do it, every person that needs a duration needs to implement a parser by themselves. It's not hard, but certainly not trivial either (don't forget that only few programmers in the world even understand how to write parsers). If it's out of the box available, each implementation only needs to do it once, as opposed to each user.

Of course, implementing it does not come free. Many frameworks will have libraries that already support standard durations, but also many that don't. Writing a parser for the PT2H30M format is trivial, it can even be done with a regex, which I'll gladly share (the 1 hour, 20 minute format is much harder to do, too many exceptional cases). An internal implementation of a Duration type is typically just a struct for ms and months.
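A sketch of that "struct for months and milliseconds" idea in Python, parsed with a single regular expression. The Duration and parse_duration names are hypothetical; weeks and the basic (letter-free) format are left out, only seconds may be fractional, and days are folded into the exact part as 86,400 seconds, roughly as XML Schema's xs:duration splits its value into months and seconds.

```python
import re
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Duration:
    """Calendar part in months plus exact part in milliseconds."""
    months: int
    milliseconds: int


# Y/M/D before "T", H/M/S after it; only seconds may carry a fraction.
# The lookahead "(?=\d)" makes a bare trailing "T" invalid.
_ISO = re.compile(
    r"(?P<sign>-?)P(?:(?P<y>\d+)Y)?(?:(?P<mo>\d+)M)?(?:(?P<d>\d+)D)?"
    r"(?:T(?=\d)(?:(?P<h>\d+)H)?(?:(?P<mi>\d+)M)?(?:(?P<s>\d+(?:\.\d+)?)S)?)?"
)


def parse_duration(text: str) -> Duration:
    match = _ISO.fullmatch(text)
    if match is None or not any(match[g] for g in ("y", "mo", "d", "h", "mi", "s")):
        raise ValueError(f"invalid duration: {text!r}")
    months = int(match["y"] or 0) * 12 + int(match["mo"] or 0)
    # Days are folded into the exact part as 86,400 seconds each (assumption).
    seconds = (
        int(match["d"] or 0) * 86_400
        + int(match["h"] or 0) * 3_600
        + int(match["mi"] or 0) * 60
        + Decimal(match["s"] or 0)
    )
    millis = int((seconds * 1000).to_integral_value())
    if match["sign"]:
        months, millis = -months, -millis
    return Duration(months, millis)
```

Keeping months separate is what lets P5M (five months) and PT5M (five minutes) parse to different values without guessing how long a month is.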

In the end it's a cost/benefit analysis. I believe in uniformity and standardization as ultimately that makes the world simpler. But whether or not you'd implement a given feature (in this case, durations), is whether or not you consider it beneficial enough.

Here's an example of the "pain" a user has to go through when wanting durations in the Go TOML parser. That's not trivial, even though a parsing function already exists in Go: https://github.com/BurntSushi/toml#using-the-marshaler-and-encodingtextunmarshaler-interfaces.

arp242 commented 2 years ago

The thing is that there are many types that could be useful: regexps, IP address, GPS coördinates, octal file permissions, distance, weights, angles, complex numbers, etc. At some point you have to say "this is too specific to implement in the TOML specification". At which point does something get "too specific"? I don't know, but I think that's the most important question here, not whether or not it's work to implement by people using TOML for their applications.

I'd judge it as a little too specific to implement directly in the specification, but I don't really have any numbers on that or anything. As Pradyun mentioned, if it's added to the specification then every TOML implementation will have to write their own duration parser, which is work too (you probably won't be able to use some stdlib thing like Go's time.ParseDuration for most implementations, as the format will probably differ, and requires some work in the lexer too if it's outside of strings).

I'm not super-opposed to implementing duration support. I mostly lean towards "no" but it's a small lean and don't care strongly; I mostly wanted to comment on what I think are the more important questions that need to be answered.

I think a good thing to do to move this discussion forward would be to check how many TOML applications/users actually use some form of duration type. Right now I have no idea if that's 1%, 50%, or 90%. If it's 90% then I think it would be a no-brainer to add it as there's clearly a common demand. If it's 1%: probably not. I'm not sure how to best go about this: maybe get some statistics on most used libraries using the common TOML libs for JS/Python/Ruby/Go, and then check the top 500 for "duration" or whatever makes sense in that language? It's all some work, but having at least a rough indication would really clarify how useful this is in general.

marzer commented 2 years ago

If you don't do it, every person that needs a duration needs to implement a parser by themselves

As much as I don't want to have to add this to my implementation, I concur with this. Standardisation is good in the face of a vacuum, when something is suitably common, and durations are surprisingly common (tick delays, sleep durations...)

pradyunsg commented 2 years ago

I've spent the last few minutes trying to find an example of someone setting durations in a TOML file and failed. Can folks provide real world examples and use cases for configurations where this would be useful?

abelbraaksma commented 2 years ago

@pradyunsg, seeing use-cases is a good thing, but if something isn't available yet in a language, it'll be hard to find it, as it doesn't exist yet. At best we can find people working around it in some way, but we have no idea how they call it. And then we don't know what to offset our sample against: if we find 100 examples, how do we know if that's 10% or 1%? The internet doesn't easily quantify itself as discrete sets...

But I tried anyway and found some bits ;).

Looking at this differently, why not offset Duration/Timespan against datetime in a known language? So I did that and got this (in C#, DateTime and TimeSpan are classes, which makes it likely I hit code and not just text, as they are not common words):

I think the last item in this list is the most useful. Cursory reading of some of those files shows that duration is configured in a variety of ways, which to me certainly raises the question: should we normalize this to a standardized way?

It's not surprising that there's much less "duration" than there's "date" and "time". I'm actually quite surprised how many hits I found. I briefly checked JSON code as well (found more durations than datetime!), but since that's also used as serialization of arbitrary data it's not a good comparison, I think.

abelbraaksma commented 2 years ago

It's also mildly interesting that the amount of occurrences of "duration" vs occurrences of "date" in TOML files is roughly 1:5 and that that's very close to the relation of similar keywords on StackOverflow.

A very premature conclusion could be that, even with this feature not there, 20% of all TOML users already use durations and could definitely benefit from a duration type (assuming roughly 100% use a date or time type). (I realize I'm not being very scientific here!)

arp242 commented 2 years ago

I took a closer look at ISO 8601, and I don't like it at all; I think it's unobvious, hard to read and write, and error-prone (P1M is 1 month, PT1M is 1 minute), and no native support for milliseconds, nanoseconds, etc. is a pain.

VCL has durations similar to the original proposal, and I find it works much better:

Description        VCL    ISO 8601
10 hours           10h    PT10H
hour and a half    1.5h   PT1H30M
10 milliseconds    10ms   PT0.01S
1 minute           1m     PT1M

Things like 1h30m don't need to be supported, IMHO; just 1.5h is good enough.

I don't think we gain much by using ISO 8601; it's not a commonly encountered or implemented standard unlike RFC 3339 dates TOML uses. Also the standard isn't publicly available; you need to pay ~€150 for it. That part is solved by using RFC 5545 since it defines a subset (excludes M for months and Y for year), but it still retains most of the above issues.


If this is added, then personally I'd be in favour of using s[econds] as a suffix, where the bracketed "econds" is optional and may be abbreviated, and allowing a space between the number and the unit (e.g. 1s, 1sec, 1 second and 1 seconds are all identical).

List of suffixes:

m[illi]s[econds]   milliseconds
s[econds]          seconds
m[inutes]          minutes
h[ours]            hours
d[ays]             days
w[eeks]            weeks
y[ears]            years

"month" is intentionally excluded as it's so variable (28, 30, or 31 days?) "year" can be defined as always having 365 days, ignoring leap years, which is "good enough" for this purpose. Could maybe also add microseconds and nanoseconds; but not so convinced it's needed; I find that using time units smaller than 1ms is rare for these kind of things. It can always be added later if there is actually a demand.

This way it's easy to read, easy to write, pretty obvious, and easy to implement.
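To get a feel for how easy the suffix list above is to implement, here is a minimal Python sketch that parses a single number-plus-unit value into a `datetime.timedelta`. Everything in it is an assumption for illustration: the function name `parse_duration`, reading `x[yz]` as "x followed by any prefix of yz", allowing decimals (so `1.5h` works), at most one space before the unit, and a fixed 365-day year.

```python
import re
from datetime import timedelta


def _any_prefix(word: str) -> str:
    """Regex matching any prefix of `word`, including the empty string."""
    pattern = ""
    for ch in reversed(word):
        pattern = f"(?:{re.escape(ch)}{pattern})?"
    return pattern


# Hypothetical unit table following the suffix list above.
# "x[yz]" is read as "x followed by any prefix of yz"; a year is fixed at 365 days.
_UNITS = [
    ("m" + _any_prefix("illi") + "s" + _any_prefix("econds"), timedelta(milliseconds=1)),
    ("s" + _any_prefix("econds"), timedelta(seconds=1)),
    ("m" + _any_prefix("inutes"), timedelta(minutes=1)),
    ("h" + _any_prefix("ours"), timedelta(hours=1)),
    ("d" + _any_prefix("ays"), timedelta(days=1)),
    ("w" + _any_prefix("eeks"), timedelta(weeks=1)),
    ("y" + _any_prefix("ears"), timedelta(days=365)),
]

# Optional minus sign, integer or decimal number, optional single space, unit letters.
_VALUE = re.compile(r"(-?[0-9]+(?:\.[0-9]+)?) ?([a-z]+)")


def parse_duration(text: str) -> timedelta:
    """Parse a single-unit duration such as '10ms', '1.5h' or '1 second'."""
    m = _VALUE.fullmatch(text)
    if m is None:
        raise ValueError(f"not a duration: {text!r}")
    number, unit = m.groups()
    for pattern, size in _UNITS:
        if re.fullmatch(pattern, unit):
            return size * float(number)
    raise ValueError(f"unknown unit: {unit!r}")


print(parse_duration("1.5h"))  # 1:30:00
```

Because the millisecond pattern requires the `s`, a bare `m` still means minutes, so the `ms`/`m` overlap in the list resolves without special-casing.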

marzer commented 2 years ago

@arp242

I took a closer look at ISO 8601, and I don't like it at all; I think it's unobvious, hard to read and write, and error-prone (P1M is 1 month, PT1M is 1 minute), and the lack of native support for milliseconds, nanoseconds, etc. is a pain.

I came to the same conclusion recently. I have a half-written rant about it that I was going to contribute to this discussion, but ultimately decided to shelve it until I chilled out a bit, then forgot 😅. The gist of it was basically "ugh, I hate this". Far too cumbersome and unintuitive. I'm in favour of any version of this proposal that allows me to write something like 4h50m30s, et cetera. Your version, or the one in the OP, whatever, so long as it's not the absolutely woeful ISO 8601.

eksortso commented 2 years ago

@arp242 There's a good reason why I was pushing for friendlier units and whitespace/underscores between numbers and units. The duration syntax given by ISO 8601 is not human-friendly. But it contained some compelling ideas, and the promise of duration parsers that already existed made this approach tempting.

I like your list of suffixes. You could also look at #717 and see what that PR suggests. In fact, the whole conversation that took place there so far is worth reading.

But the approach that @salmangano took has made me realize that we need to start small. Let's focus on units smaller than a day. Let's consider only allowing durations to have a single one of these units. Let's find out if negative durations are desirable. Then we can go from there. Chaining, as noted by @marzer, would be useful, and I'd like to see it too, but let's coalesce around single time duration units first.

abelbraaksma commented 2 years ago

I personally have no problems at all with the ISO syntax, but I might be biased: I've seen it often enough (and have implemented it myself a few times) that it comes naturally to me. In the end, whatever people consider most human-friendly while still being easily machine-definable is totally fine with me. It's good that you (@marzer, @arp242) took a deeper dive into this and came to strongly favor the xxm/xxh syntax. Though I do have a few comments and concerns:

Things like 1h30m don't need to be supported, IMHO; just 1.5h is good enough.

I think you should definitely support it, as time cannot be precisely defined in decimals. You cannot express 1 hour and 10 min as 1.1666666666666h, as it's never a precise mapping, while 1h10m is.
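The claim that 1h10m has no exact decimal-hours spelling can be checked with exact rational arithmetic. This Python sketch (the helper name `has_terminating_decimal` is hypothetical) relies on the fact that a fraction in lowest terms has a finite decimal expansion only if its denominator has no prime factors other than 2 and 5; 70/60 hours reduces to 7/6, whose denominator contains 3.

```python
from fractions import Fraction


def has_terminating_decimal(q: Fraction) -> bool:
    """True if q can be written exactly with finitely many decimal digits."""
    d = q.denominator  # Fraction always keeps itself in lowest terms
    for p in (2, 5):
        while d % p == 0:
            d //= p
    return d == 1


exact = Fraction(70, 60)               # 1 hour 10 minutes, in hours (= 7/6)
literal = Fraction("1.1666666666666")  # the decimal spelling quoted above
error_seconds = (exact - literal) * 3600

print(exact, has_terminating_decimal(exact))  # 7/6 False
```

The gap for the literal quoted above is only 0.24 ns, which rounding to the nearest microsecond or nanosecond would hide (truncation would not); a shorter spelling such as 1.1667h is already off by 0.12 s.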

You might even consider, just as with times, allowing fractions only in the seconds part, as it may prove hard to define a proper translation from an inexact decimal (which may be represented internally as a float, since some languages don't support fixed decimals) to an exact duration.

Assume for a moment we have to deal with floats. Now, language X may support 80-bit floats on one CPU but 64-bit floats on another. This could mean that, after reading the same TOML file, two identically written durations do not compare equal.

Anyway, that's a bit of a tangent, but the bottom line is that converting from decimal or floating point to sexagesimal (which is what time is) can be harder than it seems at first, and once calculations come into play it may lead to unexpected, incomparable results.

My suggestion would be then to use \d+h\d+m[0-9.]+s (not a precise definition, but you get the idea).

abelbraaksma commented 2 years ago

While we're at it, how are we going to interpret mixed hour/day/month/year durations? In most specifications and implementations I've seen, it's either time+days or months+years, but not both (and if both are allowed, they are not allowed to be normalized).

Consider 1m2d (one month and two days) and 33d. These cannot be combined or considered the same. There's more to this, but I'll leave it at this simple example in case this can of worms has been opened before and been sorted out ;).
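The difference between 1m2d (reading m as months, as in this comment) and 33d can be made concrete with Python's `datetime.date`: the same "one month and two days" spans a different number of days depending on the start date. The dates below are arbitrary examples.

```python
from datetime import date

# "One month and two days" counted from two different start dates in 2023.
from_jan = date(2023, 2, 3) - date(2023, 1, 1)  # Jan 1 -> Feb 1 -> Feb 3
from_feb = date(2023, 3, 3) - date(2023, 2, 1)  # Feb 1 -> Mar 1 -> Mar 3

print(from_jan.days, from_feb.days)  # 33 30
```

So 1m2d equals 33d only when the month happens to have 31 days.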

Edit, just missed this suggestion:

Let's focus on units smaller than a day. Let's consider only allowing durations to have a single one of these units.

Probably a good idea indeed to start small. Though I don't think we should disallow durations longer than 24h; just disallow durations with d, m, and y in them (though frankly, d isn't the troublemaker, m and y are).

We should also be explicit in allowing 1h5m to be the same as 65m, for instance.

arp242 commented 2 years ago

You cannot express 1 hour and 10 min as 1.1666666666666h, as it's never a precise mapping, while 1h10m is.

You can use 70minutes; I think that's "good enough", and it's a good trade-off with keeping both the implementation and syntax simpler. Some small possible inaccuracy with floats is also fine, I think; we're not concerned with precision time-keeping, and if guaranteed precision is really needed you can use milliseconds or nanoseconds, just as you'd use 70minutes instead of 1.166...hours.

Consider 1m2d, and 33d. These cannot be combined or be considered the same. There's more to this, but I'll leave it at this simple example in case this can of worms has been opened before and been sorted out ;).

I think we shouldn't include "month" at all, or if we do, simply define it as "30 days".

There is no good way to deal with this in a context-less "duration" unless you force implementations to parse it as an object which keeps track of this (e.g. instead of merely storing it as an int64 you need some class/struct with an hour, day, month, etc. field), but many stdlib "duration" types don't do that (at least Python's and Go's don't).
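For Python this is easy to confirm: `datetime.timedelta` stores only days, seconds, and microseconds, so equivalent spellings collapse into one value and there is no field that could remember a month. A short illustration:

```python
from datetime import timedelta

a = timedelta(hours=1, minutes=5)
b = timedelta(minutes=65)
print(a == b, a.days, a.seconds)  # True 0 3900
print(timedelta(hours=25))        # 1 day, 1:00:00

try:
    timedelta(months=1)           # timedelta has no months keyword
except TypeError as exc:
    print("TypeError:", exc)
```

This normalization is also what makes 1h5m and 65m compare equal, as suggested earlier in the thread; the flip side is that a parser targeting `timedelta` cannot round-trip the original spelling.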

abelbraaksma commented 2 years ago

There is no good way to deal with this in a context-less "duration"

Well, there is (keep month and year separate, basically), and there isn't (some might not call this a "good way"). But I agree, as I mentioned in my other comment, which I may have just edited while you were typing.

and it's a good trade-off with keeping both the implementation and syntax simpler.

You may have misunderstood why I made the suggestion. It is precisely to keep the implementation simpler, as there's no way of knowing what happens if we try to force a decimal time-duration system on people. It's just not what time is. I think it'll be much harder to formalize decimal minutes (which you'll need to do if you were to allow it) than it is to allow only integer hours, minutes and decimal seconds.

I don't think it'll be hard to create, parse and interpret [Xh][Xm][Fs], where X denotes an integer and F a float/decimal. Each section is optional, of course, and the order h-m-s is required. For comparisons, overflow carries from right to left (so 65m equals 1h5m).
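The [Xh][Xm][Fs] form is indeed small to implement. Below is a Python sketch under stated assumptions: the names `parse_hms` and `_HMS` are hypothetical, at least one section must be present, seconds are parsed as a float (so the float caveats discussed above still apply to the seconds part), and the result is normalized by `datetime.timedelta`, so overflow carries automatically when comparing.

```python
import re
from datetime import timedelta

# Hypothetical grammar: integer hours, integer minutes, decimal seconds,
# each section optional, but always in h-m-s order.
_HMS = re.compile(r"(?:(\d+)h)?(?:(\d+)m)?(?:(\d+(?:\.\d+)?)s)?")


def parse_hms(text: str) -> timedelta:
    """Parse durations like '1h10m', '65m', '90s' or '2m0.5s'."""
    m = _HMS.fullmatch(text)
    if m is None or not text:  # the empty string matches all-optional groups
        raise ValueError(f"not an [Xh][Xm][Fs] duration: {text!r}")
    hours, minutes, seconds = m.groups()
    return timedelta(
        hours=int(hours or 0),
        minutes=int(minutes or 0),
        seconds=float(seconds or 0),
    )


print(parse_hms("1h5m") == parse_hms("65m"))  # True
```

Fractional hours or minutes (`1.5h`) and out-of-order sections (`5m1h`) simply fail to match, which keeps decimal and sexagesimal notation from mixing.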

Falkon1313 commented 2 years ago

I think we shouldn't include "month" at all, or if we do, simply define it as "30 days".

Just want to point out that excluding months (or days or years) could be a WTF for users if you have other units. Arbitrarily assigning a non-standard value for them (like 30 days per month) would be even worse, since it would falsely appear to be able to do the right thing, but actually only sometimes would and other times you'd have apparently random bugs.

Lots of things in both business and tech operate in terms of months. Whether it be things like quarterly reports (3 months) or monthly billing (1st of every month) or checking when something is due or if something is more than a month overdue, etc. You might have monthly log rotations, quarterly batch processes, semi-annual things (6 months), etc. I don't know how often people would reach for a duration (aside from the next due/overdue case, which is actually very common), but if it's there they'd expect to be able to use it.

If this type is specifically going to exclude things like that or handle them in non-standard ways, then it needs to at least be very clearly documented that people who need standard durations should not use it but instead use a string and their standard libraries to handle it. And that it's not meant for things like scheduling, etc. That it's really only meant to measure durations in contiguous real seconds regardless of timezones and DST? In which case you only really need the seconds unit, right? Well, maybe microseconds too.

Because I'd also second abelbraaksma in saying that decimal durations would be a bad idea.

Which brings me to a suggestion.

If it's not considering DST or month durations etc., then anything above 1 hour is ambiguous. If not accounting for leap seconds, then even 1 minute is ambiguous. So if the intent is to specify a duration in raw seconds, or less, then those are the only units that should be available. Whether it is seconds, milliseconds, nanoseconds, whatever unit precision makes the most sense; as an integer. And documentation should make clear that it's raw time, not clock time or calendar time, so people don't use 86400s to intend a day, etc. I'd suggest calling it something like 'raw duration' instead of just 'duration' to make it clear.

I think that would simplify and clarify it. Maybe something like /RD\d+(s|ms|µs|ns)/
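A Python sketch of the "raw duration" idea: accept only an RD prefix, an integer, and one of s, ms, µs, or ns, and return integer nanoseconds so no precision is lost. The function name and the choice of nanoseconds as the storage unit are assumptions; only the micro sign U+00B5 is accepted here, not the Greek mu U+03BC from the original proposal.

```python
import re

# Hypothetical "raw duration" literal: RD prefix, integer, then one unit.
_RAW = re.compile(r"RD(\d+)(s|ms|µs|ns)")

# Size of each unit in nanoseconds.
_NS_PER_UNIT = {"s": 1_000_000_000, "ms": 1_000_000, "µs": 1_000, "ns": 1}


def parse_raw_duration(text: str) -> int:
    """Return the duration in integer nanoseconds."""
    m = _RAW.fullmatch(text)
    if m is None:
        raise ValueError(f"not a raw duration: {text!r}")
    value, unit = m.groups()
    return int(value) * _NS_PER_UNIT[unit]


print(parse_raw_duration("RD10ms"))  # 10000000
```

Grouping the units in one alternation after the shared `RD\d+` prefix ensures the prefix is required for every unit, not just the first.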

arp242 commented 2 years ago

Re: @abelbraaksma; it's indeed not hard to parse 1h10m, it just seems to me that the alternative is simpler.

At any rate, I just looked at what seems to work well for Varnish; that was the only config file format I could think of with native duration types. (It might be worth looking at what other formats are out there; I can't recall any off the top of my head.)

I'm not opposed to the 1h10m format. If we do go with that then fractions should be forbidden IMHO, with the possible exception of seconds, as I find mixing decimal fractions with base-24/base-60 confusing (e.g. 1.5m and 1m30s would be the same).


Re: @Falkon1313: I'm not sure how common those scenarios really are for TOML; what a duration is useful for is mostly things like timeouts, cache durations, how often to run some background jobs, and so on.

Things like "send report every quarter" or "send invoice 1st of every month" can't easily be expressed as a time duration. The first issue is that many standard libraries represent a duration as an integer or some variant thereof, so the only way this can work is if TOML implementations provide a custom "duration" type which records what the TOML file actually contains, and which won't integrate all that well with most stdlibs. Personally, I'd really like to avoid that: TOML should be easily parsed to the native types of most common languages.

The second issue is what does "3 months" really mean? 3 months from when the application starts? 3 months from now? 3 months from Jan 1st? For something like cache-duration = 1week or connect-timeout = 10s this doesn't matter, but for "send reports every 3 months" or "send invoice every month" it does. Personally I'd never encode this kind of thing as a duration, but rather as send-invoice-day = [1..31] and send-report = "[daily/weekly/monthly/quarterly]".

A small ambiguity also exists due to leap seconds and leap days, but for many (not all) use cases these can essentially be ignored.

septatrix commented 2 years ago

Usually the relevant time scale is well known and does not vary by more than an order of magnitude, which is why I do not think this feature is too critical. In most cases a well-chosen field name, such as TimeoutSec = 5 as systemd does, is sufficient. There are situations where the duration might span a wider range, e.g. BackupInterval could be anything from 12 hours to 4 weeks, but at least in that case it is probably better to use a crontab-like syntax anyhow. Sure, it would be nifty to have this at hand sometimes, but most situations can be solved with a well-chosen field name.


Generally this seems like a very niche feature that would make implementations more complex. A good chunk of languages have no native duration support, which makes supporting this even more annoying. Short letter abbreviations can result in unexpected type conversions, which is already very annoying in YAML, and I hope TOML will avoid this. Interpretation issues, such as whether a year is 365 or 365.25 days long, and restricting the allowed units to sidestep them, only make this feature even more niche.

abelbraaksma commented 2 years ago

I'm not opposed to the 1h10m format. If we do go with that then fractions should be forbidden IMHO,

@arp242 I agree, that's why I suggested to use integers for h and m and floats for s.

With respect to the side-discussion on allowing months and years, if (big if?) we go that route, just do what NodaTime and other libraries do and don't mix year-month durations with day-time durations. Durations are irrespective of a timezone or a starting date/time. Hence a minute is 60 seconds, an hour is 60 minutes. But a month has undefined length (it must be irrespective of starting date/time), so a year is 12 months, but what a month is, we don't define.

If you have any date or date-time value, you can add a year-month duration to it and a day-time duration. You can also add a year-month-day-time duration, but only by adding year and month first and then adding day and time.

That way it is an unambiguous definition.
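A minimal Python sketch of the rule described above: months (and years, as 12 months) are kept apart from the fixed-length day-time part, and when applied to a date-time the months are added first, then the rest. The class name `Duration` and clamping to the last day of a shorter month are assumptions for illustration; NodaTime's actual `Period` API differs.

```python
import calendar
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Duration:
    """Hypothetical duration keeping calendar months apart from exact time."""

    months: int = 0                # years are stored as 12 months
    time: timedelta = timedelta()  # days, hours, minutes, seconds

    def add_to(self, start: datetime) -> datetime:
        # Step 1: add the year-month part, clamping the day to the month's length.
        total = start.year * 12 + (start.month - 1) + self.months
        year, month_index = divmod(total, 12)
        last_day = calendar.monthrange(year, month_index + 1)[1]
        shifted = start.replace(
            year=year, month=month_index + 1, day=min(start.day, last_day)
        )
        # Step 2: add the day-time part, which has a fixed length.
        return shifted + self.time


d = Duration(months=1, time=timedelta(days=2))
print(d.add_to(datetime(2023, 1, 1)))  # 2023-02-03 00:00:00
print(d.add_to(datetime(2023, 2, 1)))  # 2023-03-03 00:00:00
```

Because the two parts are stored separately, `Duration(months=1, time=timedelta(days=2))` never compares equal to a plain 33-day duration, which is exactly the separation being proposed.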

eksortso commented 2 years ago

I'm not opposed to the 1h10m format. If we do go with that then fractions should be forbidden IMHO,

@arp242 I agree, that's why I suggested to use integers for h and m and floats for s.

@abelbraaksma I'd recommend using the same precision that we define for time types.

From v1.0.0:

Millisecond precision is required. Further precision of fractional seconds is implementation-specific. If the value contains greater precision than the implementation can support, the additional precision must be truncated, not rounded.
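If durations followed the same rule, fractional seconds beyond the supported precision would be cut off rather than rounded. A minimal Python sketch, assuming microsecond storage (the resolution of `datetime.timedelta`, which exceeds the required millisecond precision) and a hypothetical helper name `seconds_to_timedelta`; it works on the digit string directly so no float rounding sneaks in.

```python
import re
from datetime import timedelta

_SECONDS = re.compile(r"(\d+)(?:\.(\d+))?")


def seconds_to_timedelta(text: str) -> timedelta:
    """Parse non-negative decimal seconds, truncating digits past microseconds."""
    m = _SECONDS.fullmatch(text)
    if m is None:
        raise ValueError(f"not a decimal number of seconds: {text!r}")
    whole, frac = m.group(1), m.group(2) or ""
    micros = int((frac + "000000")[:6])  # pad to 6 digits, then drop the rest
    return timedelta(seconds=int(whole), microseconds=micros)


print(seconds_to_timedelta("0.9999999"))  # 0:00:00.999999 (rounding would give 1 s)
```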

abelbraaksma commented 2 years ago

@eksortso, you're absolutely right; my main point was to have integers for hours and minutes. Seconds should of course have the same precision as for times.

eksortso commented 2 years ago

@abelbraaksma Well, taking a hint from the current spec, we could use a similar approach for hours, minutes, and seconds. Values falling within well-defined boundaries will be accepted as is. And if the time values fall out of bounds, are fractional float values, or are specified out of order, then the parsing behavior would be implementation-specific. Someone will want to use 0.5h instead of 30m. It's inevitable. But if they do that, then they must know it won't be standard behavior. It will be defined by the parser and the language, not by us.

But, any potential reliance on implementation-specific behaviors does beg the question posed by @pradyunsg of whether durations ought to be standardized in TOML at all. Do we want to bear the burden of defining time delta standards that all parsers must adhere to? We got away with that for dates and times. But we'd have to impose TOML-specific duration standards that are not as clear-cut as what exists for datetimes.

abelbraaksma commented 2 years ago

@eksortso, that might be a viable approach. Also, I totally understand the reluctance of implementing this in the first place. I don't really have a strong opinion on that. I do like strong, useful types in TOML, but at the same time, where do you draw the line? Whether this is feature-creep or not is probably anybody's guess. Yet at the same time, it's useful and a relatively small addition. And people are not required to use it (heck, I know many people using TOML without using tables...).