pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

Support ISO8601 Duration Format #12615

Open david-waterworth opened 11 months ago

david-waterworth commented 11 months ago

Description

I'm increasingly encountering APIs that return data using ISO8601 durations (https://en.wikipedia.org/wiki/ISO_8601#Durations).

e.g. 15 minutes is PT15M

In particular, all our internal GraphQL endpoints use this representation when requesting raw data.

It's probably too late to change, and undesirable to maintain two parsers, but I figured I'd put the suggestion out there anyway.

MarcoGorelli commented 11 months ago

Sounds like a good suggestion: using a standardised string language is probably better for the ecosystem than using a Polars-only string language.

Most of it would be quite low-effort to support (ISO8601 durations are very similar to the Polars string duration language), but the major difference I see is that ISO8601 durations allow for decimals, like 'P1.3D'. Isn't that a bit ambiguous? I'm not sure what exactly 1.3D means: how many hours is that, especially if it falls on a DST transition?

The following, however, should be feasible and unambiguous. ISO8601 durations, but:

Examples:

Does this seem reasonable, and would it solve your use case?
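
To make the idea concrete, here is a minimal sketch of translating a restricted ISO8601 duration into the existing Polars duration string language. It is not Polars code: the function name `iso8601_to_polars` is hypothetical, and the restrictions it enforces (integer quantities only, designators in the standard order) are assumptions based on the decimal concern raised above.

```python
import re

# Polars duration-string units, in the order the ISO8601 designators must
# appear: date part Y, M, W, D, then (after "T") time part H, M, S.
_POLARS_UNITS = ("y", "mo", "w", "d", "h", "m", "s")

# Integer quantities only (assumption): decimals such as "P1.3D" do not match.
_ISO_DURATION = re.compile(
    r"P(?:(\d+)Y)?(?:(\d+)M)?(?:(\d+)W)?(?:(\d+)D)?"
    r"(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?"
)


def iso8601_to_polars(duration: str) -> str:
    """Translate e.g. 'P1DT12H' into the Polars duration string '1d12h'."""
    match = _ISO_DURATION.fullmatch(duration)
    if match is None:
        raise ValueError(f"unsupported ISO8601 duration: {duration!r}")
    parts = [
        f"{int(value)}{unit}"
        for value, unit in zip(match.groups(), _POLARS_UNITS)
        if value is not None
    ]
    # Reject "P", "PT", and a dangling "T" such as "P1DT".
    if not parts or duration.endswith("T"):
        raise ValueError(f"empty or malformed ISO8601 duration: {duration!r}")
    return "".join(parts)


print(iso8601_to_polars("PT15M"))    # 15m
print(iso8601_to_polars("P1DT12H"))  # 1d12h
```

Note that `M` means months before the `T` and minutes after it, which is why the sketch maps by position rather than by letter.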

david-waterworth commented 11 months ago

I wasn’t aware that they allowed decimals either - it doesn’t seem necessary/useful (or unambiguous) in my opinion.

Also interesting that the Wikipedia article considered PT36H and P1DT12H to behave differently wrt daylight savings. Perhaps the standard covers this in more detail; it might be worth further investigation.

But in general, what you propose looks fine to me.

MarcoGorelli commented 11 months ago

"interesting that the Wikipedia article considered PT36H and P1DT12H to behave differently wrt daylight savings"

so does Polars, so there'd be nothing to change here 😉

In [14]: df.with_columns(
    ...:     b=pl.col.a.dt.offset_by('36h'),
    ...:     c=pl.col.a.dt.offset_by('1d12h'),
    ...: )
Out[14]:
shape: (1, 3)
┌──────────────────────────┬──────────────────────────┬──────────────────────────┐
│ a                        ┆ b                        ┆ c                        │
│ ---                      ┆ ---                      ┆ ---                      │
│ datetime[μs,             ┆ datetime[μs,             ┆ datetime[μs,             │
│ Europe/London]           ┆ Europe/London]           ┆ Europe/London]           │
╞══════════════════════════╪══════════════════════════╪══════════════════════════╡
│ 2020-10-25 00:00:00 BST  ┆ 2020-10-26 11:00:00 GMT  ┆ 2020-10-26 12:00:00 GMT  │
└──────────────────────────┴──────────────────────────┴──────────────────────────┘
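
The one-hour gap between `b` and `c` can be reproduced without Polars. The sketch below uses only the Python standard library and assumes that '36h' means 36 elapsed hours while '1d' means "same wall-clock time on the next calendar day"; the helper names are hypothetical and this is an illustration of the semantics, not how Polars implements `offset_by`.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

LONDON = ZoneInfo("Europe/London")


def add_elapsed(dt: datetime, delta: timedelta) -> datetime:
    """Add real elapsed time: do the arithmetic in UTC, then convert back."""
    return (dt.astimezone(timezone.utc) + delta).astimezone(dt.tzinfo)


def add_calendar_days(dt: datetime, days: int) -> datetime:
    """Add calendar days: aware-datetime arithmetic in Python keeps the
    wall-clock time, and the zone re-resolves the UTC offset afterwards."""
    return dt + timedelta(days=days)


# Midnight on the day the UK clocks go back (BST -> GMT), as in the output above.
a = datetime(2020, 10, 25, 0, 0, tzinfo=LONDON)

b = add_elapsed(a, timedelta(hours=36))                        # like '36h'
c = add_elapsed(add_calendar_days(a, 1), timedelta(hours=12))  # like '1d12h'

print(b, b.tzname())  # 2020-10-26 11:00:00+00:00 GMT
print(c, c.tzname())  # 2020-10-26 12:00:00+00:00 GMT
```

Because 25 October 2020 is 25 hours long in Europe/London, one calendar day plus 12 hours lands an hour later than 36 elapsed hours.
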
MarcoGorelli commented 11 months ago

Marking as 'accepted', meaning "OK to support ISO8601 durations as well, alongside the Polars duration string language"

Relatively low-priority, but will get to it when I get a chance