nushell / nushell

A new type of shell
https://www.nushell.sh/
MIT License

dataframes error: Value not supported in nushell: duration[ms] #8184

Open bobhy opened 1 year ago

bobhy commented 1 year ago

Describe the bug

I can't calculate a duration between two datetime columns in a dataframe. I also can't calculate an offset date from a datetime column and a duration column.

How to reproduce

Problem trying to calculate a duration:

〉((date now) | dfr into-df) - (((date now) - 10sec) | dfr into-df)
Error: 
  × Error creating Dataframe
  help: Value not supported in nushell: duration[ms]

〉((date now) | dfr into-df) - (((date now) - 10sec) | dfr into-df) | dfr dtypes
╭───┬─────────┬──────────────╮
│ # │ column  │    dtype     │
├───┼─────────┼──────────────┤
│ 0 │ sub_0_0 │ duration[ms] │
╰───┴─────────┴──────────────╯

〉(date now | dfr into-df) | dfr dtypes
╭───┬────────┬──────────────╮
│ # │ column │    dtype     │
├───┼────────┼──────────────┤
│ 0 │ 0      │ datetime[ms] │
╰───┴────────┴──────────────╯

〉(10sec | dfr into-df) | dfr dtypes
╭───┬────────┬───────╮
│ # │ column │ dtype │
├───┼────────┼───────┤
│ 0 │ 0      │ i64   │
╰───┴────────┴───────╯

It looks like the dataframe conversion truncates Nushell datetime values (nanosecond resolution) to a dtype with millisecond resolution, and would likewise truncate a Nushell duration to milliseconds.

But with 10sec | dfr into-df, something seems to break down across the pipe, and the column gets a plain i64 dtype (a signed 64-bit integer, not a duration type). Going by the error message, I'd expect it to be duration[ms].
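To make the suspected millisecond truncation concrete, here is a small Python model (not Nushell or Polars code) of what storing a nanosecond value in a millisecond-resolution column would do. It assumes the conversion is plain floor division by 1,000,000; the timestamp value and the helper name ns_to_ms are made up for illustration.

```python
NS_PER_MS = 1_000_000  # nanoseconds per millisecond

def ns_to_ms(value_ns: int) -> tuple[int, int]:
    """Hypothetical helper: return (whole milliseconds, discarded nanoseconds)."""
    # divmod floors, so for non-negative input the remainder is exactly what is lost
    return divmod(value_ns, NS_PER_MS)

# A made-up nanosecond timestamp with sub-millisecond digits
ts_ms, lost_ns = ns_to_ms(1_677_191_765_123_456_789)
print(ts_ms, lost_ns)

# A whole-second duration such as 10sec survives the conversion intact
print(ns_to_ms(10_000_000_000))
```

Truncating to milliseconds only loses information below 1 ms, so it would not by itself explain the "Value not supported" error; the failure looks like it comes from converting the duration[ms] result back into a Nushell value.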

OK, so what value is actually stored in the dataframe for a duration?

〉(date now) | into int
1677191765

〉(date now) - 10sec | into int
1677191768

〉(10sec | dfr into-df  | dfr first).0
10000000000

The raw value stored for a duration appears to be nanoseconds (10sec is stored as 10000000000), while the raw datetime value from into int is seconds since the Unix epoch.
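As a quick sanity check of that reading, the two raw numbers from the transcript can be decoded in Python under the assumed units (seconds for the datetime, nanoseconds for the duration). This only interprets the printed values; it makes no claim about how Nushell or Polars store them internally.

```python
from datetime import datetime, timedelta, timezone

# Raw values copied from the transcript above
raw_datetime = 1677191765        # `(date now) | into int`: assumed seconds since the Unix epoch
raw_duration = 10_000_000_000    # `(10sec | dfr into-df | dfr first).0`: assumed nanoseconds

as_datetime = datetime.fromtimestamp(raw_datetime, tz=timezone.utc)
# timedelta has no nanosecond field, so convert via microseconds (exact for this value)
as_duration = timedelta(microseconds=raw_duration // 1_000)

print(as_datetime.isoformat())
print(as_duration.total_seconds())
```

Both decode to sensible values (a timestamp in February 2023 and a 10-second duration), which supports the seconds-vs-nanoseconds interpretation.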

〉(date now) | into int
1677193193

〉((date now) | into int | dfr into-df) | dfr dtypes
╭───┬────────┬───────╮
│ # │ column │ dtype │
├───┼────────┼───────┤
│ 0 │ 0      │ i64   │
╰───┴────────┴───────╯

〉(((date now) | into int | dfr into-df)| dfr first).0
1677193562

〉((date now) | into int | dfr into-df) - (((date now) - 10sec) | into int | dfr into-df)
╭───┬─────────╮
│ # │ sub_0_0 │
├───┼─────────┤
│ 0 │      10 │
╰───┴─────────╯

So if I calculate a duration as the difference between raw timestamps, I get a number of seconds, which makes sense.

But that's not consistent with what happens when a Nushell duration is stored directly in a dataframe column. The conversion problem seems to affect only the duration type.
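The inconsistency amounts to a unit mismatch of a factor of 10^9. A minimal Python sketch, using second-resolution timestamps 10 s apart like those in the transcript, shows the two raw numbers and the rescaling needed before they could be compared or combined:

```python
NS_PER_S = 1_000_000_000  # nanoseconds per second

t_now_s = 1677193562             # a second-resolution timestamp like those above
t_earlier_s = t_now_s - 10       # 10 seconds earlier
diff_s = t_now_s - t_earlier_s   # what subtracting the two i64 columns yields

stored_duration_ns = 10_000_000_000  # what `10sec | dfr into-df` stored

# The raw numbers describe the same span but differ by a factor of 10**9 ...
ratio = stored_duration_ns // diff_s
# ... so one side must be rescaled to a common unit first
diff_ns = diff_s * NS_PER_S

print(diff_s, ratio, diff_ns == stored_duration_ns)
```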

A similar problem occurs when calculating an offset date from a date and a duration, although the error is different and also depends on the order of the operands.

〉(date now | dfr into-df) + (10sec | dfr into-df)
Error: nu::shell::incompatible_parameters (link)

  × Incompatible parameters.
   ╭─[entry #238:1:1]
 1 │ (date now | dfr into-df) + (10sec | dfr into-df)
   ·             ─────┬─────             ─────┬─────
   ·                  │                       ╰── datatype datetime[ms]
   ·                  ╰── datatype datetime[ms]
   ╰────

〉(10sec | dfr into-df)
╭───┬─────────────╮
│ # │      0      │
├───┼─────────────┤
│ 0 │ 10000000000 │
╰───┴─────────────╯

〉(10sec | dfr into-df) + (date now | dfr into-df)
Error: nu::shell::incompatible_parameters (link)

  × Incompatible parameters.
   ╭─[entry #240:1:1]
 1 │ (10sec | dfr into-df) + (date now | dfr into-df)
   ·          ─────┬─────                ─────┬─────
   ·               │                          ╰── datatype i64
   ·               ╰── datatype i64
   ╰────
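For comparison, here is a hypothetical Python model of the operation being attempted: a type-aware "+" that adds a nanosecond duration to a millisecond datetime by rescaling the duration first, and gives the same result in either operand order. The class names, the add function, and the choice to floor sub-millisecond digits are all assumptions for illustration, not Nushell or Polars behavior.

```python
from dataclasses import dataclass

NS_PER_MS = 1_000_000  # nanoseconds per millisecond

@dataclass(frozen=True)
class DatetimeMs:
    value: int  # milliseconds since the Unix epoch (hypothetical type)

@dataclass(frozen=True)
class DurationNs:
    value: int  # nanoseconds (hypothetical type)

def add(a: object, b: object) -> DatetimeMs:
    """Hypothetical datetime + duration that accepts either operand order."""
    if isinstance(a, DurationNs) and isinstance(b, DatetimeMs):
        a, b = b, a  # normalize operand order so the operation commutes
    if isinstance(a, DatetimeMs) and isinstance(b, DurationNs):
        # Rescale the duration to the datetime's unit (floor division) before adding
        return DatetimeMs(a.value + b.value // NS_PER_MS)
    raise TypeError(f"incompatible parameters: {type(a).__name__} + {type(b).__name__}")

now = DatetimeMs(1_677_191_765_000)
ten_sec = DurationNs(10_000_000_000)
print(add(now, ten_sec), add(ten_sec, now))
```

The point of the sketch is that each operand keeps a distinct type, so both orders resolve to the same unit conversion; in the transcript, by contrast, both operands are reported with the same dtype.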

Expected behavior

I'd like to be able to do both duration and offset-date calculations between dataframe columns using the minus and plus operators.
At a minimum, dataframes should support 32-bit timestamps and durations (1-second resolution).
Ideally, dataframes would support 64-bit Nushell-style datetimes and durations (nanosecond resolution).
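These two options trade range against resolution. As a rough worked check in Python (nothing Nushell-specific, assuming signed integers counted from the Unix epoch), these are the date ranges each representation can cover:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Signed 32-bit count of seconds since the epoch
i32_min = EPOCH + timedelta(seconds=-(2**31))
i32_max = EPOCH + timedelta(seconds=2**31 - 1)

# Signed 64-bit count of nanoseconds, floored to whole seconds for timedelta
i64_ns_min = EPOCH + timedelta(seconds=-(2**63) // 10**9)
i64_ns_max = EPOCH + timedelta(seconds=(2**63 - 1) // 10**9)

print(i32_min, i32_max)
print(i64_ns_min, i64_ns_max)
```

So 32-bit seconds would run out in January 2038, while 64-bit nanoseconds span roughly 1677 to 2262.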

Screenshots

No response

Configuration

key                 value
version             0.76.1
branch              main
commit_hash         aba0fb0000d1a92585c76bbd2f3b387686c32201
build_os            linux-x86_64
build_target        x86_64-unknown-linux-gnu
rust_version        rustc 1.66.1 (90743e729 2023-01-10)
rust_channel        1.66.1-x86_64-unknown-linux-gnu
cargo_version       cargo 1.66.1 (ad779e08b 2023-01-10)
pkg_version         0.76.1
build_time          2023-02-22 23:07:04 -05:00
build_rust_channel  release
features            database, dataframe, default, trash, which, zip
installed_plugins

Additional context

No response

bobhy commented 1 year ago

The problem still reproduces in 0.79. Here's a slightly different repro. The error says "Error creating Dataframe", so the question is: which dataframe is being created?

〉let d1 = (((date now | dfr into-df)) | inspect)
╭─────────────┬───────────╮
│ description │ dataframe │
├─────────────┴───────────┤
│                         │
├─────────────────────────┤
│ dataframe               │
╰─────────────────────────╯

〉let d2 = (((date now) - 10sec) | dfr into-df | inspect)
╭─────────────┬───────────╮
│ description │ dataframe │
├─────────────┴───────────┤
│                         │
├─────────────────────────┤
│ dataframe               │
╰─────────────────────────╯

〉$d1 - $d2
Error: 
  × Error creating Dataframe
  help: Value not supported in nushell: duration[ms]
bobhy commented 1 year ago
Reconfirmed with the same repro in:

key          value
version      0.79.1
branch       test
commit_hash  738e1211332a9cb4fe83f6a6419906444eb08803