Closed — fdncred closed this issue 1 year ago
As it is now there is only a dt function to convert date strings to datetime:
dply -c 'json("./buildtimesnd.json") |
mutate(date=dt(date))|
group_by(commit) |
summarize(min_date = min(date), max_date = max(date)) |
mutate(date_diffs=max_date-min_date)|
head(3)'
shape: (3, 4) elapsed: 0.005s
┌──────────────────────────────────────────┬─────────────────────────┬─────────────────────────┬──────────────────┐
│ commit ┆ min_date ┆ max_date ┆ date_diffs │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ datetime[ms] ┆ datetime[ms] ┆ interval[dt] │
╞══════════════════════════════════════════╪═════════════════════════╪═════════════════════════╪══════════════════╡
│ 406b606398bf18c98063fbe998a4d27f75067eef ┆ 2023-07-05T12:41:06.186 ┆ 2023-07-07T13:31:08.454 ┆ 2D 0h 50m 2.268s │
│ 8e38596bc9494357f01f166076e8d563f28016f3 ┆ 2023-07-07T13:35:20.692 ┆ 2023-07-10T15:42:42.746 ┆ 3D 2h 7m 22.054s │
│ 6f5bd62a974c4b8660783b792b18c7597a805c1e ┆ 2023-08-09T17:43:54.095 ┆ 2023-08-09T19:50:25.802 ┆ 2h 6m 31.707s │
└──────────────────────────────────────────┴─────────────────────────┴─────────────────────────┴──────────────────┘
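The date_diffs column is plain datetime subtraction; the first row of the table above can be checked with Python's stdlib (the literals are copied from that row):

```python
from datetime import datetime

# min_date and max_date for commit 406b6063... from the table above
min_date = datetime.fromisoformat("2023-07-05T12:41:06.186")
max_date = datetime.fromisoformat("2023-07-07T13:31:08.454")

diff = max_date - min_date
print(diff)  # 2 days, 0:50:02.268000 -> the "2D 0h 50m 2.268s" interval above
```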
and I have just noticed that it converts them to UTC.
To do that we need to add a function like timedelta('nanos', time). It looks very useful, so I'll add it.
That sounds cool, except for the converting-to-UTC part. I'm particularly interested in the time column. What I set out to do today was to see whether I could do in dply what I do in nushell. Here's what my output looks like, from this buildtimes file, in nushell.
❯ open ~\.local\share\nushell\startup-times.nuon | where build == release |
group-by commit |
transpose commit data |
upsert perf {|r| if ($r.data | length) > 3 {$r.data.time | sort | range 0..(-3) | math avg} else {$r.data.time | math avg}} |
upsert start_date {|s| $s.data.date | sort | math min | format date '%Y-%m-%d'} |
upsert end_date {|e| $e.data.date | sort | math max | format date '%Y-%m-%d'} |
upsert num_startups {|n| $n.data | length} |
sort-by start_date
╭#─┬─────────────────commit─────────────────┬─────data──────┬──────perf───────┬start_date┬─end_date─┬num_startups╮
│0 │2bb0c1c618f961843b49432fb7a21304b41493af│[table 2 rows] │120ms 679µs 900ns│2023-07-03│2023-07-03│ 2│
│1 │406b606398bf18c98063fbe998a4d27f75067eef│[table 40 rows]│126ms 134µs 792ns│2023-07-05│2023-07-07│ 40│
│2 │8e38596bc9494357f01f166076e8d563f28016f3│[table 13 rows]│128ms 839µs 700ns│2023-07-07│2023-07-10│ 13│
│3 │cf36f052c46b6efe57500e3acb7f52d2d0cb8d2e│[table 2 rows] │184ms 785µs 800ns│2023-07-12│2023-07-12│ 2│
│4 │b2043135ed956ead0d3b5d5df49ea9d929dc7120│[table 2 rows] │ 136ms 68µs 100ns│2023-07-12│2023-07-13│ 2│
│5 │4804e6a151ca0f212c3f4b097b4d805a69535149│[table 7 rows] │ 130ms 33µs 160ns│2023-07-14│2023-07-14│ 7│
│6 │48271d8c3e1f83723f005ae1809ebd5026783f8a│[table 3 rows] │ 149ms 20µs 600ns│2023-07-17│2023-07-17│ 3│
│7 │a5a79a7d95822bc143090612e1813f3b06befbf4│[table 10 rows]│139ms 845µs 700ns│2023-07-18│2023-07-20│ 10│
│8 │9db0d6bd34a99805c6da296688aa186778be5a86│[table 3 rows] │138ms 853µs 133ns│2023-07-24│2023-07-24│ 3│
│9 │208071916209af5a4159b131e438aa6cab524532│[table 5 rows] │123ms 224µs 733ns│2023-07-25│2023-07-25│ 5│
│10│a33b5fe6ce97b5e9fe8a774c13e783ed65c1b591│[table 7 rows] │125ms 925µs 300ns│2023-07-25│2023-07-27│ 7│
│11│f8d325dbfef5fec7ee109e37c624236998de8843│[table 4 rows] │108ms 199µs 800ns│2023-07-27│2023-07-27│ 4│
│12│6aa30132aae188639a78ba8fd7feddc952d5792e│[table 8 rows] │112ms 912µs 100ns│2023-07-27│2023-07-27│ 8│
│13│8403fff34500d30439545519c88c7d942c717e3e│[table 3 rows] │ 131ms 968µs 33ns│2023-07-27│2023-07-28│ 3│
│14│94bec720791f716b44cb23db363f53e2fa7acce3│[table 21 rows]│118ms 872µs 289ns│2023-07-31│2023-08-01│ 21│
│15│f6033ac5af75073dddce2400304448dbbadd0318│[table 2 rows] │155ms 370µs 400ns│2023-08-01│2023-08-01│ 2│
│16│778a00efa10735e7eb368aea1ddfeb6af3d3720a│[table 4 rows] │126ms 261µs 100ns│2023-08-01│2023-08-02│ 4│
│17│ec4941c8ac45f94ab408753b173ce991ce0fafd3│[table 1 row] │121ms 661µs 700ns│2023-08-02│2023-08-02│ 1│
│18│840b4b854b31e102cafd7667590626c4cfd0e19e│[table 7 rows] │139ms 761µs 300ns│2023-08-03│2023-08-04│ 7│
│19│a9582e1c62f7e7eaec18c0302dbd699b2f164630│[table 3 rows] │176ms 377µs 433ns│2023-08-07│2023-08-07│ 3│
│20│b1974fae399885ba93dda2b2a510ab585bda9f1a│[table 1 row] │118ms 110µs 100ns│2023-08-07│2023-08-07│ 1│
│21│56ed1eb8071658282ffe55b91d6dfc8ef958844c│[table 1 row] │ 152ms 756µs│2023-08-08│2023-08-08│ 1│
│22│85c20350161b100abe14df9ef7b77337e1f76ec0│[table 1 row] │158ms 963µs 300ns│2023-08-09│2023-08-09│ 1│
│23│0674d4960b8b09073b31cca9e62eeeb1a1ddd4a8│[table 5 rows] │119ms 189µs 733ns│2023-08-09│2023-08-09│ 5│
│24│6f5bd62a974c4b8660783b792b18c7597a805c1e│[table 3 rows] │122ms 395µs 366ns│2023-08-09│2023-08-09│ 3│
│25│d5fa7b8a557ca0b0ca272cf82c66c2cc2e940f39│[table 2 rows] │160ms 789µs 100ns│2023-08-10│2023-08-10│ 2│
│26│f9ffd9ae2917871092485f8c5bc393d6e96f7784│[table 7 rows] │125ms 912µs 440ns│2023-08-10│2023-08-11│ 7│
│27│aa37572ddc79f5d2c040b8480ec9af6e5d1550d9│[table 11 rows]│139ms 547µs 844ns│2023-08-11│2023-08-14│ 11│
│28│ad49c17ebacd04585fb4355e079ec87d7fc63d8d│[table 2 rows] │ 134ms 946µs│2023-08-14│2023-08-14│ 2│
╰#─┴─────────────────commit─────────────────┴─────data──────┴──────perf───────┴start_date┴─end_date─┴num_startups╯
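The perf column above is a trimmed mean. A minimal Python sketch of the same logic as the `upsert perf` closure in the pipeline (the function name `perf_avg` is mine): when a commit has more than 3 samples, the two slowest are dropped before averaging, which is what nushell's `sort | range 0..(-3)` does.

```python
def perf_avg(times):
    # Mirrors the nushell upsert above: with more than 3 samples,
    # sort and drop the two slowest before taking the mean.
    ts = sorted(times)
    if len(ts) > 3:
        ts = ts[:-2]
    return sum(ts) / len(ts)

print(perf_avg([100, 110, 120, 500, 900]))  # -> 110.0 (500 and 900 dropped)
print(perf_avg([100, 200]))                 # -> 150.0 (too few samples to trim)
```

This explains why the nushell perf numbers run slightly lower than dply's plain mean(time) for the same commits.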
I'm getting close but not quite there yet, I'm sure mostly because of my ignorance of dply.
❯ dply -c 'json("buildtimesnd.json") |
❯❯❯ mutate(date=dt(date)) |
❯❯❯ group_by(commit) |
❯❯❯ summarize(
❯❯❯ min_date = min(date),
❯❯❯ max_date = max(date),
❯❯❯ startups = n(),
❯❯❯ perf_datetime = mean(time)
❯❯❯ ) | arrange(min_date, max_date) |
❯❯❯ show()'
shape: (29, 5) elapsed: 0.008s
┌──────────────────────────────────────────┬─────────────────────────┬─────────────────────────┬────────────────┬──────────────────┐
│ commit ┆ min_date ┆ max_date ┆ startups ┆ perf_datetime │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ datetime[ms] ┆ datetime[ms] ┆ i64 ┆ f64 │
╞══════════════════════════════════════════╪═════════════════════════╪═════════════════════════╪════════════════╪══════════════════╡
│ 2bb0c1c618f961843b49432fb7a21304b41493af ┆ 2023-07-03T17:03:08.605 ┆ 2023-07-03T17:45:46.859 ┆ 2 ┆ 120679900.0 │
│ 406b606398bf18c98063fbe998a4d27f75067eef ┆ 2023-07-05T12:41:06.186 ┆ 2023-07-07T13:31:08.454 ┆ 40 ┆ 129605145.0 │
│ 8e38596bc9494357f01f166076e8d563f28016f3 ┆ 2023-07-07T13:35:20.692 ┆ 2023-07-10T15:42:42.746 ┆ 13 ┆ 132144692.307692 │
│ cf36f052c46b6efe57500e3acb7f52d2d0cb8d2e ┆ 2023-07-12T15:10:55.382 ┆ 2023-07-12T19:23:46.223 ┆ 2 ┆ 184785800.0 │
│ b2043135ed956ead0d3b5d5df49ea9d929dc7120 ┆ 2023-07-12T20:56:26.198 ┆ 2023-07-13T20:14:08.484 ┆ 2 ┆ 136068100.0 │
│ 4804e6a151ca0f212c3f4b097b4d805a69535149 ┆ 2023-07-14T16:39:30.111 ┆ 2023-07-14T20:25:04.629 ┆ 7 ┆ 144062385.714286 │
│ 48271d8c3e1f83723f005ae1809ebd5026783f8a ┆ 2023-07-17T13:09:13.074 ┆ 2023-07-17T19:20:06.502 ┆ 3 ┆ 149020600.0 │
│ a5a79a7d95822bc143090612e1813f3b06befbf4 ┆ 2023-07-18T16:44:35.140 ┆ 2023-07-20T16:01:20.985 ┆ 10 ┆ 159522440.0 │
│ 9db0d6bd34a99805c6da296688aa186778be5a86 ┆ 2023-07-24T12:57:55.020 ┆ 2023-07-24T13:15:05.868 ┆ 3 ┆ 138853133.333333 │
│ 208071916209af5a4159b131e438aa6cab524532 ┆ 2023-07-25T12:22:59.358 ┆ 2023-07-25T17:46:46.435 ┆ 5 ┆ 231826620.0 │
│ a33b5fe6ce97b5e9fe8a774c13e783ed65c1b591 ┆ 2023-07-25T20:39:13.457 ┆ 2023-07-27T14:53:35.224 ┆ 7 ┆ 200283471.428571 │
│ f8d325dbfef5fec7ee109e37c624236998de8843 ┆ 2023-07-27T15:05:15.536 ┆ 2023-07-27T15:26:56.575 ┆ 4 ┆ 114143575.0 │
│ 6aa30132aae188639a78ba8fd7feddc952d5792e ┆ 2023-07-27T15:56:29.689 ┆ 2023-07-27T18:06:09.548 ┆ 8 ┆ 119858087.5 │
│ 8403fff34500d30439545519c88c7d942c717e3e ┆ 2023-07-27T20:02:18.214 ┆ 2023-07-28T14:01:34.190 ┆ 3 ┆ 131968033.333333 │
│ 94bec720791f716b44cb23db363f53e2fa7acce3 ┆ 2023-07-31T13:00:49.600 ┆ 2023-08-01T13:41:42.960 ┆ 21 ┆ 123044576.190476 │
│ f6033ac5af75073dddce2400304448dbbadd0318 ┆ 2023-08-01T14:35:39.008 ┆ 2023-08-01T18:00:31.016 ┆ 2 ┆ 155370400.0 │
│ 778a00efa10735e7eb368aea1ddfeb6af3d3720a ┆ 2023-08-01T20:38:56.294 ┆ 2023-08-02T15:11:37.531 ┆ 4 ┆ 152309825.0 │
│ ec4941c8ac45f94ab408753b173ce991ce0fafd3 ┆ 2023-08-02T16:05:31.403 ┆ 2023-08-02T16:05:31.403 ┆ 1 ┆ 121661700.0 │
│ 840b4b854b31e102cafd7667590626c4cfd0e19e ┆ 2023-08-03T12:27:50.091 ┆ 2023-08-04T18:56:57.316 ┆ 7 ┆ 149403514.285714 │
│ a9582e1c62f7e7eaec18c0302dbd699b2f164630 ┆ 2023-08-07T12:24:22.177 ┆ 2023-08-07T17:23:04.703 ┆ 3 ┆ 176377433.333333 │
│ b1974fae399885ba93dda2b2a510ab585bda9f1a ┆ 2023-08-07T17:27:28.423 ┆ 2023-08-07T17:27:28.423 ┆ 1 ┆ 118110100.0 │
│ 56ed1eb8071658282ffe55b91d6dfc8ef958844c ┆ 2023-08-08T12:16:13.356 ┆ 2023-08-08T12:16:13.356 ┆ 1 ┆ 152756000.0 │
│ 85c20350161b100abe14df9ef7b77337e1f76ec0 ┆ 2023-08-09T12:04:46.705 ┆ 2023-08-09T12:04:46.705 ┆ 1 ┆ 158963300.0 │
│ 0674d4960b8b09073b31cca9e62eeeb1a1ddd4a8 ┆ 2023-08-09T13:21:52.946 ┆ 2023-08-09T17:22:13.516 ┆ 5 ┆ 131798320.0 │
│ 6f5bd62a974c4b8660783b792b18c7597a805c1e ┆ 2023-08-09T17:43:54.095 ┆ 2023-08-09T19:50:25.802 ┆ 3 ┆ 122395366.666667 │
│ d5fa7b8a557ca0b0ca272cf82c66c2cc2e940f39 ┆ 2023-08-10T12:17:21.040 ┆ 2023-08-10T13:24:30.883 ┆ 2 ┆ 160789100.0 │
│ f9ffd9ae2917871092485f8c5bc393d6e96f7784 ┆ 2023-08-10T17:08:01.088 ┆ 2023-08-11T13:20:54.459 ┆ 7 ┆ 144515628.571429 │
│ aa37572ddc79f5d2c040b8480ec9af6e5d1550d9 ┆ 2023-08-11T13:52:12.318 ┆ 2023-08-14T13:11:12.804 ┆ 11 ┆ 145850981.818182 │
│ ad49c17ebacd04585fb4355e079ec87d7fc63d8d ┆ 2023-08-14T14:12:53.863 ┆ 2023-08-14T14:12:53.863 ┆ 1 ┆ 125579000.0 │
└──────────────────────────────────────────┴─────────────────────────┴─────────────────────────┴────────────────┴──────────────────┘
Wow that's amazing, nushell is very powerful, I am definitely going to play with it and learn more about it, I think dply is probably too basic for what you need to do.
I love dply, and I'm trying to learn more about it so I can integrate it into nushell. I think it would be awesome to have a datafusion set of commands behind a feature. I love what you've done with it, and I'm especially envious of group_by() | summarize().
Thank you @fdncred for suggesting this improvement. I have added a few time conversion functions, so it is now possible to convert numbers to durations and durations to numbers (following the lubridate R package):
〉json("buildtimes.json") |
::: mutate(date=ymd_hms(date)) |
::: group_by(commit) |
::: summarize(
::: min_date = min(date),
::: max_date = max(date),
::: startups = n(),
::: perf_datetime = mean(time)
::: ) |
::: mutate(perf_datetime=dnanos(perf_datetime)) |
::: arrange(min_date, max_date) |
::: relocate(perf_datetime) |
::: head(5)
:::
shape: (5, 5) elapsed: 0.007s
┌────────────────┬──────────────────────────────────────────┬─────────────────────────┬─────────────────────────┬────────────────┐
│ perf_datetime ┆ commit ┆ min_date ┆ max_date ┆ startups │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ duration[ns] ┆ str ┆ datetime[ms] ┆ datetime[ms] ┆ i64 │
╞════════════════╪══════════════════════════════════════════╪═════════════════════════╪═════════════════════════╪════════════════╡
│ 0.120679900s ┆ 2bb0c1c618f961843b49432fb7a21304b41493af ┆ 2023-07-03T17:03:08.605 ┆ 2023-07-03T17:45:46.859 ┆ 2 │
│ 0.129605145s ┆ 406b606398bf18c98063fbe998a4d27f75067eef ┆ 2023-07-05T12:41:06.186 ┆ 2023-07-07T13:31:08.454 ┆ 40 │
│ 0.132144692s ┆ 8e38596bc9494357f01f166076e8d563f28016f3 ┆ 2023-07-07T13:35:20.692 ┆ 2023-07-10T15:42:42.746 ┆ 13 │
│ 0.184785800s ┆ cf36f052c46b6efe57500e3acb7f52d2d0cb8d2e ┆ 2023-07-12T15:10:55.382 ┆ 2023-07-12T19:23:46.223 ┆ 2 │
│ 0.136068100s ┆ b2043135ed956ead0d3b5d5df49ea9d929dc7120 ┆ 2023-07-12T20:56:26.198 ┆ 2023-07-13T20:14:08.484 ┆ 2 │
└────────────────┴──────────────────────────────────────────┴─────────────────────────┴─────────────────────────┴────────────────┘
It is not as nice as Nushell but it is an improvement :smile:
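For reference, dnanos is essentially a nanoseconds-to-duration conversion. A minimal Python sketch of the same idea (an illustration, not dply's implementation; note Python's timedelta only keeps microsecond resolution, so the sub-microsecond digits are rounded):

```python
from datetime import timedelta

def dnanos(ns):
    # Convert a nanosecond count to a duration.
    # timedelta stores microseconds, so the last three digits are rounded off.
    return timedelta(microseconds=ns / 1_000)

d = dnanos(120_679_900)   # first perf_datetime value in the table above
print(d.total_seconds())  # ~0.12068 s, i.e. the 0.120679900s shown above
```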
In #47, I learned how to open a json file, as long as it's jsonl/ndjson and named with a .json file extension. Now I have an int64 field that needs to be interpreted as a timestamp (nanoseconds). Is there a way to cast the field to some kind of duration or timestamp datatype?