Closed fdncred closed 1 year ago
Hi @fdncred, I haven't released the version with the json function yet; it should work if you cargo install from source.
ah, ok. i'll try that. thanks! i'm trying to get something kind of working with nushell. we'll see how it goes.
okay let me know how it goes, I am going to release 0.2.0 to crates.io.
Kind of striking out.
Test 1 - Didn't really think this would work because it would have to evaluate nushell's open inside the json() function.
dply -c 'json(open ~\.local\share\nushell\startup-times.nuon | where build == release) | select(commit) | show()'
Test 2 - expected it to work but maybe i'm doing something wrong
dply -c 'json("buildtimes.json") | select(commit) | show()'
Error: Arrow error: Json error: Not valid JSON: EOF while parsing a list at line 1 column 1
Caused by:
Json error: Not valid JSON: EOF while parsing a list at line 1 column 1
I had to rename it to .txt to get GitHub to allow the upload: buildtimes.txt
This worked as parquet but I can't get perf to be a duration. Not sure how that works exactly.
❯ dply -c 'parquet("buildtimes.parquet") |
❯❯❯ group_by(commit) |
❯❯❯ summarize(min_date = min(date),
❯❯❯ max_date = max(date),
❯❯❯ cmt_count = n(),
❯❯❯ perf_ns = mean(time)) |
❯❯❯ arrange(min_date, max_date) |
❯❯❯ show()'
shape: (29, 5) elapsed: 0.009s
┌──────────────────────────────────────────┬───────────────────────────────┬───────────────────────────────┬────────────────┬──────────────────┐
│ commit ┆ min_date ┆ max_date ┆ cmt_count ┆ perf_ns │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ datetime[ns] ┆ datetime[ns] ┆ i64 ┆ f64 │
╞══════════════════════════════════════════╪═══════════════════════════════╪═══════════════════════════════╪════════════════╪══════════════════╡
│ 2bb0c1c618f961843b49432fb7a21304b41493af ┆ 2023-07-03T17:03:08.605833100 ┆ 2023-07-03T17:45:46.859419300 ┆ 2 ┆ 120679900.0 │
│ 406b606398bf18c98063fbe998a4d27f75067eef ┆ 2023-07-05T12:41:06.186069800 ┆ 2023-07-07T13:31:08.454208 ┆ 40 ┆ 129605145.0 │
│ 8e38596bc9494357f01f166076e8d563f28016f3 ┆ 2023-07-07T13:35:20.692982100 ┆ 2023-07-10T15:42:42.746287500 ┆ 13 ┆ 132144692.307692 │
│ cf36f052c46b6efe57500e3acb7f52d2d0cb8d2e ┆ 2023-07-12T15:10:55.382326800 ┆ 2023-07-12T19:23:46.223713400 ┆ 2 ┆ 184785800.0 │
│ b2043135ed956ead0d3b5d5df49ea9d929dc7120 ┆ 2023-07-12T20:56:26.198039100 ┆ 2023-07-13T20:14:08.484523 ┆ 2 ┆ 136068100.0 │
│ 4804e6a151ca0f212c3f4b097b4d805a69535149 ┆ 2023-07-14T16:39:30.111034 ┆ 2023-07-14T20:25:04.629634800 ┆ 7 ┆ 144062385.714286 │
│ 48271d8c3e1f83723f005ae1809ebd5026783f8a ┆ 2023-07-17T13:09:13.074710300 ┆ 2023-07-17T19:20:06.502104 ┆ 3 ┆ 149020600.0 │
│ a5a79a7d95822bc143090612e1813f3b06befbf4 ┆ 2023-07-18T16:44:35.140979400 ┆ 2023-07-20T16:01:20.985936100 ┆ 10 ┆ 159522440.0 │
│ 9db0d6bd34a99805c6da296688aa186778be5a86 ┆ 2023-07-24T12:57:55.020534400 ┆ 2023-07-24T13:15:05.868456200 ┆ 3 ┆ 138853133.333333 │
│ 208071916209af5a4159b131e438aa6cab524532 ┆ 2023-07-25T12:22:59.358805 ┆ 2023-07-25T17:46:46.435686600 ┆ 5 ┆ 231826620.0 │
│ a33b5fe6ce97b5e9fe8a774c13e783ed65c1b591 ┆ 2023-07-25T20:39:13.457784500 ┆ 2023-07-27T14:53:35.224492100 ┆ 7 ┆ 200283471.428571 │
│ f8d325dbfef5fec7ee109e37c624236998de8843 ┆ 2023-07-27T15:05:15.536044300 ┆ 2023-07-27T15:26:56.575707 ┆ 4 ┆ 114143575.0 │
│ 6aa30132aae188639a78ba8fd7feddc952d5792e ┆ 2023-07-27T15:56:29.689214800 ┆ 2023-07-27T18:06:09.548466600 ┆ 8 ┆ 119858087.5 │
│ 8403fff34500d30439545519c88c7d942c717e3e ┆ 2023-07-27T20:02:18.214354 ┆ 2023-07-28T14:01:34.190749100 ┆ 3 ┆ 131968033.333333 │
│ 94bec720791f716b44cb23db363f53e2fa7acce3 ┆ 2023-07-31T13:00:49.600253500 ┆ 2023-08-01T13:41:42.960650600 ┆ 21 ┆ 123044576.190476 │
│ f6033ac5af75073dddce2400304448dbbadd0318 ┆ 2023-08-01T14:35:39.008614900 ┆ 2023-08-01T18:00:31.016108300 ┆ 2 ┆ 155370400.0 │
│ 778a00efa10735e7eb368aea1ddfeb6af3d3720a ┆ 2023-08-01T20:38:56.294492600 ┆ 2023-08-02T15:11:37.531105600 ┆ 4 ┆ 152309825.0 │
│ ec4941c8ac45f94ab408753b173ce991ce0fafd3 ┆ 2023-08-02T16:05:31.403641700 ┆ 2023-08-02T16:05:31.403641700 ┆ 1 ┆ 121661700.0 │
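On the duration question: the perf_ns column above is a mean of the integer time column, which appears to hold nanoseconds. Whether dply itself can cast that to a duration is not settled in this thread, so the following is only a minimal Python sketch of the conversion done outside dply. The helper names ns_to_timedelta and format_ms are hypothetical, not part of any tool mentioned here.

```python
from datetime import timedelta


def ns_to_timedelta(ns: float) -> timedelta:
    """Convert a nanosecond count into a timedelta.

    timedelta only has microsecond resolution, so anything below
    one microsecond is rounded away.
    """
    return timedelta(microseconds=ns / 1_000)


def format_ms(ns: float) -> str:
    """Render a nanosecond count as milliseconds with three decimals."""
    return f"{ns / 1_000_000:.3f} ms"


if __name__ == "__main__":
    # First perf_ns value from the table above.
    mean_ns = 120679900.0
    print(ns_to_timedelta(mean_ns))
    print(format_ms(mean_ns))
```

The same arithmetic (divide by 1e6 for milliseconds, 1e9 for seconds) applies to every row of the summary.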
I see, the json function expects ndjson (newline-delimited JSON); I need to document that. You can convert the file using jq:
cat buildtimes.json | jq -c '.[]' > buildtimesnd.json
I got this result using the converted file buildtimesnd.txt:
〉json("./buildtimesnd.json") | glimpse()
:::
┌────────────┬────────┬──────────────────────────────────────────────┐
│ Rows: 179 ┆ Type ┆ Values │
│ Cols: 7 ┆ ┆ │
╞════════════╪════════╪══════════════════════════════════════════════╡
│ allocator ┆ str ┆ mimalloc, mimalloc, mimalloc, mimalloc,... │
│ build ┆ str ┆ release, release, release, release, relea... │
│ build_time ┆ str ┆ 2023-07-03 10:40:42 -05:00, 2023-07-03... │
│ commit ┆ str ┆ 2bb0c1c618f961843b49432fb7a21304b41493af,... │
│ date ┆ str ┆ 2023-07-03 12:03:08.605833100 -05:00,... │
│ time ┆ i64 ┆ 132610400, 108749400, 169129700, 13610890... │
│ version ┆ str ┆ 0.82.1, 0.82.1, 0.82.1, 0.82.1, 0.82.1,... │
└────────────┴────────┴──────────────────────────────────────────────┘
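For anyone without jq at hand, the array-to-ndjson conversion can also be sketched in Python. This is a rough equivalent of jq -c '.[]' for a file whose top level is a JSON array, which is an assumption about buildtimes.json based on the parse error above. The function name json_array_to_ndjson is made up for illustration.

```python
import json


def json_array_to_ndjson(text: str) -> str:
    """Turn a JSON document with a top-level array into ndjson text.

    Each array element is written as one compact JSON value per line.
    """
    records = json.loads(text)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    return "".join(
        json.dumps(record, separators=(",", ":"), ensure_ascii=False) + "\n"
        for record in records
    )


if __name__ == "__main__":
    # Tiny stand-in for buildtimes.json; the real file has more fields.
    sample = '[{"commit": "abc", "time": 1}, {"commit": "def", "time": 2}]'
    print(json_array_to_ndjson(sample), end="")
```

To convert a real file, read it with open(...).read(), pass the text to the function, and write the result to a file ending in .json.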
I get different results. Apparently, the file has to be named with the .json extension, not .txt, .jsonl, .ndjson, or any other extension.
❯ dply -c 'json("buildtimesnd.json") | glimpse()'
┌────────────┬────────┬───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ Rows: 179 ┆ Type ┆ Values │
│ Cols: 7 ┆ ┆ │
╞════════════╪════════╪═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
│ allocator ┆ str ┆ mimalloc, mimalloc, mimalloc, mimalloc, mimalloc, mimalloc, mimalloc, mimalloc, mimalloc, mimalloc, mimalloc, mimalloc, mimalloc, mimalloc,... │
│ build ┆ str ┆ release, release, release, release, release, release, release, release, release, release, release, release, release, release, release, release... │
│ build_time ┆ str ┆ 2023-07-03 10:40:42 -05:00, 2023-07-03 10:40:42 -05:00, 2023-07-05 07:36:05 -05:00, 2023-07-05 07:36:05 -05:00, 2023-07-05 07:36:05 -05:00,... │
│ commit ┆ str ┆ 2bb0c1c618f961843b49432fb7a21304b41493af, 2bb0c1c618f961843b49432fb7a21304b41493af, 406b606398bf18c98063fbe998a4d27f75067eef,... │
│ date ┆ str ┆ 2023-07-03 12:03:08.605833100 -05:00, 2023-07-03 12:45:46.859419300 -05:00, 2023-07-05 07:41:06.186069800 -05:00, 2023-07-05 12:31:59.73713950... │
│ time ┆ i64 ┆ 132610400, 108749400, 169129700, 136108900, 110939500, 106339100, 221954000, 125643500, 132976300, 131243200, 136889900, 110602800, 110799700,... │
│ version ┆ str ┆ 0.82.1, 0.82.1, 0.82.1, 0.82.1, 0.82.1, 0.82.1, 0.82.1, 0.82.1, 0.82.1, 0.82.1, 0.82.1, 0.82.1, 0.82.1, 0.82.1, 0.82.1, 0.82.1, 0.82.1, 0.82.1... │
└────────────┴────────┴───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
ah that's interesting, it looks like it only looks for .json files, in order to handle reading multiple partitioned files under a folder:
$ ls data
buildtimesnd.txt buildtimesnd1.json buildtimesnd2.json
then passing the data folder only reads the .json files:
dply -c 'config(max_table_width=50); json("data") | glimpse()'
┌────────────┬────────┬──────────────────────────┐
│ Rows: 358 ┆ Type ┆ Values │
│ Cols: 7 ┆ ┆ │
╞════════════╪════════╪══════════════════════════╡
│ allocator ┆ str ┆ mimalloc, mimalloc,... │
│ build ┆ str ┆ release, release,... │
│ build_time ┆ str ┆ 2023-07-03 10:40:42... │
│ commit ┆ str ┆ 2bb0c1c618f961843b494... │
│ date ┆ str ┆ 2023-07-03... │
│ time ┆ i64 ┆ 132610400, 108749400,... │
│ version ┆ str ┆ 0.82.1, 0.82.1, 0.82.... │
└────────────┴────────┴──────────────────────────┘
I'll see if it is possible to override this behavior when the user passes an extension, so that it reads the file as ndjson.
Thank you @fdncred, I changed the behavior in #53 so that the default extension is only used when loading from a folder without specifying an extension:
dply -c 'config(max_table_width=50); json("buildtimes.txt") | glimpse()'
┌────────────┬────────┬──────────────────────────┐
│ Rows: 179 ┆ Type ┆ Values │
│ Cols: 7 ┆ ┆ │
╞════════════╪════════╪══════════════════════════╡
│ allocator ┆ str ┆ mimalloc, mimalloc,... │
│ build ┆ str ┆ release, release,... │
│ build_time ┆ str ┆ 2023-07-03 10:40:42... │
│ commit ┆ str ┆ 2bb0c1c618f961843b494... │
│ date ┆ str ┆ 2023-07-03... │
│ time ┆ i64 ┆ 132610400, 108749400,... │
│ version ┆ str ┆ 0.82.1, 0.82.1, 0.82.... │
└────────────┴────────┴──────────────────────────┘
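The rule described above (an explicit file path is read as given, and the default .json extension only filters files when a folder is passed) can be modelled roughly as follows. This is a Python sketch of the behavior as described in this thread, not dply's actual implementation; resolve_inputs and default_ext are hypothetical names.

```python
from pathlib import Path


def resolve_inputs(path: str, default_ext: str = ".json") -> list[Path]:
    """Return the files a reader would load for the given path.

    A folder yields only the files with the default extension.
    A single file is returned as given, whatever its extension.
    """
    p = Path(path)
    if p.is_dir():
        return sorted(
            child
            for child in p.iterdir()
            if child.is_file() and child.suffix == default_ext
        )
    return [p]
```

With the data folder listed earlier, this model would pick buildtimesnd1.json and buildtimesnd2.json and skip buildtimesnd.txt, while a direct path such as buildtimes.txt is passed through unchanged.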
Thanks for the follow-up. I haven't tried it out yet, but I looked at the PR and it seemed reasonable. Appreciate the work!
I was trying to follow along with your readme and process some json data but I can't seem to get it to work. Am I doing something wrong?
Also, when I go into the repl, I don't see a json( function like the parquet( function.