Open xvzftube opened 3 years ago
In Miller 6 (the as-yet-unreleased Go port) there is now support for JSON arrays. So this works:
mlr --icsv --ojson --from mtcars.csv cut -f mpg,wt then put -q '
  for (k, v in $*) {
    @output_record[k][NR] = v;
  }
  end {
    emit @output_record
  }
'
{
"mpg": [21, 21, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2, 17.8, 16.4, 17.3, 15.2, 10.4, 10.4, 14.7, 32.4, 30.4, 33.9, 21.5, 15.5, 15.2, 13.3, 19.2, 27.3, 26, 30.4, 15.8, 19.7, 15, 21.4],
"wt": [2.62, 2.875, 2.32, 3.215, 3.44, 3.46, 3.57, 3.19, 3.15, 3.44, 3.44, 4.07, 3.73, 3.78, 5.25, 5.424, 5.345, 2.2, 1.615, 1.835, 2.465, 3.52, 3.435, 3.84, 3.845, 1.935, 2.14, 1.513, 3.17, 2.77, 3.57, 2.78]
}
I can also make a verb which does this kind of thing ... or maybe just a recipe item for the Miller docs -- ?
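For readers who want to see the reshaping on its own, outside Miller, here is a minimal Python sketch of the same column-to-array pivot. It is not Miller code and says nothing about how a future verb would be implemented. The function name columns_to_arrays is made up for illustration, the sample rows are the first three mpg/wt pairs from the output above, and every record is assumed to carry the same keys.

```python
import json

def columns_to_arrays(records):
    """Pivot a list of row dicts into one dict of column arrays."""
    out = {}
    for record in records:
        for key, value in record.items():
            # Append in record order, like indexing by NR in the Miller snippet.
            out.setdefault(key, []).append(value)
    return out

# First three mpg/wt rows from the mtcars output shown above.
rows = [
    {"mpg": 21, "wt": 2.62},
    {"mpg": 21, "wt": 2.875},
    {"mpg": 22.8, "wt": 2.32},
]
stan_data = columns_to_arrays(rows)
print(json.dumps(stan_data))
# prints {"mpg": [21, 21, 22.8], "wt": [2.62, 2.875, 2.32]}
```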
Part of me is tempted to make STAN a file format, so:
mlr --icsv --ostan cut -f mpg,wt mtcars.csv
However, STAN isn't a separate file format; it's just JSON. On the third hand ... it would be really neat to have an "un-stan" functionality which would convert the mpg and wt arrays back into tabular format .....
Really this is a kind of sideways display. CC @aborruso and @ashmishr with regard to https://github.com/johnkerl/miller/issues/321.
A way to reuse this code more easily:
$ cat mkstan.mlr
for (k, v in $*) {
  @output_record[k][NR] = v;
}
end {
  emit @output_record
}
Then
$ mlr --from whatever-file.dat --ojson cut -f x,y then put -q -f mkstan.mlr
Anyway.
- The mkstan.mlr will work with Miller 6 (let me know if you want me to make you a binary).
- --istan and --ostan, so you can do
mlr --icsv --ostan cut -f mpg,wt mtcars.csv
in Miller 6 as well. The STAN format would be a documented subset of JSON. But only once I better digest https://mc-stan.org/docs/2_25/cmdstan-guide/json.html and make sure I'm handling a general case for the Stan tool.
Thinking more, and having read more: There's more to Stan format than just single-dimensional arrays. So I think I'll do:
- The mkstan.mlr will work with Miller 6 (let me know if you want me to make you a binary).
- I'll make an unstan.mlr as well.
- I'll make new columns-to-arrays and arrays-to-columns verbs which will be source-code implementations of mkstan.mlr and unstan.mlr. So:
mlr --icsv --ojson cut -f mpg,wt then columns-to-arrays mtcars.csv
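As one illustration of what lies beyond single-dimensional arrays: as far as the CmdStan JSON guide linked above describes it, a matrix is written as an array of row arrays. The Python sketch below packs chosen columns into such a nested array. It is only a sketch, not Miller functionality; rows_to_matrix, the key X, and the sample values are assumptions made for the example.

```python
import json

def rows_to_matrix(records, columns):
    """Return one inner list per record, holding the chosen columns in order."""
    return [[record[name] for name in columns] for record in records]

rows = [
    {"mpg": 21, "wt": 2.62},
    {"mpg": 22.8, "wt": 2.32},
]
# "X" is a hypothetical variable name; a real Stan model defines its own.
data = {"X": rows_to_matrix(rows, ["mpg", "wt"])}
print(json.dumps(data))
# prints {"X": [[21, 2.62], [22.8, 2.32]]}
```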
For reference (since the Miller 6 port is some weeks/months away from being done):
# ================================================================
# Sample CSV input:
#
# $ cat input.csv
# a,b
# 1,4
# 2,5
# 3,6
#
# Invocation:
#
# $ mlr --icsv --ojson put -q -f mkstan.mlr input.csv
#
# Sample JSON output:
#
# {
# "a": [1, 2, 3],
# "b": [4, 5, 6]
# }
# ================================================================
for (k, v in $*) {
  @output_record[k][NR] = v;
}
end {
  emit @output_record
}
# ================================================================
# Sample JSON input:
#
# $ cat stan.json
# {
# "a": [1, 2, 3],
# "b": [4, 5, 6]
# }
#
# Invocation:
#
# $ mlr --ijson --ocsv put -q -f unstan.mlr stan.json
#
# Output:
#
# a,b
# 1,4
# 2,5
# 3,6
# ================================================================
# Find array length
n = 0;
for (k, v in $*) {
  n = max(n, length(v));
}
keys = keys($*);
# Emit one record per array entry
for (int i = 1; i <= n; i+=1) {
  map output_record = {};
  for (k in keys) {
    output_record[k] = $[k][i];
  }
  emit output_record;
}
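The same inverse step can be sketched in Python for comparison. This is an illustration only, not a port of unstan.mlr: arrays_to_columns is a made-up name, and filling short arrays with empty cells is an assumption of this sketch rather than documented Miller behavior.

```python
import csv
import io

def arrays_to_columns(data):
    """Turn a dict of column arrays into a list of row dicts."""
    # The longest array decides how many rows come out.
    n = max((len(values) for values in data.values()), default=0)
    rows = []
    for i in range(n):
        # Assumption: a short array yields an empty cell rather than an error.
        rows.append({
            key: (values[i] if i < len(values) else "")
            for key, values in data.items()
        })
    return rows

stan = {"a": [1, 2, 3], "b": [4, 5, 6]}
rows = arrays_to_columns(stan)

# Write the rows as CSV text, mirroring the --ocsv output above.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(stan), lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue(), end="")
```

With the sample input this prints the header a,b followed by the rows 1,4 and 2,5 and 3,6, matching the expected output in the comment block of unstan.mlr.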
Thanks for all of the thought you put into this. I like the idea of the new verbs.
I have been stringing a shell script in with mlr to prepare the data for Stan. I wanted to open this as a feature request: as opposed to my csv2json.sh script, maybe a flag --json-cells-to-arrays, or any other more suitable name. csv2json.sh is a jq shell script.
As a reference, this page shows the format of the JSON needed for CmdStan: https://mc-stan.org/docs/2_25/cmdstan-guide/example-model-and-data.html
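As a rough sketch of the kind of conversion being requested, and not a stand-in for csv2json.sh (whose contents are not shown here), the Python below reads CSV text, turns each column into an array, and adds a record count. The key name N follows the Bernoulli example on the linked page, where the data file holds N next to the array y. The names csv_to_stan_json and count_key, and the naive number parsing, are assumptions made for illustration.

```python
import csv
import io
import json

def parse_number(text):
    """Convert a CSV cell to int or float; leave it as a string otherwise."""
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            pass
    return text

def csv_to_stan_json(csv_text, count_key="N"):
    """Build a JSON object with one array per CSV column plus a record count.

    Assumes every row has exactly the fields named in the header and that
    no column is itself named like count_key.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = {name: [] for name in (reader.fieldnames or [])}
    count = 0
    for row in reader:
        count += 1
        for name, cell in row.items():
            columns[name].append(parse_number(cell))
    return json.dumps({count_key: count, **columns})

print(csv_to_stan_json("y\n0\n1\n0\n"))
# prints {"N": 3, "y": [0, 1, 0]}
```

Whether a model needs N, or other scalars, depends on its data block, so the count is kept as a separate, renameable key rather than baked in.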