**rjp** opened 2 years ago (Open)
**decode all the objects should be default**

what else do we need all the (non-option) argv for?
also `jq` takes multiple input files:

```
jq [options...] filter [files...]
```

... like many other unix/gnu tools, so they play nice with `xargs`, for example:

```
find . -name '*.json' -print0 | xargs -0 gron
```

evaluates to `gron a.json b.json c.json d.json`
> Adds: `-a`, `-all` flag which means "decode all the objects, pretending it's a JSON stream even if it's not actually."

The inputs could be parsed as an array of JSON documents, as suggested in https://github.com/tomnomnom/gron/issues/28#issuecomment-915170293:
```
$ gron <( echo '{ "hello": "world" }' ) <( echo '{ "hello2": "world2" }' )
file = [];
file[0] = {};
file[0].hello = "world";
file[1] = {};
file[1].hello2 = "world2";
```
Using filenames as keys would look weird in this example:

```
$ echo <( echo '{ "hello": "world" }' )
/dev/fd/63
```

... but filenames could be enabled with a `-H` option → #72 (or disabled with a `-h` option):
```
$ gron -H a.json b.json
file = {};
file["a.json"] = {};
file["a.json"].hello = "world";
file["b.json"] = {};
file["b.json"].hello = "world";
```
> what else do we need all the (non-option) argv for?

Ah, this is "decode all the objects in the input", not "decode all the objects in the command line arguments", because I have things that output multiple objects in a single-file, non-stream format which I needed to decode.

But yes, iterating over the arguments does make sense, if only for `xargs` usage.
oops, i confused this issue with #28
> Adds: `-a`, `-all` flag which means "decode all the objects, pretending it's a JSON stream even if it's not actually."

Now it makes sense to hide this feature behind a flag, as `{"a":1}{"b":2}` is an invalid JSON document.
Fixes #70 (implicitly), #23. May also have an impact on the "high memory usage" issues, but I'm doing more testing there.

Adds: `-a`, `-all` flag which means "decode all the objects, pretending it's a JSON stream even if it's not actually."

Rationale: `gron` only decodes the first object, and `gron -s` requires a "correctly" formatted JSON stream (one object per line), but it's not uncommon to get multiple objects per line with tools that don't support JSON stream formatting.

This does require a positionable stream, however, since the JSON decoder can read past the end of an object to be sure it's parsed correctly.

`io.Seeker` doesn't work, unfortunately, because whilst we know where we want to be (`d.InputOffset()`), we don't actually know where we currently are, which precludes the use of `io.SeekCurrent`; and, bizarrely, it turns out that `io.SeekStart` gets progressively slower as you seek further and further into your (in this case) `bytes.Buffer`.

Thus we keep track of where we want to be (`moved`) and create a `bytes.NewReader` for each attempted decode at the correct position. Crufty, definitely, and memory-allocation heavy, probably, but it works and is surprisingly not that bad even on large files.

My test 85MB single-line JSON input takes ~64s (x86_64), ~43s (arm64) and ~275M to parse into 1024 objects comprising 1GB of output text. Compare to `jq`: ~25s (x86_64), ~11s (arm64) using ~630M, giving 350MB of output.
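A minimal sketch of the offset-tracking idea described above, assuming the input is fully buffered; `decodeAllObjects` and the exact slicing are my naming and assumptions, not necessarily the PR's code:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
)

// decodeAllObjects decodes every top-level JSON value in buf, even when
// the values are concatenated on a single line. Rather than seeking, we
// remember how far we have consumed (`moved`) and build a fresh
// bytes.Reader at that offset for each attempted decode.
func decodeAllObjects(buf []byte) ([]interface{}, error) {
	var out []interface{}
	var moved int64 // absolute offset of the next undecoded byte

	for moved < int64(len(buf)) {
		// New reader positioned where the previous decode finished.
		d := json.NewDecoder(bytes.NewReader(buf[moved:]))
		var v interface{}
		if err := d.Decode(&v); err == io.EOF {
			break // only trailing whitespace remained
		} else if err != nil {
			return out, err
		}
		out = append(out, v)
		// InputOffset is relative to this reader, so accumulate it.
		moved += d.InputOffset()
	}
	return out, nil
}

func main() {
	objs, err := decodeAllObjects([]byte(`{"a":1}{"b":2} [3,4]`))
	fmt.Println(len(objs), err) // 3 <nil>
}
```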