Open RichardLitt opened 7 years ago
I'm not clear what the use case is for this format. The data for a given repo all comes back at once, so there's nothing to stream. We could stream results by repo in an org, but that would be a separate return format.
Or are we talking about the output being fed into a stream? It would be simple enough to take all the extra space out of the JSON printed to the console and add a newline at the end.
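To make that concrete, here is a minimal sketch of what "take the extra space out and add a newline" could look like. It is in Python purely for illustration (not this tool's own code), and the record shape and the `to_ndjson_line` helper are made up for the example.

```python
import json


def to_ndjson_line(record):
    """Serialize one record as a single compact JSON line ending in a newline."""
    # separators=(",", ":") drops the spaces json.dumps adds by default
    return json.dumps(record, separators=(",", ":")) + "\n"


# Hypothetical result shape, for illustration only
example = {"repo": "example-repo", "contributors": [{"login": "alice", "count": 3}]}
line = to_ndjson_line(example)
```

Writing `line` to stdout once per result would give a consumer one complete JSON document per line.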
--flat would indicate something entirely different to me, some kind of denesting of data. Maybe we want to go with --nd or something like that.
Fair point. This may already solve @diasdavid's needs, then. I'll point him here and ask.
An ndjson stream of documents would be 👌🏽 Thanks for considering adding this feature, @RichardLitt :)
@diasdavid Can you clarify - Is the only difference a one-line JSON object with a newline at the end?
ndjson is "new-line separated JSON objects", and the beauty of it is that it lets you pipe a stream of JSON objects and start parsing them as soon as they are ready. It is also a way of sharding a very large JSON blob into smaller units that are more manageable.
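As a rough sketch of the consumer side, this is how a reader might parse such a stream one document at a time. Python is used only for illustration; `iter_ndjson` and the sample records are hypothetical names and data, not anything this tool provides.

```python
import io
import json


def iter_ndjson(stream):
    """Yield one parsed object per non-empty line of a text stream."""
    for line in stream:
        line = line.strip()
        if line:  # skip blank lines between documents
            yield json.loads(line)


# Stand-in for a pipe or socket; in practice this could be sys.stdin
sample = io.StringIO('{"repo":"a","count":1}\n{"repo":"b","count":2}\n')
records = list(iter_ndjson(sample))
```

Because `iter_ndjson` is a generator, each document can be handled as soon as its line arrives, without waiting for the rest of the stream.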
Thanks for clarifying, @diasdavid. I get what ndjson is; my question is more about how this tool is to be used.
Envisaged use case 1: script is called via a batch script and the individual results streamed out as ndjson. Something like
for REPO in `cat repo-list.txt`
do
name-your-contributors --user $ME --repo $REPO --ndjson > {STREAM_SOCKET}
done
Possible use case 2: the --org option, instead of returning a single JSON array, returns an ndjson stream. This would only be for the convenience of the caller, I believe, since in my testing I've exhausted the GitHub rate limit with all query results in memory and it wasn't a problem. Maybe on a very small box, like an EC2 micro, it would be.
Use case 2 would probably be a bad idea. We're not streaming on the inside and I don't see us rewriting the script from the ground up to do so, so the script will be the bottleneck: it won't return the first NDJSON object until it's ready to return them all. So overall I think it will slow things down for the consumer, or be a wash at best.
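A toy illustration of that ordering, in Python and with made-up names (`fetch_all`, `emit_ndjson`, and the `events` log are stand-ins, not the script's real internals): when everything is collected before anything is written, the first line only appears after the last fetch.

```python
import json

events = []  # records the order in which things happen


def fetch_all(repos):
    """Stand-in for a non-streaming fetch: collects every result before returning."""
    results = []
    for repo in repos:
        events.append("fetch " + repo)
        results.append({"repo": repo})
    return results


def emit_ndjson(repos):
    # Nothing can be written until fetch_all has finished all of its work
    for record in fetch_all(repos):
        events.append("write " + json.dumps(record))


emit_ndjson(["a", "b"])
```

After this runs, every "fetch" entry in `events` comes before the first "write" entry, so an ndjson consumer sees nothing sooner than it would with a single JSON array.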
As far as JSON documents go we're not returning much, so parsing shouldn't be a bottleneck.
Expose a --flat option, where:
Asked for in #16.