Open taktoa opened 6 years ago
Given the slow pace of jq development, and without meaning to detract from the worthiness of your suggestion, I'm wondering whether the existing conversion tools (e.g. via `yarn global add cbor` or `gem install cbor-diag`) aren't in practice adequate, especially given jq's streaming parser for handling ginormous JSON documents. In my experience, given a large JSON-compatible blob, using such tools to extract the subset of interest as JSON usually works out well. (Sometimes all that's really needed to get things going is the ability to infer a JSON schema.)
In the case of other formats (e.g. YAML), it's easy enough to create a FORMAT2JSON-jq-JSON2FORMAT tool, and there is something to be said in favor of adhering to that approach -- it keeps the stuff that can be left out of jq, out of jq, thus preventing bloat and hopefully allowing jq developers to expend their limited resources on other things, such as making it better able to handle very large documents :-)
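To make the FORMAT2JSON half of that pipeline concrete, here is a minimal sketch of a CBOR-to-JSON front end in Python. It is an illustration only, not an existing tool: the names `read_head`, `decode`, and `to_json_lines` are hypothetical, and it assumes definite-length CBOR restricted to the JSON-compatible subset (integers, text strings, arrays, maps with string keys, true/false/null, and 64-bit floats). Byte strings, tags, and indefinite-length items are rejected rather than guessed at.

```python
import json
import struct


def read_head(buf, pos):
    """Return (major type, additional info, argument, next position)."""
    initial = buf[pos]
    major, info = initial >> 5, initial & 0x1F
    pos += 1
    if info < 24:
        return major, info, info, pos          # argument stored in the head byte
    if info <= 27:
        size = 1 << (info - 24)                # 1, 2, 4 or 8 argument bytes
        arg = int.from_bytes(buf[pos:pos + size], "big")
        return major, info, arg, pos + size
    raise ValueError("indefinite-length and reserved encodings are not handled")


def decode(buf, pos=0):
    """Decode one CBOR item starting at pos; return (value, next position)."""
    major, info, arg, pos = read_head(buf, pos)
    if major == 0:                             # unsigned integer
        return arg, pos
    if major == 1:                             # negative integer
        return -1 - arg, pos
    if major == 3:                             # UTF-8 text string
        return buf[pos:pos + arg].decode("utf-8"), pos + arg
    if major == 4:                             # array of `arg` items
        items = []
        for _ in range(arg):
            value, pos = decode(buf, pos)
            items.append(value)
        return items, pos
    if major == 5:                             # map of `arg` key/value pairs
        obj = {}
        for _ in range(arg):
            key, pos = decode(buf, pos)
            if not isinstance(key, str):
                raise ValueError("JSON objects need string keys")
            obj[key], pos = decode(buf, pos)
        return obj, pos
    if major == 7:
        if info in (20, 21, 22):               # false, true, null
            return {20: False, 21: True, 22: None}[info], pos
        if info == 27:                         # float64: the 8 bytes just read
            return struct.unpack(">d", buf[pos - 8:pos])[0], pos
    raise ValueError(f"unsupported CBOR item (major type {major}, info {info})")


def to_json_lines(data):
    """Convert back-to-back CBOR items into newline-delimited JSON text."""
    lines, pos = [], 0
    while pos < len(data):
        value, pos = decode(data, pos)
        lines.append(json.dumps(value))
    return "".join(line + "\n" for line in lines)
```

Wired to standard input and output (for example `sys.stdout.write(to_json_lines(sys.stdin.buffer.read()))`), a script like this could sit in front of jq in a pipeline. Note that it reads the whole input into memory first, so it does nothing for the very-large-document case.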
It could be an optional dependency kind of thing. I don't think it'd bloat up the codebase too much; CBOR and msgpack are already basically JSON. If the issue is just manpower I could take a look at writing a PR.
I can’t speak for the maintainers, but this is what I’d suggest you consider: fork jq, and add support for one or more of the formats that are of interest to you. That way, you’ll benefit immediately, and it will be relatively easy to measure the performance impacts, make an informed assessment, and go from there.
By the way, I’d estimate that adding msgpack would add about 10% to the size of the jq binary, with a corresponding increase in the startup time. Since jq is still fast to start, and since lightweightedness is a valued feather in jq’s cap, it seems unlikely that the cost would be judged worthwhile, especially given that there are other contenders.
I definitely want support for a range of binary-JSON formats, including PostgreSQL's JSONB.
Q: Parse the whole thing at once, or have a special "external" jv type with cursors and indexing operations?
I want the latter, especially for formats that support online parsing.
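As a rough illustration of the cursor idea (not jq's actual jv API, and not a proposal for its C interface), the Python sketch below keeps a CBOR document as raw bytes and hands out cursors that point into it. Indexing skips over encoded items instead of building values for them. The class name `Cursor` and its methods `at`, `get`, and `scalar` are hypothetical, and only definite-length CBOR is assumed.

```python
class Cursor:
    """Points at one encoded item inside a shared buffer; decodes nothing up front."""

    def __init__(self, buf, pos=0):
        self.buf, self.pos = buf, pos

    def _head(self, pos):
        """Return (major type, argument, position just after the head)."""
        initial = self.buf[pos]
        major, info = initial >> 5, initial & 0x1F
        if info < 24:
            return major, info, pos + 1
        if info <= 27:
            size = 1 << (info - 24)            # 1, 2, 4 or 8 argument bytes
            arg = int.from_bytes(self.buf[pos + 1:pos + 1 + size], "big")
            return major, arg, pos + 1 + size
        raise ValueError("indefinite-length items are not handled")

    def _skip(self, pos):
        """Return the position just past the item at pos, without decoding it."""
        major, arg, pos = self._head(pos)
        if major in (2, 3):                    # byte/text string: jump over payload
            return pos + arg
        if major in (4, 5):                    # array: arg items; map: 2 * arg items
            for _ in range(arg * (2 if major == 5 else 1)):
                pos = self._skip(pos)
            return pos
        if major == 6:                         # tag: skip the tagged item too
            return self._skip(pos)
        return pos                             # ints, simple values, floats: head only

    def at(self, index):
        """Cursor for element `index` of an array, skipping earlier elements."""
        major, count, pos = self._head(self.pos)
        if major != 4 or not 0 <= index < count:
            raise IndexError(index)
        for _ in range(index):
            pos = self._skip(pos)
        return Cursor(self.buf, pos)

    def get(self, key):
        """Cursor for the value stored under a text-string key of a map."""
        major, count, pos = self._head(self.pos)
        if major != 5:
            raise KeyError(key)
        wanted = key.encode("utf-8")
        for _ in range(count):
            kmajor, klen, kpos = self._head(pos)
            value_pos = self._skip(pos)        # the value starts right after the key
            if kmajor == 3 and self.buf[kpos:kpos + klen] == wanted:
                return Cursor(self.buf, value_pos)
            pos = self._skip(value_pos)
        raise KeyError(key)

    def scalar(self):
        """Decode the item under the cursor if it is an integer or text string."""
        major, arg, pos = self._head(self.pos)
        if major == 0:
            return arg
        if major == 1:
            return -1 - arg
        if major == 3:
            return self.buf[pos:pos + arg].decode("utf-8")
        raise TypeError("not a scalar handled by this sketch")
```

With a buffer holding an array of objects, `Cursor(buf).at(1).get("name").scalar()` touches only the bytes needed to reach that one field. Lookups are still linear scans here; a real implementation would probably cache offsets or rely on a format that carries its own index.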
Often I want to use `jq` for munging some dataset that is typically a large array of objects, but I don't want to bear the cost of all the repeated keys or the overhead involved in parsing. For these kinds of situations, I think it would be really useful if `jq` could accept CBOR or msgpack.