Closed alerque closed 5 years ago
In YAML, a stream of multiple documents must be separated by ---
separators. For your example, here is an input that works:
{ "one": 1 }
---
{ "two": 2 }
The slurp option is not interpreted by yq - it's forwarded to jq.
I kind of understand how it ended up this way, but something is still off. It seems like something is being done in the wrong order. If yq was just passing off the argument to jq, it seems like it shouldn't change the way YAML input streams were read (it does) and should be able to handle JSON input streams the same way jq does (it doesn't).
The -s flag does not change the way YAML input streams are read. It changes the way JSON input streams are read. I'm not sure what you mean by things being done in the wrong order; can you elaborate?
The thing that you might be missing is that yq does not handle JSON documents; yq handles YAML documents that might use a dialect of YAML that looks exactly like JSON.
But YAML is not JSON. While the semantics of a single JSON document are preserved within YAML, the semantics of a stream of documents are not. For yq to parse a stream of multiple documents, they have to be separated by the YAML document separator.
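As a plain-Python illustration of that stream semantics (not yq's actual code): a ---separated stream has an explicit marker to split on, whereas a concatenated JSON stream does not. Because each document here happens to use YAML's JSON-compatible subset, json.loads can stand in for a YAML parser, and the split on the separator is deliberately naive.

```python
import json

# A YAML stream: two documents separated by the YAML document separator.
stream = '{ "one": 1 }\n---\n{ "two": 2 }\n'

# Naively split on the separator line; a real YAML parser handles this
# more carefully, but the principle is the same: the separator is what
# delimits documents in a stream.
docs = [json.loads(part) for part in stream.split('\n---\n')]
print(docs)  # [{'one': 1}, {'two': 2}]
```

A concatenated stream like `{ "one": 1 }{ "two": 2 }` has no such marker, which is why a spec-conforming YAML parser stops after the first document.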
It changes the way JSON input streams are read.
This is confusing because it does not change the way a JSON stream is read from the input to yq; it changes how jq handles the data that yq has already parsed from YAML and passed along. Someone might assume that a passed-through argument would make the input behave the same way, but it rather unexpectedly does not.
See the above commit reference for a situation where this becomes problematic. JSON data is coming back from an API, but I had to break it into batches. I want to process and output this data as YAML, but without preprocessing it in jq first I'm unable to get yq to make sense of it.
In order to handle the concatenated JSON coming in I would have expected to change:
yq -M -e -y '.'
into:
yq -s -M -e -y '.'
but instead it has to be:
jq -s '[.]' | yq -M -e -y '.[0][]'
That seems a lot more convoluted than it needs to be.
If nothing else this should probably be mentioned in the help / docs, because it is counterintuitive.
I'll put a note in the docs, that's a good idea.
Aside from that, there's not much I can do, short of replacing PyYAML with another parser. I can't emulate jq's slurp mode without trying to parse each doc with a JSON parser. PyYAML's handling of document streams follows the YAML spec precisely, and requires documents to be separated by ---.
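Parsing each doc with a JSON parser is exactly what jq's slurp mode amounts to: resume decoding after each complete value. A minimal sketch of that with Python's standard library (an illustration, not yq's or jq's implementation — the function name is mine):

```python
import json

def slurp_concatenated_json(text):
    """Gather a stream of concatenated JSON documents into one list,
    roughly what jq's -s (slurp) flag does with its inputs."""
    decoder = json.JSONDecoder()
    values = []
    idx = 0
    while idx < len(text):
        # raw_decode rejects leading whitespace, so skip past it first.
        while idx < len(text) and text[idx].isspace():
            idx += 1
        if idx >= len(text):
            break
        # raw_decode returns the parsed value and the index where it ended,
        # which lets us pick up the next document from there.
        value, idx = decoder.raw_decode(text, idx)
        values.append(value)
    return values

print(slurp_concatenated_json('{ "one": 1 }\n{ "two": 2 }'))
# [{'one': 1}, {'two': 2}]
```

This is the step a spec-conforming YAML parser like PyYAML cannot take, since nothing in the YAML grammar says a new document starts where the previous JSON value ended.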
Could there be a "raw pass through" or "input is JSON" option of some kind to not try to parse the input stream as YAML at all, given the case of it being JSON already? It already wasn't obvious to me (see #51) that JSON was valid input, and now we have a case where yq isn't a drop-in replacement for jq. Perhaps a flag to skip the YAML input parsing stage and only handle the output stage would be useful (and intuitive).
Closing this because I don't think there's anything to be done here.
It was my assumption that slurping multiple inputs should work the same as in jq, but it does not. For example this works:
But this fails to parse past the first object:
You can work around this by using jq as an intermediary to slurp into an array, then pass to yq and strip off the extra array: