jqlang / jq

Command-line JSON processor
https://jqlang.github.io/jq/

feature request: slice(_) #379

Open pkoppstein opened 10 years ago

pkoppstein commented 10 years ago

Description: slice(composite) expects . to be an object and its argument to be a composite specifying key names. If the argument is an array, it should be an array of strings; otherwise it should be an object. The result is an object with the specified keys and corresponding values extracted from the input object where available, or else null.

Implementation:

def slice(a): . as $in
   | if (a|type) == "array" then a else (a|keys) end
   | reduce .[] as $key ( {}; . +  { ($key) : $in[$key] } )  ;

Examples:

$ jq -n -c '{"a":1, "b":2} | slice( ["a", "c"] )'
{"a":1,"c":null}

$ jq -n -c '{"a":1, "b":2, "c":3} | slice( {"a":null, "b":null} )'
{"a":1,"b":2}
nicowilliams commented 10 years ago

Yes, that's clever. I think you want something more like:

You could add an explicit error("slice(a) requires an array or object as argument").
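
A minimal sketch of how that check might be folded into the original definition, wrapped in the same shell-invocation style as the examples above. The shell names `SLICE_CHECKED_DEF` and `run_slice_checked` are hypothetical, and jq 1.5 or later is assumed to be on the PATH:

```shell
# Hypothetical wrapper: SLICE_CHECKED_DEF and run_slice_checked are
# illustrative names, not part of jq. Assumes jq is on the PATH.
SLICE_CHECKED_DEF='
def slice(a):
  . as $in
  | (a | type) as $t
  | if $t == "array" then a
    elif $t == "object" then (a | keys)
    else error("slice(a) requires an array or object as argument")
    end
  | reduce .[] as $key ({}; . + { ($key): $in[$key] });
'

# Run a jq filter with the definition above prepended.
run_slice_checked() {
  jq -n -c "$SLICE_CHECKED_DEF $1"
}
```

With this in place, `run_slice_checked '{"a":1, "b":2} | slice(["a", "c"])'` should print `{"a":1,"c":null}`, while `slice("x")` should fail with the explicit message rather than a less specific error from `keys`.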

What if the input is a string? Can we slice a string? How? By interpreting a as a list of offset/length pairs? (If the last length is missing, then to the end.) Isn't this how array slicing could work too? If it did then you could make it work for strings by exploding the string first then imploding the result.

pkoppstein commented 10 years ago

The specification I offered is intentionally simple, the simplifying principle being that it does no more than support the extraction of key/value pairs from an object. This is in accordance, for example, with RoR's slice.

Another consideration is that jq already has some support (in an admittedly rudimentary form) for array and string slicing (via the [i:j] construct). I didn't want to create any kind of redundancy. Admittedly, .[1:2] + .[3:4] is rather cumbersome, but I thought perhaps that @stedolan had plans for "slicing" strings and arrays.
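
For comparison, a small sketch of what the existing syntax allows for non-contiguous elements: concatenated `[i:j]` slices, and (for arrays) an index expression that generates several indices. The shell names here are hypothetical; only standard jq syntax is assumed:

```shell
# Hypothetical names throughout; assumes jq is on the PATH.
# Elements 1 and 3 via concatenated slices (arrays and strings):
CONCAT_SLICES='.[1:2] + .[3:4]'
# Elements 1 and 3 via an index expression that generates two indices
# (arrays only):
MULTI_INDEX='[ .[1,3] ]'

# Apply a jq filter ($2) to a JSON value ($1).
apply_filter() {
  jq -n -c "$1 | $2"
}
```

Both `apply_filter '[0,10,20,30,40]' "$CONCAT_SLICES"` and `apply_filter '[0,10,20,30,40]' "$MULTI_INDEX"` should print `[10,30]`, and `apply_filter '"abcde"' "$CONCAT_SLICES"` should print `"bd"`.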

One possibility would be to stick with a simple specification for now as it can easily be extended later without breaking backwards compatibility.

pkoppstein commented 10 years ago

Here's a more generic version, with some error handling:

Synopsis:

slice(arg) extracts zero or more elements from its input, based on the keys or indices specified by the argument, which must be an array of integers, an array of strings, or an object. Negative integers are interpreted by counting backwards from the end of the input string or array.

Examples:

$ jq -n -c '[0,10,20] | slice([0,0,-1,100])'
[0,0,20,null] 
$ jq -n -c '"abc" | slice([0,0,-1,100])'
"aac"
$ jq -n -c '{ "a":1} | slice(["a","b", "a"])'
{"a":1,"b":null}
$ jq -n -c '{ "a":1, "b":2} | slice({ "a":null, "c":null})'
{"a":1,"c":null}

Implementation

# NB: in a jq program, _sliceString and _sliceArray (below) must be
# defined before slice.
def slice(a): . as $in
   | (a|type)
   | if . == "array" or . == "object" then $in
     else error("arg of slice must be a composite")
     end
   | if type == "string" then _sliceString(a)
     elif type == "array" then _sliceArray(a)
     else (if (a|type) == "array" then a else (a|keys) end)
          | reduce .[] as $key ( {}; . +  { ($key) : $in[$key] } )
     end;

# If the input string has no character at one of the specified indices, then
# a null string is inserted in its stead, e.g. "a" | slice([0, 0, 1, 1]) => "aa"
def _sliceString(a): 
  if (a|type) == "array" then .
  else error("arg of _sliceString must be an array of integers")
  end
  | . as $in
  | length as $len
  # Handle negative indices:
  | (a | map ( if (type == "number" and . < 0) then $len + . else . end)) as $a
  | $in
  | explode | [ .[ ($a)[] ] // empty ] | implode ;

# If the input array has no value at one of the specified indices, then 
# null is used in its stead.
def _sliceArray(a):
  if (a|type) == "array" then .
  else error("arg of _sliceArray must be an array of integers")
  end
  | . as $in
  | length as $len
  # Handle negative indices:
  | (a | map ( if (type == "number" and . < 0) then $len + . else . end)) as $a
  | [ $in[ ($a)[] ] ] ;
nicowilliams commented 10 years ago

@pkoppstein You could use explode and implode for handling the string as an array of codepoints.

BTW, we must always remember that function arguments are closures that can generate. Therefore saying that an argument must be an array is very much the right thing to do as you did. But we should probably have a section of the manual about surprises resulting from unintended use of generation.
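
A sketch of the kind of surprise meant here, using the first definition of slice from this thread. The variant `slice_once` and the shell names `GEN_DEFS` and `run_gen` are hypothetical, added only for illustration:

```shell
# Hypothetical wrapper; assumes jq is on the PATH.
GEN_DEFS='
# First definition from this thread: the argument a is evaluated
# more than once (in the condition and again in the then-branch).
def slice(a): . as $in
  | if (a|type) == "array" then a else (a|keys) end
  | reduce .[] as $key ( {}; . + { ($key) : $in[$key] } );

# Hypothetical variant: bind each value generated by a exactly once.
def slice_once(a): . as $in
  | a as $a
  | if ($a|type) == "array" then $a else ($a|keys) end
  | reduce .[] as $key ( {}; . + { ($key) : $in[$key] } );
'

# Run a jq filter with the definitions above prepended.
run_gen() {
  jq -n -c "$GEN_DEFS $1"
}
```

Since jq separates arguments with `;`, the call `slice(["a"],["b"])` passes a single argument that generates two arrays. `run_gen '{"a":1,"b":2} | slice(["a"],["b"])'` should therefore emit four objects (`{"a":1}`, `{"b":2}`, `{"a":1}`, `{"b":2}`), whereas `slice_once` with the same argument should emit two.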

pkoppstein commented 10 years ago

@nicowilliams wrote:

You could use explode and implode for handling the string as an array of codepoints.

The latest version (June 4) does use explode/implode, so either I'm missing something or you're looking at the earlier version. Either way, please let me know what I should do. Tx.

nicowilliams commented 10 years ago

Oh, sorry, I missed it. Still, it looks like the string and array slicing can be merged into:

def _slice(a):
  if (a|type) == "array" then .
  else error("arg of _slice must be an array of integers")
  end
  | type as $type
  | if $type == "string" then explode else . end
  | . as $in
  | length as $len
  # Handle negative indices:
  | (a | map ( if (type == "number" and . < 0) then $len + . else . end)) as $a
  | [ $in[ ($a)[] ] ]
  # Drop nulls from out-of-range indices before imploding:
  | if $type == "string" then map(select(. != null)) | implode else . end;
pkoppstein commented 10 years ago

@nicowilliams - I wanted to keep the string and array slicing well-separated, both for ease of review and to make alterations easier. If the semantics is fine, then feel free to optimize away!

nicowilliams commented 10 years ago

@pkoppstein Send me a PR that includes docs updates and tests and I'll merge faster. (The next few weeks look likely to keep me too busy. With a PR I can pull it, merge, run make check, see that I like the docs updates, and push it -- all of which saves me the work of writing the docs and the tests and getting them to pass. Alternatively, if you don't mind waiting a few more weeks I'll get to it then.)

jnothman commented 8 years ago

Could I suggest take or extract as a clearer name?

pkoppstein commented 8 years ago

@jnothman wrote:

Could I suggest take or extract as a clearer name?

Thanks for reviving this ER. How do you like the name "query"? This name makes quite a lot of sense in the context of objects (as in: "query(q) uses q as a 'query' object to query the input object").

jnothman commented 8 years ago

I'm not taken by "query" as a verb in this context. Even in your explanation of the query name, you suggest it is because of the argument being the query. To query is to seek deep information, beyond the excerpts produced by this function.

I still think the correct verb is extract; to quote from the jq manual, "There are a lot of builtin filters for extracting a particular field of an object" which is what you are doing, plurally.

pkoppstein commented 8 years ago

@jnothman - Between "slice" and "extract", I think I still prefer "slice", partly perhaps because that name is commonly used for this kind of thing, but mainly because the name "extract", although fine when referring to a specific key or index, or to an array of such keys or indices, seems inappropriate when referring to a JSON object (i.e., what I was calling the query object).

I'm by no means pushing for "query", but I was wondering -- what specifically do you have in mind for query/1?

"take" would be OK except that some of the programming languages with which I'm familiar use it in a slightly different way (e.g. ruby: "Returns first n elements from the array"; python: "Return first n items of the iterable as a list").

jnothman commented 8 years ago

I don't have anything in mind for query/1, but it seems a useful name available for custom code. But it's all semantics. FWIW, relational algebra calls this operation 'projection', but that seems obscure.

Btw, is there any use of having slice({"a": 1, "b": 1}) set those 1s as default values in place of null? It seems it would be a trivial change to the reduce call.
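
A minimal sketch of what that change might look like, restricted to the object-template case. The names `slice_defaults`, `DEFAULTS_DEF` and `run_defaults` are hypothetical, and the behaviour shown (use the template's value only when the input lacks the key) is one assumption among several possible:

```shell
# Hypothetical sketch; slice_defaults, DEFAULTS_DEF and run_defaults are
# illustrative names. Assumes jq is on the PATH.
DEFAULTS_DEF='
def slice_defaults(t):
  . as $in
  | t as $t
  | reduce ($t | keys[]) as $key ({};
      # Use the input value when the key is present, else the template value.
      . + { ($key): (if $in | has($key) then $in[$key] else $t[$key] end) });
'

# Run a jq filter with the definition above prepended.
run_defaults() {
  jq -n -c "$DEFAULTS_DEF $1"
}
```

For example, `run_defaults '{"a":1} | slice_defaults({"a":0, "height":180})'` should print `{"a":1,"height":180}`. A key that is present in the input with an explicit null keeps its null under this sketch.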

jnothman commented 8 years ago

And I find "slice" implies contiguity as the normal use-case.

pkoppstein commented 8 years ago

@jnothman asked:

is there any use of having slice({"a": 1, "b": 1}) set those 1s as default values

Certainly that would be potentially useful, but don't you think it departs too much from the extract/take/slice/project paradigm? If I ask for "height", and there is no "height" specified, then isn't it safer to set "height" to null?

I'm inclined to view the query object as simply a template rather than a specification of defaults, but I could easily be persuaded otherwise, e.g. by a good example, an alternative name, or some mathematical magic.