Open lylemoffitt opened 7 years ago
Hmm, OK. I suppose the docs do need some clarification. I do think that @stedolan was trying to go for an intuitive description of an intuitive language. However, jq is a rather powerful language with aspects that are not obvious at first glance.
`.` is always "the current input value". Always. You can add `|.|` in most places and... it changes nothing, because it means "produce the current input value (from the expression on the left of the pipe) to the expression on the right of the pipe".
Function arguments might be particularly confusing. It's best to think of functions as having ONE (and only one) value argument and zero, one, or more function arguments. E.g., `def foo: . + .;` has one value argument, while `def foo(bar): . + bar;` has one value argument and one function argument (`bar`), and outputs `. + <bar applied to .>`.
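The "one value argument plus function arguments" idea can be sketched outside jq. The following is a rough Python model, not jq's implementation: it assumes a filter is a function from one input value to a stream of outputs, and the helper names (`identity`, `times`, `plus`, `foo_with`) are made up for illustration.

```python
# Toy model (an assumption, not jq's implementation): a filter is a function
# that takes one input value and yields zero or more output values.

def identity(x):
    """Models jq's `.`: emit the input unchanged."""
    yield x

def times(n):
    """Models a filter like `. * n` (hypothetical helper for this demo)."""
    def f(x):
        yield x * n
    return f

def plus(f, g):
    """Models `f + g`: both operands run on the SAME input value.
    Only meant for single-output operands; the order jq uses for
    multi-output operands is not modeled here."""
    def h(x):
        for a in f(x):
            for b in g(x):
                yield a + b
    return h

# def foo: . + .;          (one value argument, no function arguments)
foo = plus(identity, identity)

# def foo(bar): . + bar;   (bar is itself a filter, applied to the same `.`)
def foo_with(bar):
    return plus(identity, bar)

print(list(foo(5)))                 # [10]
print(list(foo_with(times(3))(5)))  # [20], i.e. 5 + (5 * 3)
```

The point of the sketch is that `bar` is passed as a filter rather than as a value: it is only evaluated inside `foo`, against whatever `.` is at that point.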
Parentheses can also be used to group expressions. E.g., `(1 + 2) * 3`. I think this is fairly obvious, but it's true and surprising that the manual does not mention this!
Parentheses can be important to deal with precedence issues. E.g., `1, 2 | . * 3, . * 5` could be interpreted in a number of different ways (though in only one way by jq) -- it's better to use parentheses to avoid confusion. E.g., `(1, 2) | ((. * 3), (. * 5))` or `1, (2 | ((. * 3), (. * 5)))`.
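To see how much the two groupings differ, here is a small Python sketch that models filters as generator functions (an illustrative assumption, with hypothetical helpers `const`, `times`, `pipe`, and `comma`). Since `|` binds more loosely than `,` in jq, the unparenthesized expression corresponds to the first grouping.

```python
# Toy model: a filter maps one input to a stream of outputs.

def const(v):
    """Models a literal such as `1`: ignores its input."""
    def f(_x):
        yield v
    return f

def times(n):
    """Models `. * n`."""
    def f(x):
        yield x * n
    return f

def pipe(f, g):
    """Models `f | g`: every output of f becomes the input of g."""
    def h(x):
        for y in f(x):
            yield from g(y)
    return h

def comma(f, g):
    """Models `f, g`: outputs of f, then outputs of g, on the same input."""
    def h(x):
        yield from f(x)
        yield from g(x)
    return h

rhs = comma(times(3), times(5))                     # (. * 3), (. * 5)
grouping_a = pipe(comma(const(1), const(2)), rhs)   # (1, 2) | ((. * 3), (. * 5))
grouping_b = comma(const(1), pipe(const(2), rhs))   # 1, (2 | ((. * 3), (. * 5)))

print(list(grouping_a(None)))  # [3, 5, 6, 10]
print(list(grouping_b(None)))  # [1, 6, 10]
```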
Thanks for your input! Keep it coming. It will make jq better.
@nicowilliams Thanks for the quick response. I get what they were going for, I just felt like it kinda tripped over itself a little to get there. The language is intuitive and simple, I just wish it had been explained better. I ignored `jq` in favor of the less powerful, but easier to grok `jo`, months ago exactly because of the documentation.
Keep it coming.
I definitely have more ideas, but they are more focused around enhancing the programming language aspects. I wanted to see how receptive the community is first, before going further.
@lylemoffitt - I'm not sure this is relevant, but since you wrote:
I definitely have more ideas, but they are more focused around enhancing the programming language aspects.
I thought I'd mention that a jq documentation effort has just started at stackoverflow.com. Maybe it could be justified by adopting a "programming language" approach?
An entry point: https://stackoverflow.com/documentation/jq/topics
@pkoppstein - Thanks for mentioning that. Wasn't aware of that feature on stackoverflow. It isn't really what I had in mind though.
Maybe it could be justified by adopting a "programming language" approach?
I think it would be better than the current one, but that's not really my call. I'm also not saying the current approach is bad either; I just don't think it's as effective as it could be. Like sed, jq is a great CLI tool with an embedded DSL. In sed's documentation (its man page), they took the approach of emphasizing the DSL over the CLI. This (IMO) is probably what led to the long-term success of sed as a tool, but it also has the downside of making it harder to approach. I myself only recently understood the deeper nature of sed beyond its `sed -e 's///'` usage in part because I found the documentation so dense. But, now that I'm over the hump, I wouldn't have it any other way.
TLDR - It's a tradeoff.
@lylemoffitt - I have no idea how the jq documentation at stackoverflow.com will pan out, but I like the combination of brevity and accessibility that characterizes the current "manual", so in a way it would make sense for the more "programming language" orientation that you have in mind to have a home at stackoverflow.com, if there is to be additional documentation there.
(Currently, as you may know, the home for the more technical aspects and details is the jq wiki. Maybe you'd like to start a "jq for Programmers" page there? The potential downside of that is the risk that things could get confusing with an official tutorial, an official manual, another manual on the jq wiki, and still another manual of sorts on stackoverflow ...)
My orientation is heavily influenced by the documentation I worked on for a large proprietary language. There were three distinct volumes:
@pkoppstein
modulo a few tweaks [...] brevity and accessibility
I'm inclined to agree with you here. I'm not 100% sure what the right approach is given that each has its own set of trade-offs.
the jq wiki
I hadn't actually seen the wiki before. Like most projects on GitHub, I had assumed it was empty or full of incomplete/outdated information. This one has some good information that is appropriately placed there. A "jq for Programmers" page there would probably be better than stackoverflow. Either way, it's always second-class to the reference material provided with a distribution.
Ideally, there should be a quick-reference that's just as accessible as the current man page, but aimed at more experienced users. Perhaps a good solution would be to have two separate man-pages? The current `man jq` could stay focused on the quick-n-dirty CLI usage, while `man jq-lang` could be focused on the language and `.jq` module documentation.
@pkoppstein What's the copyright licensing associated with SO docs?
@nicowilliams - As best I can tell, the rules are elaborated in Section 3 ("Subscriber Content") of http://stackexchange.com/legal. The key point seems to be "all Subscriber Content that You contribute to the Network is perpetually and irrevocably licensed to Stack Exchange under the Creative Commons Attribution Share Alike license."
My (somewhat cursory) reading is that the contributor retains copyright and is not expected to grant an exclusive license.
@pkoppstein Excellent. Thanks.
I've pushed a partial fix for this, 6f9646a.
In relation to operator precedence, I found this table at Rosetta code:
| Precedence | Operator | Associativity | Description |
|---|---|---|---|
| lowest | `\|` | %right | pipe |
| | `,` | %left | generator |
| | `//` | %right | specialized "or" for detecting empty streams |
| | `=` `\|=` `+=` `-=` `*=` `/=` `%=` `//=` | %nonassoc | set component |
| | `or` | %left | short-circuit "or" |
| | `and` | %left | short-circuit "and" |
| | `!=` `==` `<` `>` `<=` `>=` | %nonassoc | boolean tests |
| | `+` `-` | %left | polymorphic plus and minus |
| | `*` `/` `%` | %left | polymorphic multiply, divide; mod |
| highest | `?` | (none) | post-fix operator for suppressing errors |
@lylemoffitt
It's ironic that the dot filter is referred to as the "least interesting filter", because it is the key to understanding the transformation of data through the script.
Yes, I will replace "least interesting filter" with:
Two important predefined filters are "." (pass), the filter that does nothing, and "empty", the filter that never produces values. The main laws for those filters and the `|` (bind) and `,` (then) operators are:
. | a ≡ a
a | . ≡ a
empty , a ≡ a
a , empty ≡ a
empty | a ≡ empty
a | empty ≡ empty
a , (b , c) ≡ (a , b) , c
(a , b) | c ≡ (a | c) , (b | c)
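These laws can be spot-checked against a toy model. The sketch below is Python, not jq: it assumes a filter is a generator function from one input to a stream of outputs, ignores errors entirely, and uses three arbitrary sample filters `a`, `b`, `c` invented for the check. Passing on a few inputs is evidence, not a proof.

```python
# Toy model: a filter maps one input to a stream of outputs (errors ignored).

def identity(x):            # jq's `.`
    yield x

def empty(_x):              # jq's `empty`: produces no outputs
    return iter(())

def pipe(f, g):             # f | g
    def h(x):
        for y in f(x):
            yield from g(y)
    return h

def comma(f, g):            # f , g
    def h(x):
        yield from f(x)
        yield from g(x)
    return h

# Arbitrary sample filters (hypothetical, just for the check).
def a(x):                   # ., . + 1
    yield x
    yield x + 1

def b(x):                   # . * 2
    yield x * 2

def c(x):                   # . - 1, . * 10
    yield x - 1
    yield x * 10

def equivalent(f, g, inputs=(0, 1, 7)):
    """True if f and g produce the same output stream on every sample input."""
    return all(list(f(x)) == list(g(x)) for x in inputs)

laws = {
    ". | a == a":                   equivalent(pipe(identity, a), a),
    "a | . == a":                   equivalent(pipe(a, identity), a),
    "empty , a == a":               equivalent(comma(empty, a), a),
    "a , empty == a":               equivalent(comma(a, empty), a),
    "empty | a == empty":           equivalent(pipe(empty, a), empty),
    "a | empty == empty":           equivalent(pipe(a, empty), empty),
    "a , (b , c) == (a , b) , c":   equivalent(comma(a, comma(b, c)), comma(comma(a, b), c)),
    "(a , b) | c == (a|c) , (b|c)": equivalent(pipe(comma(a, b), c), comma(pipe(a, c), pipe(b, c))),
}

for name, holds in laws.items():
    print(name, "->", holds)
```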
By the way, for my sanity I decided to put names to all filters and operators
Filter/Op. | Name |
---|---|
`.` | pass |
`\|` | bind |
`,` | then |
`[ ]` | values |
`?` | protect |
`//` | alternative |
The manual seems to deliberately avoid naming all things!
JJOR
@fadado wrote:
The manual seems to deliberately avoid naming all things!
Yes, that's one way the manual achieves a brilliant economy of expression and avoids the "cognitive burden" that comes with naming, especially if the names are potentially misleading, as is the case with "pass" for ".".
Readers can be encouraged to pronounce the single-character punctuation operators in accordance with their preferences for pronouncing the punctuation characters themselves (e.g. "dot" for ".", "pipe" for "|", and "comma" for ",").
Please note that `[]` is not an "operator" in the usual sense. Fundamentally, `[]` is the empty JSON array. The postfix use of `[]` is, in my opinion, best understood as a shorthand, i.e., under certain circumstances, `expr | .[]` can be contracted to `expr[]` and/or `(expr)[]`.
The name "alternative" for "//" is appropriate as it is a two-character operator with a meaning that is unrelated to "/".
@fadado
operator precedence
That's interesting, and helpful, thanks. I was surprised to see that the alternator was right associative. Isn't it defined to evaluate left to right?
The main laws
This. This is more of the kind of thing I was talking about. Helpful, clear, concise. Even if this is alien to a normal user, it's still worth putting in, because of how innocuous it is.
@pkoppstein
one way the manual achieves a brilliant economy of expression and avoids the "cognitive burden" that comes with naming
Generally, easing cognitive burden goes hand in hand with low expressive power. The man page may come off as an easy read, but it does so at the cost of length and verbosity. If you're set on reading it, the length may not be important, but it's certainly off-putting. Part of the trade-off for writing to a low bar is that, while it makes on-boarding easier, it dampens the long-term effectiveness. Now that I understand the language better, I would much rather have a normal function reference, but my only choice is to scroll through a lot of text trying to remember which section the function I'm looking for is under.
pronounce the single-character punctuation operators in accordance with their preferences
The problem with "call it whatever you want" mentality is that you lack community agreement. Especially if you want people to be able to find reference materials on stack overflow, they are going to need a common name to google. Searching for "jq slash-slash" is going to end in a bad user experience. Moreover, all of this is done in the name of bowing to fear that users will flee because you made them learn the names for things. If you structure the man page uniformly, they won't even notice the names. Once they get the formatting their eyes will just jump to the section they care about.
Please note that [] is not an "operator" in the usual sense.
I believe we are all in agreement. The man page uses the terms operator, filter, and function somewhat interchangeably. I believe, the general rule it follows is that filters have word-names, functions have word-names and explicit arguments in parens, and operators are symbols.
When it comes to learning how to use a tool, none of this complexity really matters. All you really want to know is how to grep the fields out of the stupid json. But when it comes to learning how to use a language, it's all very important. As I said before, jq's problem is that it's both. I remain with my estimation that the best approach is to split the two aspects into their own pages.
The manual seems to deliberately avoid naming all things!
Yes, that's one way the manual achieves a brilliant economy of expression and avoids...
Ok, if it is a feature and not a bug I will reframe my mind, and I can say the dot operator is like an all-pass filter...
@fadado wrote:
if it is a feature and not a bug I will reframe my mind
Thanks for the willingness to see it from another perspective.
... and I can say the dot operator is like an all-pass filter...
Yes, readers of the English-language edition of the jq documentation will have no trouble understanding references of the form "the X operator", where X is "dot", "comma", "pipe", or "query", and writing "the dot operator" rather than "the `.` operator" is undoubtedly sometimes easier on the eyes.
As for describing "." as an all-pass filter --- I am wondering whether the audience who will benefit from such a description is largely the same audience who will understand https://en.wikipedia.org/wiki/All-pass_filter ?
As for describing "." as an all-pass filter --- I am wondering whether the audience who will benefit from such a description is largely the same audience who will understand https://en.wikipedia.org/wiki/All-pass_filter ?
You are right, but while in XSLT we say "`.` is the current node", or in the shell we say "`.` is the current working directory", what should I say in JQ?
The phrase "`.` is the null filter" will be ok, but null is also a type name and value; this will be a source of confusion. In SNOBOL the null string is a pattern that always matches, and has the same role as the dot filter. For example, in the following code the dot filter helps to emulate SNOBOL `fence` or Prolog `cut`:
label $fence
| F
| (. , break $fence) # like SNOBOL fence or Prolog cut
| G
Can I say null filter? Or perhaps input value?
Have you considered "Identity filter"? It is, after all, an identity function.
I think identity filter is a great idea here, though, admittedly, I'm also in the group that would understand "all pass filter".
As for the manual, I like that the main manual ignores some of the language aspects in favor of brevity and clarity. It makes it easy to jump into using jq. However, I think a second man page that focuses on the language as a language is a great idea. I know there's a lot of work going on at stack overflow, but to use that in the manual likely requires us to gain licensing from the individual authors.
Anyways, I'd like to see us split out the manual into two parts, one on `jq(1)` (the binary, and some basic usage), and another on `jq.lang(8?)` (the language, maybe also builtins).
On Sun, Feb 12, 2017, 07:24 Santiago Lapresta notifications@github.com wrote:
Have you considered "Identity filter"? It is, after all, an identity function https://en.wikipedia.org/wiki/Identity_function.
@fadado - In explaining the identity filter, ".", it would be helpful to mention that it echoes each JSON value presented as input in turn. Indeed, in my opinion, the main area for improvement of the manual is explaining the stream-oriented aspect of jq. (See https://github.com/stedolan/jq/wiki/Advanced-Topics#streams)
I like "all pass filter" in part because of its intuitive interpretation, but it has a lot of namespace conflict, and should someone google it they'd be given a lot of misdirection.
Using "identity function" is a good choice, on par with "current working directory", and "current node". The problem is more of which analogy you want to go with. Analogizing with the shell would be the best choice IMO here, because of the synergy with explaining the pipe operator, similarity in formatting and operation.
I agree that "null filter" would probably be a source of confusion.
Using "input value" probably works, but then you also need to explain that it's largely unnecessary to provide an input value, since it's automatically interpreted/provided for you most of the time.
@lylemoffitt - Obviously "all-pass filter" is clear to some, but even for those with a signal-processing background, might not the bit about phase change be a potential source of confusion? More importantly, two of the primary meanings of "pass" are:
To come to an end: To decline one's turn to bid, draw, bet, compete, or play.
(Source: https://www.ahdictionary.com/word/search.html?q=pass)
@pkoppstein -- We are in agreement. That's pretty much what I was getting at. Though, your point about "pass" is important, too. I was thinking from a more common understanding, e.g. "all things pass through it". Either way, it's probably not a good way to go.
Drawing analogies to the shell and imperative/functional languages are probably the safest bets.
@fadado `|` is more like "call". It's actually how you call functions: `.foo | bar` calls `bar` with `.foo` as its `.`. "Bind" is more appropriate for `EXP as $name | ...`, since that creates a symbolic binding for the output value(s) of EXP (that is, `$name` refers to each value output by `EXP`, successively, but only one value at a time, and it is visible only to the expression to the right of the `|`).
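The difference between "call" and "bind" can be made concrete with a small Python sketch. It is a model under stated assumptions, not jq's implementation: filters are generator functions from one input to a stream of outputs, and `bind`, `xs`, `double`, and `scaled` are hypothetical names for this demo. The input document is made up as well.

```python
# Toy model: a filter maps one input to a stream of outputs.

def pipe(f, g):
    """Models `f | g` ("call"): each output of f becomes g's `.`."""
    def h(x):
        for y in f(x):
            yield from g(y)
    return h

def bind(exp, body):
    """Models `EXP as $name | BODY`: for each value v output by EXP, BODY runs
    with $name = v, while BODY's `.` is still the ORIGINAL input."""
    def h(x):
        for v in exp(x):
            yield from body(v)(x)
    return h

def xs(x):
    """Models `.xs[]`."""
    yield from x["xs"]

def double(x):
    """Models `. * 2`."""
    yield x * 2

def scaled(v):
    """Models the body `$x * .k`, given the bound value v for $x."""
    def f(x):
        yield v * x["k"]
    return f

doc = {"xs": [1, 2], "k": 10}

print(list(pipe(xs, double)(doc)))  # .xs[] | . * 2         -> [2, 4]
print(list(bind(xs, scaled)(doc)))  # .xs[] as $x | $x * .k -> [10, 20]
```

In the piped version the right-hand side loses access to `.k`, because its `.` is now an array element; in the bound version `.` is unchanged, which is why the body can still read `.k`.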
I do like some of the suggestions here. Certainly a table of operator precedence would be nice, and some of the "laws" that @fadado proposes would be useful to include.
I too would rather not "name everything". For now anyways.
@nicowilliams
I too would rather not "name everything". For now anyways.
I'm inclined to agree with this, as there are more important issues IMO, but the push for Stack Overflow kinda necessitates that we have common pronounceable names for all the fundamental operations in jq. The rest of the discussion about what to call them should be focused on how to explain them first, and then suggest alternative operator names only by way of analogy.
Currently, all of the functions are easily searchable, and most of the operators actually have explicit names given. But, a (perhaps) surprising number are without names. Instead, they are repeatedly referred to as "the `_` operator/filter", or are given no noun at all and simply referred to by their bare symbol! This latter point is really unacceptable, and places a burden on both the manual writer and the reader. Speech is the fundamental basis of reading and understanding; if you can't pronounce a thing, then you can't leverage the language processing ability in your brain towards understanding that thing, which is effectively an inhibitor since learning is all about neural activation. To wit, what do you expect people to say when they read the symbol `.[]` in the following quote from the manual?
Running `.[]` with the input `[1,2,3]` will produce the numbers as three separate results, rather than as a single array.
I digress...
The following are all taken from the man page. Type denotes what noun is used with a given symbol. The suggested name attempts to find something close to what people colloquially call the given operation, while also avoiding name conflicts and providing a minimum of specificity.
Type | Current Name | Suggested Name |
---|---|---|
operator | `?` | try operator |
filter | `.` | dot operator |
filter | `.foo` | member operator |
syntax | `.[<string>]` | index operator |
syntax | `.[2]` | index operator |
syntax | `.[10:15]` | slice operator |
N/A | `.[]` | stream operator |
N/A | `,` | comma operator |
operator | `\|` | pipe operator |
The unnamed `.[]?` and `.foo?` are not named here and should remain so, because they are really just common applications of the now so named "try operator".
Type | Current Name | Suggested Name |
---|---|---|
operator | `?` | try operator |
special | `.` | identity operator |
operator | `.foo` | object identifier-index operator |
operator | `."foo"` | object index operator |
operator | `.[EXP]` | array or object index operator |
operator | `.[EXP:EXP]` | array and string slice operator |
operator | `.[]` | array and object value iterator operator |
operator | `,` | comma, or output concatenation operator |
operator | `\|` | pipe or call or apply operator |
syntax | `EXP as $name \|` | variable or symbol value binding operator |
syntax | `[EXP]` | array constructor |
syntax | `{EXP:EXP, ...}` | object constructor |
I'm not sure that classification as "syntax" vs. "operator" makes sense. It's all syntactic. Some of these things are "operators" in the mathematical sense, but maybe all of them are (except for `.`, which can be thought of as the identity function). Even the binding syntax can be thought of as an operator, one that establishes a symbolic binding.
There's also `."string"`, and a variety of other operators. Certainly a table or two would be nice.
@nicowilliams
There's also ."string" [...]
Yup. Totally missed `."string"`, because it's not in the header.
[...] and a variety of other operators.
Which? I believe all remaining operators have explicit names already provided in the manual. It's not super obvious, but it is there or in the context. Double checking, these are the exceptions:
- `==` and `!=`, which I did overlook. (My bad.)
- The `;` symbol, which is only mentioned directly in the wiki. It could be called a "section terminator" somewhere, I suppose. But it doesn't really need to be mentioned since it's no more an operator than the `:` is.
- `+=`, `-=`, `*=`, `/=`, `%=`, and `//=`, which I felt were adequately addressed by their section name and by the fact that they are all just lexical concatenations of other named operators.
- The `recurse` function "shorthand" `..` is another one I genuinely missed. I blame that one on poor organization. It should obviously be called the "recurse operator".
Certainly a table or two would be nice.
A table would be nice, but I think these names should also be put in the section labels. This is clear and consistent with the other operators that are named, like Addition and Array construction.
@lylemoffitt Well, there's also the array-collect operator (`[EXP]`), the object construction operator (`{<EXP>: <EXP>, ...}`).
I'm going to have to learn whether the doc system supports tables...
The `;` indeed is not an operator. It is a separator/terminator of sorts, as follows:
- `reduce` expressions
- `foreach` expressions
@nicowilliams
I'm going to have to learn whether the doc system supports tables...
Looks like the answer is no (see ronn-format). Maybe another form would do? You could change the entry format for building the manpage to something like:
f.puts "### #{entry['title']['symbol']} -- #{entry['title']['name']}\n"
And change the yaml to match:
entries:
  - title:
      name: "Index Operator"
      symbol: "`.[EXP]`"
    body: |
      You can also look up fields of an object using syntax like...
@nicowilliams
Well, there's also the array-collect operator (`[EXP]`), the object construction operator (`{<EXP>: <EXP>, ...}`).
I thought those were already named well enough by context, but thanks for adding them.
Side note: The rules for what constitutes an acceptable `EXP` in each of the above are different. For example if it's `3/2`, then `[3/2]` is fine, even though there is no such thing as a fractional index, while `{ a: 3/2 }` will fail to compile (citing shell quoting issues of course).
Yes, there are places where not the full range of expressions is permitted, most notably the object constructor, for the subtle reason that it's impossible to avoid ambiguities in the grammar.
Thanks for checking doc support for tables. Adding that is going to be a low priority for me for now, unless someone offers a PR.
@nicowilliams -- If you're fine with my solution to the tables (or something like it), I can certainly put in that PR for you. I don't think we're set on the content yet, though.
@lylemoffitt Can we get a preview of what a rendered manpage would look like?
@lylemoffitt Er, actually, ronn-format does seem to support tables, since it claims that "[a]ll markdown(7) linking features are supported."
@nicowilliams
All markdown(7) linking features are supported.
That looks like they only have support for markdown's `[ link text ]( link url )` and `[ link text ]( #section-link )` features.
@lylemoffitt Oy, yes, I misread that. But elsewhere it says:
The ronn(1) command converts text in a simple markup to UNIX manual pages.
The syntax includes all Markdown formatting features, plus conventions for
expressing the structure and various notations present in standard UNIX manpages.
I tried it, and... no dice, ronn does not seem to support tables. https://github.com/rtomayko/ronn/issues/99
Also, whatever is done for manpages has to work for the HTML-rendered manual as well.
@lylemoffitt #1340 is a PR with some modest enhancements based on this issue and #1337.
Exigent Question:
At any point in a jq script what does the filter `.` return? It may be easy for an experienced user, but it's not clear from the documentation. Put another way: what defines an expression? What delimits scope? The answers to these questions are implied, but not explicitly or clearly stated by the documentation. It's ironic that the dot filter is referred to as the "least interesting filter", because it is the key to understanding the transformation of data through the script.
Problems:
The man page doesn't really say a whole lot about parentheses. They pretty much only show up in function signatures and in examples. Yet, they have a fundamental relationship with the dot filter, and thus a critical role in the functioning of the script. Their usage should be clarified. It would also be helpful to clarify their relationship with the object constructors, `[]` and `{}`, as all three are used to create sub-expressions and return objects.
The easy thing to do here would be to just create a section where you define `()` as an expression operator or scope operator, and then stick all the missing explanation there. This might solve the immediate issue, but you could do a lot better. I'm trying to stick with one problem here, but in general the manual could be a lot clearer. I don't know if you're trying to intentionally hide that jq is a full-blown language, but it would certainly be a lot cleaner if you approached explaining the query language like it was the pure-function programming language it is.
Suggested Solutions:
Define the operator `()` as a Value Constructor and put it in the Types and Values section. It constructs a value from the output of the contained expression. The only thing that would be needed to be changed about its existing functionality in order to bring it in line with the other constructor operators is that it must also work when the expression is empty. Analogous to `[]` and `{}`, this should be implemented to construct a `null` value.
Example:
Add a section Operator Precedence and Expression Evaluation (or something to that effect) with the following:
Define how filters and operators are composed into expressions and how the expressions are applied to the input JSON string to create the output JSON string. An explicitly codified type-transform like the following (written in pseudo-Haskell) would be one way to do it and be enormously helpful in terms of reasoning about a jq script.
Define operator precedence. I know it's basically just left to right and parentheses first, but it's important to explicitly state these things. This is where the type-transform will come in handy again, because it helps elucidate why different sets of operators have different semantics. For example, constructors (like `[]` and `{}`), which are called operators, are actually closures. This explains why they have totally different semantics.
Define scoping rules. The effect of `()` on scope is briefly mentioned in the Variables sub-section, but never talked about directly. The relationship between constructors and scope is never mentioned at all. The relationship between `.` and the concept of scope should also be discussed. Again, closures will help here.