Open elliot-nelson opened 2 years ago
Query language on the command line is a much more complex problem than giving the developer a way to execute complex queries. For example, what if I added a selector scope query-file: that accepts a path to a JSON file containing a serialized query?
Then you have, say (obviously we could come up with a more ergonomic schema, but this is lazy):
{
  "op": "intersection",
  "arguments": [
    "git:origin/main",
    "tag:Services"
  ]
}
And on the command line:
rush build --to "query-file:./changed-services.json"
Or for your last example:
{
  "op": "subtract",
  "arguments": [
    {
      "op": "to",
      "arguments": [
        {
          "op": "intersection",
          "arguments": [
            "git:origin/main",
            "tag:Services"
          ]
        }
      ]
    },
    {
      "op": "impacted-by",
      "arguments": [
        "name:B"
      ]
    }
  ]
}
With
rush build --only "query-file:./custom-query.json"
Writing command-line query languages is complicated; making a workable JSON query language is pretty straightforward.
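To make the "pretty straightforward" claim concrete, here is a minimal sketch of an evaluator for the JSON shape above. It is not Rush's implementation: the `evaluate` and `reachable` helpers, the `union` op, and the sample dependency graph are all assumptions made for illustration, and `to` / `impacted-by` are approximated as "project plus its dependencies" and "project plus its dependents".

```python
from typing import Callable, Dict, Set, Union

Query = Union[str, dict]


def reachable(start: Set[str], edges: Dict[str, Set[str]]) -> Set[str]:
    """Return the start set plus everything reachable by following edges."""
    seen = set(start)
    stack = list(start)
    while stack:
        node = stack.pop()
        for nxt in edges.get(node, set()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def evaluate(query: Query, resolve: Callable[[str], Set[str]],
             deps: Dict[str, Set[str]]) -> Set[str]:
    """Evaluate a serialized query into a set of project names."""
    if isinstance(query, str):
        # Leaf: a plain selector string such as "tag:Services".
        return set(resolve(query))
    op = query["op"]
    args = [evaluate(arg, resolve, deps) for arg in query["arguments"]]
    if not args:
        raise ValueError(f"op {op!r} needs at least one argument")
    if op == "intersection":
        return set.intersection(*args)
    if op == "union":  # hypothetical op, not in the examples above
        return set.union(*args)
    if op == "subtract":
        return args[0].difference(*args[1:])
    if op == "to":
        # Selected projects plus their transitive dependencies.
        return reachable(set.union(*args), deps)
    if op == "impacted-by":
        # Selected projects plus their transitive dependents.
        reverse: Dict[str, Set[str]] = {}
        for project, project_deps in deps.items():
            for dep in project_deps:
                reverse.setdefault(dep, set()).add(project)
        return reachable(set.union(*args), reverse)
    raise ValueError(f"unknown op: {op!r}")


# Made-up data: A and D depend on B, B depends on C.
deps = {"A": {"B"}, "B": {"C"}, "C": set(), "D": {"B"}}
selectors = {
    "git:origin/main": {"A", "D"},
    "tag:Services": {"A", "C"},
    "name:B": {"B"},
}
query = {
    "op": "subtract",
    "arguments": [
        {"op": "to", "arguments": [
            {"op": "intersection",
             "arguments": ["git:origin/main", "tag:Services"]}]},
        {"op": "impacted-by", "arguments": ["name:B"]},
    ],
}
result = evaluate(query, selectors.__getitem__, deps)
```

With this sample data the intersection is A, `to` expands it to A, B and C, `impacted-by` of B is B, A and D, so only C remains.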
🤔 Has promise! Doesn't muck up the command line, can write a schema, etc.
In an ideal world, we would follow the convention used by many tools that the file name - represents STDIN, so that you could still write an inline query like so:
echo '{"op": "intersection", "args": ["git:repo", "tag:Service"]}' | rush build --to query-file:-
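A rough sketch of how the "-" convention could be handled when loading the query. The `load_query` function and `PREFIX` constant are hypothetical names, not part of Rush; the optional `stdin` argument exists only so the sketch can be exercised without a real pipe.

```python
import io
import json
import sys
from typing import Optional, TextIO

PREFIX = "query-file:"  # selector scope proposed in this thread


def load_query(selector: str, stdin: Optional[TextIO] = None):
    """Load a serialized query from a file, or from STDIN when the path is "-"."""
    if not selector.startswith(PREFIX):
        raise ValueError(f"not a {PREFIX} selector: {selector!r}")
    path = selector[len(PREFIX):]
    if path == "-":
        # "-" means: read the JSON document from standard input.
        return json.load(stdin if stdin is not None else sys.stdin)
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


# Simulate piping a query into the command.
piped = io.StringIO('{"op": "intersection", "arguments": ["git:origin/main", "tag:Services"]}')
query = load_query("query-file:-", piped)
```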
I think I'm first going to work on opening up the selector parser to extension via plugins; then supporting something like this will be an area that can be explored as a plugin.
Definitely a good idea! We have libraries and apps in our monorepo. All of those are grouped via tags:
{
  "projects": [
    {
      "packageName": "@information/client",
      "projectFolder": "apps/information",
      "tags": ["app", "information"]
    },
    {
      "packageName": "@warehouse/client-1",
      "projectFolder": "apps/client-1/warehouse",
      "tags": ["app", "warehouse"]
    },
    {
      "packageName": "@warehouse/client-2",
      "projectFolder": "apps/client-2/warehouse",
      "tags": ["app", "warehouse"]
    },
    {
      "packageName": "@library/information-interfaces",
      "projectFolder": "packages/information-interfaces",
      "tags": ["library", "information"]
    },
    {
      "packageName": "@library/warehouse-interfaces",
      "projectFolder": "packages/warehouse-interfaces",
      "tags": ["library", "warehouse"]
    }
  ]
}
When we want to build all app projects that also have the warehouse tag attached, we currently have to either use a combined/unique tag (app-warehouse) or add a special command line that calls a special package.json script which is only available in app-warehouse projects.
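For illustration, the selection being asked for amounts to a set-inclusion test over each project's tags. This sketch uses an abbreviated copy of the configuration above (projectFolder omitted); `projects_with_all_tags` is a hypothetical helper, not a Rush API.

```python
from typing import Iterable, List

# Abbreviated copy of the "projects" list shown above.
rush_json = {
    "projects": [
        {"packageName": "@information/client", "tags": ["app", "information"]},
        {"packageName": "@warehouse/client-1", "tags": ["app", "warehouse"]},
        {"packageName": "@warehouse/client-2", "tags": ["app", "warehouse"]},
        {"packageName": "@library/information-interfaces", "tags": ["library", "information"]},
        {"packageName": "@library/warehouse-interfaces", "tags": ["library", "warehouse"]},
    ]
}


def projects_with_all_tags(config: dict, required: Iterable[str]) -> List[str]:
    """Return package names whose tags include every required tag."""
    wanted = set(required)
    return [
        project["packageName"]
        for project in config["projects"]
        if wanted <= set(project.get("tags", []))
    ]


selected = projects_with_all_tags(rush_json, ["app", "warehouse"])
# selected == ["@warehouse/client-1", "@warehouse/client-2"]
```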
After thinking about this more, btw, I'm in favor of the option for this JSON approach, but I really think it's not that hard to let users form complex queries on the fly on the command line.
You only need the operators & (intersect), | (union), - (difference), and maybe optionally ! (negate), plus a way to group expressions and to call functions. A "function" can be a selector expansion like to or a query like tag or git etc.
To reformat David's example above:
rush build --select "to(git:origin/main & tag:Services) - impacted-by(name:B)"
This basically reads just like a label selector from Jenkins, for example, and is usable both by expert CI administrators and the typical developer on the command line.
Parse the expression, turn it into the backing JSON expression, error if there are any unknown selector queries or expanders, and then execute. In my opinion this is the killer missing feature for selecting projects in Rush.
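As a sketch of the "parse, then turn into the backing JSON" step, here is a small recursive-descent parser for the symbolic syntax. Several things are assumptions rather than anything decided in this thread: the three binary operators have equal precedence and associate left to right, - only counts as an operator when it stands alone between whitespace (so impacted-by stays one word), ! is left out, and the `EXPANSIONS` and `SCOPES` sets are illustrative.

```python
import re
from typing import Optional, Union

Query = Union[str, dict]

# Parens and & | are always separate tokens; anything else is a "word".
TOKEN = re.compile(r"[()&|]|[^\s()&|]+")
BINARY = {"&": "intersection", "|": "union", "-": "subtract"}
EXPANSIONS = {"to", "from", "impacted-by"}  # assumed set of known expanders
SCOPES = {"git", "tag", "name"}             # assumed set of known selector scopes


def parse(text: str) -> Query:
    tokens = TOKEN.findall(text)
    pos = 0

    def peek() -> Optional[str]:
        return tokens[pos] if pos < len(tokens) else None

    def take(expected: Optional[str] = None) -> str:
        nonlocal pos
        token = peek()
        if token is None or (expected is not None and token != expected):
            raise ValueError(f"expected {expected or 'a token'}, got {token!r}")
        pos += 1
        return token

    def term() -> Query:
        token = take()
        if token == "(":
            inner = expr()
            take(")")
            return inner
        if token in BINARY or token == ")":
            raise ValueError(f"unexpected {token!r}")
        if peek() == "(":  # function call such as to(...)
            if token not in EXPANSIONS:
                raise ValueError(f"unknown expansion: {token!r}")
            take("(")
            inner = expr()
            take(")")
            return {"op": token, "arguments": [inner]}
        scope = token.split(":", 1)[0]
        if ":" not in token or scope not in SCOPES:
            raise ValueError(f"unknown selector: {token!r}")
        return token

    def expr() -> Query:
        left = term()
        while peek() in BINARY:
            op = BINARY[take()]
            left = {"op": op, "arguments": [left, term()]}
        return left

    result = expr()
    if peek() is not None:
        raise ValueError(f"unexpected {peek()!r}")
    return result


parsed = parse("to(git:origin/main & tag:Services) - impacted-by(name:B)")
```

For the example expression this produces the same tree as the hand-written custom-query.json earlier in the thread. Note that a branch name containing & would still be split by this tokenizer.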
My longstanding objection to these query expressions is that it would over-complicate the CLI, for example making the meaning dependent on the order in which CLI parameters appear. (I find the Unix find syntax to be really difficult to remember, for example.)
The --select parameter addresses these concerns. Also it moves the domain specific language (DSL) into a text string that is protected from the shell, and gives us a centralized place to document the --select DSL. 👍
But we should keep in mind that Rush's mission is to be friendly and intuitive for newcomers. Our requirement isn't only to allow person A to select a subset of projects. We also should ensure that person B can read the expression and guess its meaning, without having to consult the Rush manual.
rush build --select "to(git:origin/main & tag:Services) - impacted-by(name:B)"
Specific concerns with the above proposal:
* Is - an operator? Then impacted-by is misleading; operators normally aren't sensitive to whitespace
* & means intersection, but in JavaScript it reads "and" which seems like unioning things maybe
* origin/main is a branch, but Git branch names allow special characters such as origin/a&b. So to(git:origin/main&tag:Services) would be ambiguous
Here's one iteration of trying to solve some of those problems:
rush build --select "to([git:origin/main] plus [tag:services]) minus impacted-by([b])"
* [ ] delimit each selector. This eliminates the need for name:
* --to -> to(), --impacted-by -> impacted-by(), etc.
* _ minus _, _ plus _, invert _. Where it makes sense, words are better than symbols, because you can search for them easily. (Try googling for rush's & operator heheh)
For readability, the syntax could enforce parentheses rather than relying on order-of-operations:
# Does this mean "([a] plus [b]) minus [c]" or "[a] plus ([b] minus [c])"
# My wall doesn't have room for more order-of-operations charts 😆
rush build --select "[a] plus [b] minus [c]" # FORBIDDEN
rush build --select "([a] plus [b]) minus [c]" # CORRECT
@octogonz Good points! A couple responses:
I think the choice to avoid symbols for operators makes sense, but I would skew more towards a query language like Jira's JQL; if you choose instead the binary operators "and" and "or" and the unary operator "not", you have 3 very short words, and you don't even need a minus (A minus B is just A and not B).
For your example:
rush build --select "to([git:origin/main] and [tag:services]) and not impacted-by([b])"
I'll admit I balk at forcing parens because you turn any expression with a few operators into lisp. I think the logical choice given my operators above is NOT > AND > OR. (You also could choose to just apply all AND and OR in left-to-right fashion, but you still need NOT to immediately consume its operand. The expression [git:origin/main] and [tag:services] and not [tag:broken] or [tag:deprecated] might read nicely in English but is clearly missing parens around [tag:broken] or [tag:deprecated].)
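The NOT > AND > OR ordering can be sketched with one parsing function per precedence level. This is an illustration only: the bracketed-selector tokenizer, the function-call form, and the "not" / "union" / "intersection" op names in the output are assumptions, not a settled design.

```python
import re
from typing import List, Optional, Union

Query = Union[str, dict]

# A bracketed selector, a paren, or a bare word (operator or function name).
TOKEN = re.compile(r"\[[^\]]*\]|[()]|[^\s()\[\]]+")
KEYWORDS = {"and", "or", "not"}


def tokenize(text: str) -> List[str]:
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        match = TOKEN.match(text, pos)
        if match is None:  # e.g. an unclosed "["
            raise ValueError(f"cannot tokenize: {text[pos:]!r}")
        tokens.append(match.group())
        pos = match.end()
    return tokens


def parse(text: str) -> Query:
    tokens = tokenize(text)
    pos = 0

    def peek() -> Optional[str]:
        return tokens[pos] if pos < len(tokens) else None

    def take(expected: Optional[str] = None) -> str:
        nonlocal pos
        token = peek()
        if token is None or (expected is not None and token != expected):
            raise ValueError(f"expected {expected or 'a token'}, got {token!r}")
        pos += 1
        return token

    def parse_or() -> Query:  # loosest binding
        left = parse_and()
        while peek() == "or":
            take()
            left = {"op": "union", "arguments": [left, parse_and()]}
        return left

    def parse_and() -> Query:
        left = parse_not()
        while peek() == "and":
            take()
            left = {"op": "intersection", "arguments": [left, parse_not()]}
        return left

    def parse_not() -> Query:  # tightest binding: consumes its operand at once
        if peek() == "not":
            take()
            return {"op": "not", "arguments": [parse_not()]}
        return parse_atom()

    def parse_atom() -> Query:
        token = take()
        if token == "(":
            inner = parse_or()
            take(")")
            return inner
        if token.startswith("["):
            return token[1:-1]
        if token not in KEYWORDS and token != ")" and peek() == "(":
            take("(")
            inner = parse_or()
            take(")")
            return {"op": token, "arguments": [inner]}
        raise ValueError(f"unexpected {token!r}")

    result = parse_or()
    if peek() is not None:
        raise ValueError(f"unexpected {peek()!r}")
    return result


tree = parse("[git:origin/main] and [tag:services] and not [tag:broken] or [tag:deprecated]")
```

Under these rules the top-level node of `tree` is the "or", with the whole "and ... and not ..." chain as its left argument, which is the surprising reading described above.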
It sounds like your goal is preventing user-inserted strings from breaking the DSL, but I don't think it's an achievable goal. Git branch names can also contain ], (, ), and even " and ' (though not [), so you can break out of almost anything you can envision with a simple --select git:$BRANCH_NAME. Your best medicine against this type of shenanigan is just warning people not to put weird stuff in their branch names, imo.
That doesn't mean I'm against your initial suggestion by the way... I'll keep thinking about whether there's any alternative to [tag:services].
(Originally, I was going to suggest here that we just make it tag(services), to avoid that new syntax. But I think it's just too huge a loss to have the concept of projects tagged services expressed differently in --to vs --select.)
One more thought... I think in 99% of cases, the brackets could be optional. Just the "next space" would suffice for project names, tag names, most folder paths, etc., allowing very simple expressions like:
rush build --select "to(@acme/utils and @acme/services) and not @acme/build-test"
Then the [] expressions would only really be needed if your selector expression contained spaces or parens or would be confused for an operator.
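A possible tokenizer for the optional-bracket idea, as a sketch. The classification rules are assumptions: a bare word is an operator if it is and / or / not, a function if the next non-space character is (, and a selector otherwise; brackets always force a selector. `classify` is a hypothetical name.

```python
import re
from typing import List, Tuple

# A bracketed selector, a paren, or a run of characters up to the next space/paren.
TOKEN = re.compile(r"\[[^\]]*\]|[()]|[^\s()\[\]]+")
OPERATORS = {"and", "or", "not"}


def classify(text: str) -> List[Tuple[str, str]]:
    """Split an expression into (kind, value) pairs."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        match = TOKEN.match(text, pos)
        if match is None:  # e.g. an unclosed "["
            raise ValueError(f"cannot tokenize: {text[pos:]!r}")
        raw = match.group()
        pos = match.end()
        if raw in ("(", ")"):
            kind = "paren"
        elif raw.startswith("["):
            # Brackets escape spaces, parens, and operator-like names.
            kind, raw = "selector", raw[1:-1]
        elif raw in OPERATORS:
            kind = "operator"
        elif text[pos:].lstrip().startswith("("):
            kind = "function"
        else:
            kind = "selector"
        tokens.append((kind, raw))
    return tokens


tokens = classify("to(@acme/utils and @acme/services) and not @acme/build-test")
```

Here `classify("[my project] or [and]")` would yield two selectors, "my project" and "and", joined by the operator "or", which is the case where brackets are still required.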
Summary
As a monorepo administrator, it would be nice if I could use the new tags feature to select the intersection of two tags.
Even better, I'd like to be able to take the intersection of two different selectors, like tag and git:
Details
In addition to intersecting two existing selectors, there may be other use cases that would be useful:
(By "expansion" I'm referring to the existing parameters e.g. --from, --to, --impacted-by)
In a totally imaginary query language, here's some examples of use cases we could consider:
(It may be there's a clear design that doesn't allow all of the above, which would be OK -- this is just a discussion starter.)