[rush] Select intersection of matching tags (query selectors)

elliot-nelson commented 2 years ago

Summary

As a monorepo administrator, it would be nice if I could use the new tags feature to select the intersection of two tags.

# This builds to "all projects tagged Apps and all projects tagged TeamA"
rush build --to tag:Apps --to tag:TeamA

# What if I want to build "all projects tagged Apps and TeamA"?
rush build --to ???

Even better, I'd like to be able to take the intersection of two different selectors, like tag and git:

# This builds to "all projects changed since main, AND, all projects tagged Services"
rush build --to git:origin/main --to tag:Services

# What if I want to build "all projects tagged Services changed since main"?
rush build --to ???

Details

In addition to intersecting two existing selectors, there may be other use cases that would be useful:

Taking one set of projects, and subtracting another set of projects.
Expanding a set of projects before acting on it (i.e. selecting its dependencies or consumers).
Acting on an expanded set of projects AFTER expansion.

(By "expansion" I'm referring to the existing parameters e.g. --from, --to, --impacted-by)
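To make the three operations above concrete, here is a minimal sketch that treats each selector as a plain set of project names. The dependency graph, the project names, and the helper names `expand_to` and `expand_impacted_by` are all hypothetical and are not Rush APIs. It only assumes that "--to" adds dependencies and "--impacted-by" adds consumers.

```python
# Toy dependency graph: project -> projects it depends on (hypothetical data).
DEPS = {
    "app": {"lib", "utils"},
    "lib": {"utils"},
    "utils": set(),
    "tool": set(),
}

def expand_to(selected, deps=DEPS):
    """Return the selected projects plus all of their transitive dependencies."""
    result, stack = set(), list(selected)
    while stack:
        project = stack.pop()
        if project not in result:
            result.add(project)
            stack.extend(deps[project])
    return result

def expand_impacted_by(selected, deps=DEPS):
    """Return the selected projects plus everything that transitively consumes them."""
    consumers = {name: set() for name in deps}
    for name, needs in deps.items():
        for dep in needs:
            consumers[dep].add(name)
    # Walking the reversed graph reuses the same traversal.
    return expand_to(selected, consumers)

changed = {"app", "tool"}     # stand-in for a git-based selector
services = {"app", "lib"}     # stand-in for a tag-based selector

both = changed & services                     # intersection: {"app"}
tree = expand_to(both)                        # {"app", "lib", "utils"}
unsafe = tree - {"lib"}                       # {"app", "utils"}: app still needs lib
safe = tree - expand_impacted_by({"lib"})     # {"utils"}: lib and its consumers removed
```

The "unsafe" result keeps `app` while dropping its dependency `lib`, which is the hole described in the query examples; subtracting the consumer expansion avoids it.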

In a totally imaginary query language, here's some examples of use cases we could consider:

# Take all projects changed since main, with tag Services, and expand it "--to"
# Then cut individual project B out of that tree.
# This is an "unsafe" query (there's a hole in the middle of it).
rush build to(git:origin/main && tag(Services))-project:B

# Take all projects changed since main, with tag Services, and expand it "--to"
# Then cut individual project B out of that tree along with all of its consumers.
# This is a safe version of the above (no holes).
rush build to(git:origin/main && tag(Services))-impactedby(project:B)

(It may be there's a clear design that doesn't allow all of the above, which would be OK -- this is just a discussion starter.)

dmichon-msft commented 2 years ago

Query language on the command line is a much more complex problem than giving the developer a way to execute complex queries. For example, what if I added a selector scope query-file: that accepts a path to a JSON file containing a serialized query?

Then you have, say (obviously we could come up with a more ergonomic schema, but this is lazy):

changed-services.json

{
    "op": "intersection",
    "arguments": [
        "git:origin/main",
        "tag:Services"
    ]
}

And on the command line: rush build --to "query-file:./changed-services.json"

Or for your last example:

custom-query.json

{
    "op": "subtract",
    "arguments": [
        {
            "op": "to",
            "arguments": [
                {
                    "op": "intersection",
                    "arguments": [
                        "git:origin/main",
                        "tag:Services"
                    ]
                }
            ]
        },
        {
            "op": "impacted-by",
            "arguments": [
                "name:B"
            ]
        }
    ]
}

With rush build --only "query-file:./custom-query.json"

Writing command-line query languages is complicated; making a workable JSON query language is pretty straightforward.
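As a rough illustration of why the JSON form is straightforward to execute, the following sketch evaluates the custom-query.json tree recursively. It is not Rush code: the `LEAVES` table stands in for real selector resolution, and the dependency graph and project names are invented for the example.

```python
import json

# Hypothetical workspace: project -> dependencies, and what each leaf selector resolves to.
DEPS = {"A": {"B"}, "B": {"C"}, "C": set(), "D": {"B"}}
LEAVES = {
    "git:origin/main": {"A", "D"},
    "tag:Services": {"A", "C"},
    "name:B": {"B"},
}

def closure(start, edges):
    """Return every node reachable from `start` by following `edges`."""
    seen, stack = set(), list(start)
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(edges[node])
    return seen

def consumers_of(deps):
    """Reverse the dependency graph: project -> projects that depend on it."""
    reverse = {name: set() for name in deps}
    for name, needs in deps.items():
        for dep in needs:
            reverse[dep].add(name)
    return reverse

def evaluate(node):
    """Evaluate a query node: a selector string or an {"op", "arguments"} object."""
    if isinstance(node, str):
        return set(LEAVES[node])
    args = [evaluate(arg) for arg in node["arguments"]]
    op = node["op"]
    if op == "intersection":
        return set.intersection(*args)
    if op == "subtract":
        return args[0].difference(*args[1:])
    if op == "to":
        return closure(set.union(*args), DEPS)
    if op == "impacted-by":
        return closure(set.union(*args), consumers_of(DEPS))
    raise ValueError(f"unknown op: {op!r}")

query = json.loads("""
{"op": "subtract", "arguments": [
    {"op": "to", "arguments": [
        {"op": "intersection", "arguments": ["git:origin/main", "tag:Services"]}]},
    {"op": "impacted-by", "arguments": ["name:B"]}]}
""")
selected = evaluate(query)  # {"C"} for the toy data above
```

Unknown operators raise an error rather than being ignored, which matches the idea of validating a query file against a schema before running it.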

elliot-nelson commented 2 years ago

🤔 Has promise! Doesn't muck up the command line, can write a schema, etc.

In an ideal world, we would follow the convention used by many tools, where the file name - represents STDIN, so that you could still write an inline query like so:

echo '{"op": "intersection", "arguments": ["git:repo", "tag:Service"]}' | rush build --to query-file:-
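A small sketch of how the "-" convention might be handled when loading a query. The `query-file:` scope is only a proposal in this thread, and the function names `load_query` and `parse_selector` are hypothetical; the `stdin` parameter exists so the example can simulate piped input.

```python
import io
import json
import sys

def load_query(path, stdin=None):
    """Load a JSON query from a file, or from standard input when path is "-"."""
    if path == "-":
        stream = stdin if stdin is not None else sys.stdin
        return json.load(stream)
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)

def parse_selector(selector, stdin=None):
    """Split "query-file:<path>" and load the query it points to."""
    scope, _, value = selector.partition(":")
    if scope != "query-file" or not value:
        raise ValueError(f"not a query-file selector: {selector!r}")
    return load_query(value, stdin)

# Simulate: echo '<json>' | rush build --to query-file:-
piped = io.StringIO('{"op": "intersection", "arguments": ["git:origin/main", "tag:Services"]}')
query = parse_selector("query-file:-", stdin=piped)
```

One design consequence: because standard input can only be read once, at most one selector per invocation could use `query-file:-`.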

dmichon-msft commented 2 years ago

I think I'm first going to work on opening up the selector parser to extension via plugins; then supporting something like this will be an area that can be explored as a plugin.

buffcode commented 2 years ago

Definitely a good idea! We have libraries and apps in our monorepo. All of those are grouped via tags:

{
  "projects": [
    {
      "packageName": "@information/client",
      "projectFolder": "apps/information",
      "tags": ["app", "information"]
    },
    {
      "packageName": "@warehouse/client-1",
      "projectFolder": "apps/client-1/warehouse",
      "tags": ["app", "warehouse"]
    },
    {
      "packageName": "@warehouse/client-2",
      "projectFolder": "apps/client-2/warehouse",
      "tags": ["app", "warehouse"]
    },
    {
      "packageName": "@library/information-interfaces",
      "projectFolder": "packages/information-interfaces",
      "tags": ["library", "information"]
    },
    {
      "packageName": "@library/warehouse-interfaces",
      "projectFolder": "packages/warehouse-interfaces",
      "tags": ["library", "warehouse"]
    }
  ]
}

When we want to build all apps that are also tagged warehouse, we currently have to either use a combined/unique tag (app-warehouse) or define a special command line that calls a special package.json script which is only available in the app-warehouse projects.
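For illustration, the selection being asked for amounts to a subset test on each project's tags. This sketch reuses the package names and tags from the configuration above; the helper `with_all_tags` is hypothetical and not part of Rush.

```python
# Package names and tags copied from the configuration above (folders omitted).
PROJECTS = [
    {"packageName": "@information/client", "tags": ["app", "information"]},
    {"packageName": "@warehouse/client-1", "tags": ["app", "warehouse"]},
    {"packageName": "@warehouse/client-2", "tags": ["app", "warehouse"]},
    {"packageName": "@library/information-interfaces", "tags": ["library", "information"]},
    {"packageName": "@library/warehouse-interfaces", "tags": ["library", "warehouse"]},
]

def with_all_tags(projects, *tags):
    """Return the names of projects that carry every one of the given tags."""
    wanted = set(tags)
    return [p["packageName"] for p in projects if wanted <= set(p["tags"])]

warehouse_apps = with_all_tags(PROJECTS, "app", "warehouse")
# ['@warehouse/client-1', '@warehouse/client-2']
```

With an intersection operator available, the combined app-warehouse tag would no longer be needed.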

elliot-nelson commented 2 years ago

After thinking about this more, btw, I'm in favor of the option for this JSON approach, but I really think it's not that hard to let users form complex queries on the fly on the command line.

You only need the operators & (intersect), | (union), - (difference), and maybe optionally ! (negate), plus a way to group expressions and to call functions. A "function" can be a selector expansion like to or a query like tag or git etc.

To reformat David's example above:

rush build --select "to(git:origin/main & tag:Services) - impacted-by(name:B)"

This basically reads just like a label selector from Jenkins, for example, and is usable both by expert CI administrators and the typical developer on the command line.

Parse the expression, turn it into the backing JSON expression, error if there's any unknown selector queries or expanders, and then execute. In my opinion this is the killer missing feature for selecting projects in Rush.

octogonz commented 2 years ago

My longstanding objection to these query expressions is that it would over-complicate the CLI, for example making the meaning dependent on the order in which CLI parameters appear. (I find the Unix find syntax to be really difficult to remember, for example.)

The --select parameter addresses these concerns. Also it moves the domain specific language (DSL) into a text string that is protected from the shell, and gives us a centralized place to document the --select DSL. 👍

But we should keep in mind that Rush's mission is to be friendly and intuitive for newcomers. Our requirement isn't only to allow person A to select a subset of projects. We also should ensure that person B can read the expression and guess its meaning, without having to consult the Rush manual.

rush build --select "to(git:origin/main & tag:Services) - impacted-by(name:B)"

Specific concerns with the above proposal:

Is - an operator? Then impacted-by is misleading; operators normally aren't sensitive to whitespace
You said & means intersection, but in JavaScript it reads "and" which seems like unioning things maybe
origin/main is a branch, but Git branch names allow special characters such as origin/a&b. So to(git:origin/main&tag:Services) would be ambiguous

Here's one iteration of trying to solve some of those problems:

rush build --select "to([git:origin/main] plus [tag:services]) minus impacted-by([b])"

Rush CLI selectors always go inside [ ]. This eliminates the need for name:
The Rush CLI parameters map to function-like DSL operators: --to -> to(), --impacted-by -> impacted-by(), etc.
The joining is done using prefix/infix operators: _ minus _, _ plus _, invert _. Where it makes sense, words are better than symbols, because you can search for them easily. (Try googling for rush's & operator heheh)

For readability, the syntax could enforce parentheses rather than relying on order-of-operations:

# Does this mean "([a] plus [b]) minus [c]" or "[a] plus ([b] minus [c])"
# My wall doesn't have room for more order-of-operations charts 😆
rush build --select "[a] plus [b] minus [c]"  # FORBIDDEN

rush build --select "([a] plus [b]) minus [c]"  # CORRECT
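A sketch of how such a rule could be checked before evaluation. It makes one assumption beyond the comment above: only groups that mix different operators are rejected, so a chain of the same operator is still allowed. The function name `check_explicit_grouping` and the token pattern are illustrative only.

```python
import re

# Bracketed selectors, parentheses, and lowercase words such as "plus" or "impacted-by".
TOKEN = re.compile(r"\[[^\]]*\]|[()]|[a-z][a-z-]*")
BINARY = {"plus", "minus"}

def check_explicit_grouping(expression):
    """Raise ValueError if one parenthesized group mixes different binary operators."""
    seen = [set()]  # stack of operator sets, one per currently open group
    for token in TOKEN.findall(expression):
        if token == "(":
            seen.append(set())
        elif token == ")":
            if len(seen) == 1:
                raise ValueError("unbalanced ')'")
            seen.pop()
        elif token in BINARY:
            seen[-1].add(token)
            if len(seen[-1]) > 1:
                raise ValueError(f"add parentheses: group mixes {sorted(seen[-1])}")
    if len(seen) != 1:
        raise ValueError("unbalanced '('")

check_explicit_grouping("([a] plus [b]) minus [c]")  # passes silently

try:
    check_explicit_grouping("[a] plus [b] minus [c]")
    rejected = False
except ValueError:
    rejected = True  # the ambiguous form is refused
```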

elliot-nelson commented 2 years ago

@octogonz Good points! A couple responses:

Operators

I think the choice to avoid symbols for operators makes sense, but I would skew more towards a query language like Jira's JQL; if you choose instead the binary operators and and or and the unary operator not, you have 3 very short words, and you don't even need a minus (A minus B is just A and not B).

For your example:

rush build --select "to([git:origin/main] and [tag:services]) and not impacted-by([b])"

Operator precedence

I'll admit I balk at forcing parens because you turn any expression with a few operators into Lisp. I think the logical choice given my operators above is NOT > AND > OR. (You could also choose to apply all AND and OR in left-to-right fashion, but you still need NOT to immediately consume its operand. The expression [git:origin/main] and [tag:services] and not [tag:broken] or [tag:deprecated] might read nicely in English but is clearly missing parens around [tag:broken] or [tag:deprecated].)
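The NOT > AND > OR precedence can be sketched as a small recursive-descent parser that emits the JSON-style tree proposed earlier in the thread. Everything here is an assumption for illustration: the token pattern, the "union" and "not" op names, and the rule that any other word followed by parentheses is an expander such as to or impacted-by.

```python
import re

TOKEN = re.compile(r"\s*(\[[^\]]*\]|[()]|[A-Za-z][A-Za-z-]*)")

def tokenize(text):
    """Split an expression into bracketed selectors, parentheses, and words."""
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise ValueError(f"unexpected input at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

def parse(text):
    """Parse with precedence NOT > AND > OR into {"op", "arguments"} nodes."""
    tokens, pos = tokenize(text), 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        token = peek()
        if token is None or (expected is not None and token != expected):
            raise ValueError(f"expected {expected or 'a token'}, got {token!r}")
        pos += 1
        return token

    def binary(keyword, op, operand):
        node = operand()
        while peek() == keyword:  # left-associative chain of one operator
            take()
            node = {"op": op, "arguments": [node, operand()]}
        return node

    def or_expr():  # lowest precedence
        return binary("or", "union", and_expr)

    def and_expr():
        return binary("and", "intersection", not_expr)

    def not_expr():  # highest precedence: consumes its operand immediately
        if peek() == "not":
            take()
            return {"op": "not", "arguments": [not_expr()]}
        return atom()

    def atom():
        token = take()
        if token.startswith("["):
            return token[1:-1]  # leaf selector, brackets stripped
        if token == "(":
            node = or_expr()
            take(")")
            return node
        if token in (")", "and", "or", "not"):
            raise ValueError(f"unexpected token {token!r}")
        take("(")  # any other word is treated as an expander: name(expr)
        node = or_expr()
        take(")")
        return {"op": token, "arguments": [node]}

    tree = or_expr()
    if peek() is not None:
        raise ValueError(f"unexpected token {peek()!r}")
    return tree

tree = parse("[git:origin/main] and [tag:services] and not [tag:broken] or [tag:deprecated]")
# The top-level node is the "or": everything before it forms the left operand.
```

Under this precedence the expression groups as (main and services and not broken) or deprecated, which illustrates the point about the missing parentheses.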

CLI Selectors

It sounds like your goal is preventing user-inserted strings from breaking the DSL, but I don't think it's an achievable goal. Git branches also think [, ], (, ), and even " and ' are all valid characters, so you can break out of almost anything you can envision with a simple --select git:$BRANCH_NAME. Your best medicine against this type of shenanigan is just warning people not to put weird stuff in their branch names, imo.

That doesn't mean I'm against your initial suggestion by the way... I'll keep thinking about whether there's any alternative to [tag:services].

(Originally, I was going to suggest here that we just make it tag(services), to avoid that new syntax. But I think it's just too huge a loss to have the concept of projects tagged services expressed differently in --to vs --select.)

elliot-nelson commented 2 years ago

One more thought... I think in 99% of cases, the brackets could be optional. Just the "next space" would suffice for project names, tag names, most folder paths, etc., allowing very simple expressions like:

rush build --select "to(@acme/utils and @acme/services) and not @acme/build-test"

Then the [] expressions would only really be needed if your selector expression contained spaces or parens or would be confused for an operator.
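One way the optional brackets could be tokenized, as a sketch only. It assumes a bare token ends at whitespace, a parenthesis, or a bracket, that and/or/not are the reserved words, and that a bare word directly followed by "(" is an expander. The function name `classify` is hypothetical.

```python
import re

# A bracketed selector, a parenthesis, or a run of characters with no space/paren/bracket.
TOKEN = re.compile(r"\[[^\]]*\]|[()]|[^\s()\[\]]+")
KEYWORDS = {"and", "or", "not"}

def classify(expression):
    """Label each token as paren, selector, operator, or function."""
    tokens = TOKEN.findall(expression)
    labeled = []
    for index, token in enumerate(tokens):
        following = tokens[index + 1] if index + 1 < len(tokens) else None
        if token in ("(", ")"):
            labeled.append(("paren", token))
        elif token.startswith("["):
            labeled.append(("selector", token[1:-1]))  # brackets force selector meaning
        elif token in KEYWORDS:
            labeled.append(("operator", token))
        elif following == "(":
            labeled.append(("function", token))
        else:
            labeled.append(("selector", token))
    return labeled

labeled = classify("to(@acme/utils and @acme/services) and not @acme/build-test")
```

Brackets remain the escape hatch: `classify("[and]")` yields a selector named and, whereas a bare and is read as the operator.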
