[Feature] workspace querying command useful for a CI target determinator

sargunv commented 3 years ago

[x] I'd be willing to implement this feature (contributing guide)
[ ] This feature is important to have in this repository; a contrib plugin wouldn't do
- A contrib plugin would do just fine, but this seems like a good fit for the workspace-tools plugin in this repo.

Describe the user story

I have a monorepo with many separate services and interdependent library packages. I'd like my CI to run a "target determinator" on each PR that identifies which packages changed, and run build/test steps on only those packages and their dependent packages.

The yarn workspaces list command gives me the path and dependency graph information I need to implement that target determinator myself, as I did in this custom GitHub Action: https://github.com/sargunv/yarn-target-determinator/. However, I feel the core ingredients for this kind of functionality are a good fit for the workspace-tools plugin. Specifically, those core ingredients are:

- Given an array of file paths, ask Yarn which workspaces those files belong to.
- Given an array of workspace names, ask Yarn for all of their recursive dependent workspaces.

Describe the solution you'd like

I'm imagining a yarn workspaces lookup command in plugin-workspace-tools with the following behavior:

In its most basic form, this command takes a list of workspace names and echoes them back, removing duplicates and erroring on invalid names:

$ yarn workspaces lookup @example/lib-a @example/lib-a @example/lib-b
➤ YN0000: @example/lib-a
➤ YN0000: @example/lib-b
➤ YN0000: Done in 0s 15ms
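For illustration, a minimal TypeScript sketch of that basic behavior (TypeScript because Yarn's plugins are written in it). lookupWorkspaceNames and its knownNames parameter are hypothetical and not part of Yarn's API; a real implementation would presumably take the known names from the project rather than from a hand-built set.

```typescript
// Hypothetical helper, not part of Yarn's API: echoes the requested
// workspace names back in first-seen order, dropping duplicates and
// rejecting any name that is not a known workspace.
function lookupWorkspaceNames(
  requested: readonly string[],
  knownNames: ReadonlySet<string>,
): string[] {
  const seen = new Set<string>();
  const result: string[] = [];
  for (const name of requested) {
    if (!knownNames.has(name)) {
      // The proposal treats an unknown workspace name as an error.
      throw new Error(`Unknown workspace: ${name}`);
    }
    if (!seen.has(name)) {
      seen.add(name);
      result.push(name);
    }
  }
  return result;
}
```

Called with ["@example/lib-a", "@example/lib-a", "@example/lib-b"] and a set containing both names, it returns ["@example/lib-a", "@example/lib-b"], matching the example output.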

Of course, that alone isn't very useful. Next, we add a --files flag that takes an array of file paths, resolves each one to the workspace that contains it, and adds those workspaces to the input list, which is then handled as above.

$ yarn workspaces lookup --files package.json packages/lib-a/src/index.ts packages/lib-a/src/example.ts
➤ YN0000: @example/monorepo
➤ YN0000: @example/lib-a
➤ YN0000: Done in 0s 15ms
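One way --files could resolve paths is to pick, for each file, the workspace whose directory contains it most deeply. This sketch assumes workspace locations are POSIX paths relative to the project root with "." for the root (the form yarn workspaces list prints); WorkspaceInfo, findOwningWorkspace, and lookupFiles are hypothetical names.

```typescript
// Shape assumed for illustration; locations are POSIX paths relative to the
// project root, with "." for the root workspace.
interface WorkspaceInfo {
  name: string;
  location: string;
}

// Hypothetical helper: returns the workspace whose directory contains `file`
// most deeply, or undefined if no listed workspace contains it. Because the
// root workspace (".") contains every path, it acts as the fallback.
function findOwningWorkspace(
  file: string,
  workspaces: readonly WorkspaceInfo[],
): WorkspaceInfo | undefined {
  const path = file.replace(/^\.\//, "");
  let best: WorkspaceInfo | undefined;
  let bestDepth = -1;
  for (const ws of workspaces) {
    const isRoot = ws.location === ".";
    // Match whole path segments so "packages/lib-ab" is not inside "packages/lib-a".
    const contains =
      isRoot || path === ws.location || path.startsWith(`${ws.location}/`);
    const depth = isRoot ? 0 : ws.location.split("/").length;
    if (contains && depth > bestDepth) {
      best = ws;
      bestDepth = depth;
    }
  }
  return best;
}

// Hypothetical --files handling: resolve every path, then dedupe the owning
// workspaces while keeping first-seen order.
function lookupFiles(
  files: readonly string[],
  workspaces: readonly WorkspaceInfo[],
): string[] {
  const names = new Set<string>();
  for (const file of files) {
    const owner = findOwningWorkspace(file, workspaces);
    if (owner === undefined) {
      throw new Error(`No workspace contains ${file}`);
    }
    names.add(owner.name);
  }
  return Array.from(names);
}
```

With the root at "." and lib-a at "packages/lib-a", the three paths from the example resolve to @example/monorepo and @example/lib-a. Note that this design silently attributes files outside every nested workspace to the root, which is one possible answer to the unregistered-package question raised further down.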

Next, we add a --dependents flag that takes the input list of workspaces and additionally includes all of their recursive dependents. This should combine well with the --files flag to give us the desired target determinator logic.

$ yarn workspaces lookup @example/lib-b --dependents
➤ YN0000: @example/lib-b
➤ YN0000: @example/service-b
➤ YN0000: Done in 0s 15ms

$ yarn workspaces lookup @example/lib-a --dependents
➤ YN0000: @example/lib-a
➤ YN0000: @example/lib-b
➤ YN0000: @example/service-a
➤ YN0000: @example/service-b
➤ YN0000: Done in 0s 15ms

$ yarn workspaces lookup --files packages/lib-a/src/index.ts --dependents
➤ YN0000: @example/lib-a
➤ YN0000: @example/lib-b
➤ YN0000: @example/service-a
➤ YN0000: @example/service-b
➤ YN0000: Done in 0s 15ms
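Resolving --dependents amounts to inverting the dependency edges and walking them breadth-first from the input workspaces. A minimal sketch, assuming the graph is already available as a map from each workspace name to the names of the workspaces it depends on; DependencyGraph and resolveDependents are hypothetical names.

```typescript
// Assumed input shape: each workspace name maps to the names of the
// workspaces it depends on.
type DependencyGraph = ReadonlyMap<string, readonly string[]>;

// Hypothetical helper: returns the seeds plus every workspace that depends
// on them, directly or transitively, in breadth-first order.
function resolveDependents(
  graph: DependencyGraph,
  seeds: readonly string[],
): string[] {
  // Invert the edges: dependency name -> names of workspaces depending on it.
  const dependents = new Map<string, string[]>();
  graph.forEach((deps, name) => {
    for (const dep of deps) {
      const list = dependents.get(dep) ?? [];
      list.push(name);
      dependents.set(dep, list);
    }
  });

  const visited = new Set<string>();
  const queue: string[] = [];
  for (const seed of seeds) {
    if (!graph.has(seed)) throw new Error(`Unknown workspace: ${seed}`);
    if (!visited.has(seed)) {
      visited.add(seed);
      queue.push(seed);
    }
  }
  // The queue doubles as the result; `visited` keeps it duplicate-free.
  for (let i = 0; i < queue.length; i++) {
    for (const next of dependents.get(queue[i]) ?? []) {
      if (!visited.has(next)) {
        visited.add(next);
        queue.push(next);
      }
    }
  }
  return queue;
}
```

With lib-b and service-a depending on lib-a and service-b depending on lib-b (the graph the examples imply), seeding with @example/lib-a yields lib-a, lib-b, service-a, service-b, and seeding with @example/lib-b yields lib-b, service-b.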

Similarly, a --dependencies flag would do the same, but for recursive dependencies. I don't personally need this one, but it's included for completeness.

$ yarn workspaces lookup --files packages/lib-b/src/index.ts --dependencies
➤ YN0000: @example/lib-b
➤ YN0000: @example/lib-a
➤ YN0000: Done in 0s 15ms
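--dependencies is the same breadth-first walk, following the edges forward instead of inverting them. A sketch under the same assumed graph shape; resolveDependencies is a hypothetical name.

```typescript
// Assumed input shape: each workspace name maps to the names of the
// workspaces it depends on.
type DependencyGraph = ReadonlyMap<string, readonly string[]>;

// Hypothetical helper: returns the seeds plus everything they depend on,
// directly or transitively, in breadth-first order.
function resolveDependencies(
  graph: DependencyGraph,
  seeds: readonly string[],
): string[] {
  const visited = new Set<string>();
  const queue: string[] = [];
  const enqueue = (name: string) => {
    if (!visited.has(name)) {
      visited.add(name);
      queue.push(name);
    }
  };
  for (const seed of seeds) {
    if (!graph.has(seed)) throw new Error(`Unknown workspace: ${seed}`);
    enqueue(seed);
  }
  // Follow the edges forward; no inversion is needed here.
  for (let i = 0; i < queue.length; i++) {
    for (const dep of graph.get(queue[i]) ?? []) enqueue(dep);
  }
  return queue;
}
```

Seeded with @example/lib-b, this returns lib-b followed by lib-a, as in the example above.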

And finally, a --raw flag changes the output format to one that is easy to consume programmatically. I'd also be okay with --json emitting an NDJSON stream, but since this output isn't structured, that would just be a stream of JSON strings, unless we want to output objects with more metadata about each workspace.

$ yarn workspaces lookup --raw --files packages/lib-a/src/index.ts --dependents
@example/lib-a
@example/lib-b
@example/service-a
@example/service-b
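To make the --raw versus --json trade-off concrete, a sketch of both formats: raw prints one bare name per line, while NDJSON prints one JSON object per line. The location field is only an illustrative example of "more metadata"; WorkspaceRecord and formatWorkspaces are hypothetical.

```typescript
type OutputMode = "raw" | "json";

// Illustrative record; `location` stands in for whatever extra metadata
// a --json mode might carry.
interface WorkspaceRecord {
  name: string;
  location: string;
}

// Hypothetical formatter: one workspace per line in either mode.
function formatWorkspaces(
  records: readonly WorkspaceRecord[],
  mode: OutputMode,
): string {
  return records
    .map((r) =>
      mode === "raw"
        ? r.name // raw: the bare name, trivially consumable by shell tools
        : JSON.stringify({ name: r.name, location: r.location }), // NDJSON line
    )
    .map((line) => `${line}\n`)
    .join("");
}
```

Emitting objects rather than bare JSON strings is what would make --json worth having over --raw.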

Example use cases:

The target determinator mentioned above, pretending we're in GitHub Actions:

# $CHANGESET is a newline-delimited list of files that changed in this PR
# $IGNORESET is a newline-delimited list of files the target determinator ignores (READMEs and the like)
# comm needs sorted input; -13 keeps only the lines unique to $CHANGESET
FILES=$(comm -13 <(sort <<<"$IGNORESET") <(sort <<<"$CHANGESET"))
# $FILES is deliberately unquoted so that each path becomes its own argument
TARGETS=$(yarn workspaces lookup --raw --files $FILES --dependents)
# Quote $TARGETS so its newlines survive for jq's split
TARGETS_JSON=$(printf '%s' "$TARGETS" | jq -cRs 'split("\n") | map(select(length > 0))')
echo "targets=$TARGETS_JSON" >> "$GITHUB_OUTPUT" # generates a matrix for the build job
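For readers who would rather keep this glue in a script than in shell, the two data steps of the workflow above (dropping ignored files from the changeset, then turning newline-delimited --raw output into a compact JSON array) can be sketched as follows. filterChangeset and toMatrixJson are hypothetical names, and the inputs are assumed to be newline-delimited strings like $CHANGESET and $IGNORESET.

```typescript
// Splits newline-delimited text into trimmed, non-empty lines.
function splitLines(text: string): string[] {
  return text
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
}

// Hypothetical helper: keeps the changed files that are not in the ignore
// set, preserving the changeset's order (no sorting needed, unlike comm).
function filterChangeset(changeset: string, ignoreset: string): string[] {
  const ignored = new Set(splitLines(ignoreset));
  return splitLines(changeset).filter((file) => !ignored.has(file));
}

// Hypothetical helper: turns `--raw` output into the compact JSON array
// that the build matrix consumes, like the jq filter in the workflow.
function toMatrixJson(rawTargets: string): string {
  return JSON.stringify(splitLines(rawTargets));
}
```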

Working on a library in a monorepo, running tests in every workspace that depends on it:

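# --dependents pulls in every workspace that depends on lib-a; --include is repeated once per name
yarn workspaces foreach $(yarn workspaces lookup --raw --dependents @example/lib-a | sed 's/^/--include /') run test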

Describe the drawbacks of your solution

I haven't considered how to handle nested worktrees, mainly because I haven't used them and am therefore not familiar with their dependency semantics. Is it safe to assume they won't be an issue here?

Nor have I considered how to handle packages that aren't registered in the worktree. I imagine it's fine to error in this case, just as Yarn errors when run in a "fixtures" directory in this repo?

The output I've described is totally flat rather than structured into a dependency tree. This is an intentional choice to keep things simple, but it's possible there are valid use cases I haven't considered that would prefer to retain that tree info.

I'm designing with my own use case in mind, so it's possible I've overfit for that use case (a CI target determinator) and it won't be very useful for others. However, these feel like fairly fundamental building blocks, so I'd be surprised if other use cases didn't emerge.

Describe alternatives you've considered

This could also be implemented by parsing the output of yarn workspaces list into a graph and then operating on that graph. In fact, I've already done that in https://github.com/sargunv/yarn-target-determinator/ to unblock my immediate use case. I just feel it would be more elegantly implemented in a plugin.
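A sketch of that parsing step, assuming yarn workspaces list --json --verbose prints one JSON object per workspace with location, name, and a workspaceDependencies array holding the locations of the workspaces it depends on. Those field names are an assumption to verify against your Yarn version; ListEntry and parseWorkspaceList are hypothetical.

```typescript
// Assumed line shape (verify against your Yarn version).
interface ListEntry {
  location: string;
  name: string | null;
  workspaceDependencies?: string[];
}

// Hypothetical helper: turns the NDJSON output into a map from workspace
// name to the names of the workspaces it depends on. Unnamed workspaces
// fall back to their location as the key.
function parseWorkspaceList(ndjson: string): Map<string, string[]> {
  const entries = ndjson
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as ListEntry);

  // Dependencies are listed by location, so index names by location first.
  const nameByLocation = new Map<string, string>();
  for (const entry of entries) {
    nameByLocation.set(entry.location, entry.name ?? entry.location);
  }

  const graph = new Map<string, string[]>();
  for (const entry of entries) {
    const deps = (entry.workspaceDependencies ?? []).map(
      (location) => nameByLocation.get(location) ?? location,
    );
    graph.set(entry.name ?? entry.location, deps);
  }
  return graph;
}
```

The resulting map has the name-to-dependencies shape that a dependents or dependencies walk needs.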

It doesn't need to live in plugin-workspace-tools; I could write and release it myself as a third-party plugin. However, I feel it's a good fit for the workspace tools plugin, so I decided to propose it here and see whether it's something y'all would accept before I work on an independent plugin.

arcanis commented 3 years ago

That sounds like a good idea, although I'd prefer exposing each kind of result as a separate command (rather than as options). I'd probably go with an interface like this:

# Prints the workspaces owning those files; in json mode, the output is a tuple [file, workspace]
yarn workspaces lookup [--json] [...files]

# Prints the list of workspaces that are currently modified (--files also prints the files themselves)
yarn workspaces changeset [--json] [--files]

# Prints the list of workspaces that depend on the given set of workspaces (by default, the current workspace)
yarn workspaces resolve-dependents [--json] [--topological] [--topological-dev] [--all] [...workspaces]

To make it even better, it'd be nice if these tools were also available as programmatic helpers that other plugins could leverage.

sargunv commented 3 years ago

Sure, that interface seems reasonable. I'd be happy to implement lookup and resolve-dependents at least; might leave changeset for someone else as I don't personally have a use case for it atm. I use another GH Action to pull the changeset for a PR via GH's API.

Curious, what would --topological and --topological-dev do in resolve-dependents? I understand how they're used for ordering/parallelism in foreach but not sure how that'd apply to just printing a list of dependents. Would it just modify the output order?

Ditto for --all.

arcanis commented 3 years ago

> Would it just modify the output order?

Yep, exactly. Although perhaps that could be a separate yarn workspaces sort command 🤔
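If --topological only changes the output order, that ordering step (whether it lives in resolve-dependents or a separate sort command) could be a topological sort restricted to the selected workspaces. A sketch using Kahn's algorithm, assuming the same name-to-dependencies graph shape as elsewhere; topologicalOrder is a hypothetical name, and it does not attempt to model --topological-dev (devDependencies) separately.

```typescript
type DependencyGraph = ReadonlyMap<string, readonly string[]>;

// Hypothetical helper: orders the selected workspaces so that each one comes
// after any of its dependencies that are also selected (Kahn's algorithm).
// Dependencies outside the selection are ignored. Throws on a cycle.
function topologicalOrder(
  graph: DependencyGraph,
  selected: readonly string[],
): string[] {
  const nodes = Array.from(new Set(selected));
  const inSelection = new Set(nodes);
  const indegree = new Map<string, number>();
  const dependents = new Map<string, string[]>();
  for (const name of nodes) {
    indegree.set(name, 0);
    dependents.set(name, []);
  }
  for (const name of nodes) {
    for (const dep of graph.get(name) ?? []) {
      if (!inSelection.has(dep)) continue;
      indegree.set(name, (indegree.get(name) ?? 0) + 1);
      dependents.get(dep)!.push(name);
    }
  }
  // Start from workspaces with no selected dependencies; the queue is the output.
  const queue = nodes.filter((name) => indegree.get(name) === 0);
  for (let i = 0; i < queue.length; i++) {
    for (const next of dependents.get(queue[i]) ?? []) {
      const remaining = (indegree.get(next) ?? 0) - 1;
      indegree.set(next, remaining);
      if (remaining === 0) queue.push(next);
    }
  }
  if (queue.length !== nodes.length) {
    throw new Error("Dependency cycle among the selected workspaces");
  }
  return queue;
}
```

Keeping the sort separate from dependency resolution is what would let it back a standalone sort command as easily as a flag.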

> might leave changeset for someone else as I don't personally have a use case for it atm. I use another GH Action to pull the changeset for a PR via GH's API.

Sounds good! If you ever need it, it's actually already implemented here; we'd just need to move it into the workspace tools plugin (and remove the Mercurial driver, since we no longer need it).

mzhubail commented 2 months ago

@arcanis Any updates on this?

Since the last update, the --since and --recursive options have been added to yarn workspaces list and yarn workspaces foreach, which covers part of the functionality mentioned here.

If there are any additions to be made, I'd be happy to open a PR. I had already looked around the codebase before I found #3459.

yarnpkg / berry

[Feature] workspace querying command useful for a CI target determinator #2535