microsoft / lage

Task runner in JS monorepos
https://microsoft.github.io/lage
MIT License

[RFC]: Lage v2: breaking apart the builder #254

Open kenotron opened 2 years ago

kenotron commented 2 years ago

lage v2 RFC

Overview

At a high level, lage has always been a tool that should have been broken into several smaller units. These are:

  1. A configuration parser
  2. A target graph generator based on the configuration + the state of the workspace (monorepo workspace)
  3. An abstraction for:
    1. cache
    2. execution engine

Under the hood, we have combined too many concepts into a single tool, which has made the software fragile and hard to test. We believe in the UNIX philosophy: we should produce a set of tools or libraries that each do one thing well. In this RFC, we define the smaller pieces in detail so that lage v2 can achieve higher quality and greater flexibility. We believe that alternative implementations of caching and execution could bring important features to lage itself, such as distributed builds and work stealing.

Definitions of terms

target
A unit of work to be executed by an execution engine, for example a single script of a package inside the monorepo. Another example is a task function that is executed as part of a sharding strategy
shard
A slice of work to be done; example: a slice of the total number of tests to be run
pipeline
Concept inside lage task runner that schedules tasks to be executed in parallel
execution engine
Portion of functionality that takes a target graph and runs the appropriate command line scripts; optionally to be coordinated across multiple machines
remote cache
Storage of output assets from a target

Goals & non-goals

Goals

  1. divide up the current functionality of lage into smaller, reusable libraries
  2. create unit tests for each of the smaller libraries
  3. keep E2E tests in the lage package
  4. reduce API surface by removal of experimental features, like distributed workers / redis

Non-goals

  1. creating a plugin architecture (plan for >2.0)
  2. creating a flexible execution engine

Upstream dependencies

  1. backfill
  2. workspace-tools

Downstream dependencies

  1. consumers of the tool lage
  2. potential new dependencies as libraries

Detailed Design

Reference this graphic for a breakdown of the packages:

image

We will be dividing lage into these libraries:

lage

This package is how most users will install and use the tool. It is a thin layer on top of @lage-run/cli and @lage-run/config. It triggers the loading of the config and uses the cli library to initiate the commands.

@lage-run/cli

Parses command line args, provides default cli args, loads & runs commands. The interfaces from this package should drive the CLI documentation.

The yargs library provides a parser that understands the various ways users can supply CLI arguments. However, the lage CLI can take multiple commands, which are not executed sequentially. Due to this complexity, lage needs to take charge of interpreting the commands and options and mapping them onto the graph that is generated for the run command. For other commands, like cache and info, lage will parse and execute a single command function. Because of the flexibility needed, lage uses the low-level yargs-parser library instead of yargs.
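As a rough sketch of the interpretation step described above: yargs-parser returns positional arguments under `_` and flags as named properties, so the code below takes an object of that shape as input rather than calling the library. The `Invocation` type, the `interpret` function, and the rule that a leading `run` is optional are assumptions made for illustration, not the actual @lage-run/cli API.

```typescript
// Shape of a yargs-parser result: positionals in `_`, flags as properties.
interface ParsedArgs {
  _: (string | number)[];
  [flag: string]: unknown;
}

// Hypothetical result: a single-function command, or a run over many tasks.
type Invocation =
  | { command: "cache" | "info"; args: string[]; options: Record<string, unknown> }
  | { command: "run"; tasks: string[]; options: Record<string, unknown> };

function interpret(parsed: ParsedArgs): Invocation {
  const { _: positionals, ...options } = parsed;
  const words = positionals.map(String);
  const first = words[0];

  // `cache` and `info` map to exactly one command function.
  if (first === "cache" || first === "info") {
    return { command: first, args: words.slice(1), options };
  }

  // Assumption: a leading "run" is optional; every other positional is a
  // task name that contributes to the graph for the run command.
  const tasks = first === "run" ? words.slice(1) : words;
  return { command: "run", tasks, options };
}

// Equivalent to something like `lage build test --scope app`.
console.log(interpret({ _: ["build", "test"], scope: "app" }));
```

Keeping the parsed-args shape as the input makes this step easy to unit test without touching process.argv.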

Dependencies:

@lage-run/config

Parses the configuration file, expanding the shorthand syntax of the pipeline. A portion of the code will convert the lage config into the configuration for the upstream dependencies.
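To illustrate what expanding the shorthand could look like, here is a minimal sketch. It assumes the shorthand is a bare array of dependency names and the long form is an object with a `deps` field; the `PipelineEntry` type and the `expandPipeline` function are hypothetical names, not the actual @lage-run/config API.

```typescript
// Hypothetical, trimmed-down target definition for this example.
interface ExpandedTarget {
  deps?: string[];
  outputs?: string[];
  cache?: boolean;
}

// A pipeline entry is either the shorthand (a list of dependencies)
// or a full definition object.
type PipelineEntry = string[] | ExpandedTarget;

// Normalize every entry to the long form so that later stages only
// have to handle one shape.
function expandPipeline(
  pipeline: Record<string, PipelineEntry>
): Record<string, ExpandedTarget> {
  const expanded: Record<string, ExpandedTarget> = {};
  for (const id of Object.keys(pipeline)) {
    const entry = pipeline[id];
    if (entry === undefined) continue;
    expanded[id] = Array.isArray(entry) ? { deps: entry } : { ...entry };
  }
  return expanded;
}

const expandedPipeline = expandPipeline({
  build: ["^build"], // shorthand
  test: { deps: ["build"], cache: false }, // long form, copied as-is
});
console.log(expandedPipeline);
```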

Dependencies

@lage-run/pipeline

Pipeline Class

Its job is to allow the caller to add tasks along with their dependencies. It uses the package dependency graph to determine the exact target nodes to create.

ctor dependencies: workspaces
methods: addTargetDefinition(id: string, definition: TargetDefinition | TargetDefinitionFactory)

TargetDefinition Interface (formerly TargetConfig)

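interface TargetDefinition {
  type?: "package" | "global"; // scope of the target: one per package, or one for the whole repo
  run?: (args: TaskArgs) => Promise<boolean> | void; // optional function to run for this target
  deps?: string[]; // targets that this target depends on
  outputs?: string[]; // output assets of this target, used for caching
  priority?: number; // scheduling priority
  cache?: boolean; // whether this target is cached
  options?: any; // custom options
}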

TargetDefinitionFactory interface (formerly TargetConfigFactory)

export interface TargetDefinitionFactory {
  (args: FactoryArgs): TargetDefinition | TargetDefinition[];
}
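The following is a minimal sketch of how a Pipeline class might turn target definitions into target nodes using the package dependency graph. Everything except the `addTargetDefinition` method name is an assumption for illustration: the `Workspaces` shape, the `generateTargets` method, the `pkg#task` id format, and the reading of `^task` as "the same task in each package this one depends on". Factories and global targets are left out to keep it short.

```typescript
// Hypothetical: package name -> names of the packages it depends on.
type Workspaces = Record<string, string[]>;

interface SimpleDefinition {
  deps?: string[];
}

interface TargetNode {
  id: string; // e.g. "app#build"
  dependsOn: string[]; // ids of targets that must finish first
}

class Pipeline {
  private workspaces: Workspaces;
  private definitions: Record<string, SimpleDefinition> = {};

  constructor(workspaces: Workspaces) {
    this.workspaces = workspaces;
  }

  addTargetDefinition(id: string, definition: SimpleDefinition): void {
    this.definitions[id] = definition;
  }

  // Expand each definition into one target node per package.
  generateTargets(): TargetNode[] {
    const nodes: TargetNode[] = [];
    for (const task of Object.keys(this.definitions)) {
      const deps = this.definitions[task]?.deps ?? [];
      for (const pkg of Object.keys(this.workspaces)) {
        const dependsOn: string[] = [];
        for (const dep of deps) {
          if (dep.charAt(0) === "^") {
            // "^task": the same task in every package this one depends on.
            for (const depPkg of this.workspaces[pkg] ?? []) {
              dependsOn.push(`${depPkg}#${dep.slice(1)}`);
            }
          } else {
            // "task": another task within the same package.
            dependsOn.push(`${pkg}#${dep}`);
          }
        }
        nodes.push({ id: `${pkg}#${task}`, dependsOn });
      }
    }
    return nodes;
  }
}

const pipeline = new Pipeline({ app: ["lib"], lib: [] });
pipeline.addTargetDefinition("build", { deps: ["^build"] });
pipeline.addTargetDefinition("test", { deps: ["build"] });
console.log(pipeline.generateTargets());
```

In this example, app#build depends on lib#build, and each package's test target depends on that package's own build target.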

Dependencies

@lage-run/runner

The runner should be called from the top-level run command. The runner included with lage by default should be optimized for a single machine with multiple cores. We will port the current runner (which uses p-graph) to @lage-run/runner.

This runner converts the pipeline into a run graph suitable for the p-graph runner.
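To make the idea of a single-machine runner concrete, here is a small dependency-aware scheduler with a concurrency limit. It is a stand-in written for this example and does not use or reproduce the p-graph API; the `RunnableTarget` shape and the `runGraph` function are hypothetical.

```typescript
// Hypothetical target shape: an id, the ids it waits on, and the work itself.
interface RunnableTarget {
  id: string;
  dependsOn: string[];
  run: () => Promise<void>;
}

// Runs each target after all of its dependencies have finished, with at
// most `concurrency` targets in flight. Resolves with the completion order.
function runGraph(targets: RunnableTarget[], concurrency: number): Promise<string[]> {
  return new Promise<string[]>((resolve, reject) => {
    const finished: string[] = [];
    const started: Record<string, boolean> = {};
    let running = 0;
    let failed = false;

    const schedule = (): void => {
      if (failed) return;
      if (finished.length === targets.length) {
        resolve(finished);
        return;
      }
      for (const target of targets) {
        if (running >= concurrency) break;
        const ready =
          !started[target.id] &&
          target.dependsOn.every((dep) => finished.indexOf(dep) !== -1);
        if (!ready) continue;
        started[target.id] = true;
        running++;
        Promise.resolve()
          .then(() => target.run())
          .then(
            () => {
              running--;
              finished.push(target.id);
              schedule();
            },
            (error) => {
              failed = true;
              reject(error);
            }
          );
      }
      // Nothing is running and nothing could start: the graph has a cycle,
      // an unknown dependency, or the concurrency limit is below 1.
      if (running === 0) {
        failed = true;
        reject(new Error("no runnable target"));
      }
    };

    schedule();
  });
}

const noop = async (): Promise<void> => {};
runGraph(
  [
    { id: "lib#build", dependsOn: [], run: noop },
    { id: "app#build", dependsOn: ["lib#build"], run: noop },
    { id: "app#test", dependsOn: ["app#build"], run: noop },
  ],
  2
).then((order) => console.log(order));
```

The concurrency limit would typically be derived from the number of available cores, which is what makes this style of runner a fit for one machine rather than a fleet.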

In a future update (likely another RFC), we will move to a work-stealing algorithm, in which idle resources pick up work as it becomes available.

Dependencies

@lage-run/logger

This is a flexible logger. Currently the loggers are "sinks" that take in well-known structures of information (from tasks or from lage itself). The CLI or config specifies how the logs are reported as they are recorded (streamed or buffered).
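A minimal sketch of the sink idea, with one streamed and one buffered sink. The `LogEntry` fields, the `LogSink` interface, and the helper names are assumptions made for this example and may not match the real @lage-run/logger structures.

```typescript
// Hypothetical structured entry recorded by tasks or by lage itself.
interface LogEntry {
  level: "info" | "warn" | "error";
  target?: string; // id of the target that produced the entry, if any
  message: string;
}

// A sink receives entries and decides how to report them.
interface LogSink {
  log(entry: LogEntry): void;
  flush?(): void;
}

function formatEntry(entry: LogEntry): string {
  const prefix = entry.target ? `[${entry.target}] ` : "";
  return `${entry.level} ${prefix}${entry.message}`;
}

// Streamed: report each entry as soon as it is recorded.
function streamSink(write: (line: string) => void): LogSink {
  return { log: (entry) => write(formatEntry(entry)) };
}

// Buffered: hold entries and report them only when flushed.
function bufferedSink(write: (line: string) => void): LogSink {
  const buffer: LogEntry[] = [];
  return {
    log: (entry) => {
      buffer.push(entry);
    },
    flush: () => {
      // splice(0) empties the buffer and returns what it held.
      buffer.splice(0).forEach((entry) => write(formatEntry(entry)));
    },
  };
}

class Logger {
  private sinks: LogSink[] = [];

  addSink(sink: LogSink): void {
    this.sinks.push(sink);
  }

  log(entry: LogEntry): void {
    this.sinks.forEach((sink) => sink.log(entry));
  }

  flush(): void {
    this.sinks.forEach((sink) => {
      if (sink.flush) sink.flush();
    });
  }
}

const logger = new Logger();
logger.addSink(streamSink((line) => console.log(line)));
logger.log({ level: "info", target: "app#build", message: "started" });
logger.flush();
```

Because sinks only see structured entries, the choice between streamed and buffered output can be made by the CLI or config without touching the code that records the logs.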

Test Plan

Performance, Resilience, Monitoring

No performance degradation is expected. Because we are dividing the codebase into smaller pieces, we also expect it to be much easier to test, which should in turn help with quality. As for resilience and telemetry monitoring, we do not collect this data, for privacy reasons.

In the future, we may provide a way for users to easily report issues with lage, but that is outside the scope of this RFC. Such a mechanism could include a user-directed submission of lage logs, for example.

Security & Privacy

No impact. We are not changing dependencies on external packages, and we are not collecting or exposing any data about the users' repositories.

Accessibility

No impacts

Note: lage's output is text in a terminal, which is currently not very friendly to screen readers. However, we may want to publish a version of lage that is more inclusive by offering a frontend other than a CLI.

World Readiness

Nothing for v2.

Execution Plan

The following lists where the work will be done:

tag     | version           | branch
latest  | 1.x, @lage-run/*  | master
stable  | 0.x               | stable

We will use the monorepo to start creating the new @lage-run/* packages. We will begin by creating all the necessary libraries, and will also add a lage2 binary to the existing lage package to test the functionality while lage 1.x is still the latest.

kenotron commented 2 years ago

Tagging some people who might be interested: @bweggersen, @VincentBailly, @jcreamer898

VincentBailly commented 2 years ago

@kenotron I love the plan and the transparency about it. It looks great to me