lage v2 RFC
Overview
At a high level, lage has always been a tool that should have been broken into several smaller units. These are:
A configuration parser
A target graph generator based on the configuration + the state of the workspace (monorepo workspace)
Abstractions for the cache and the execution engine
Under the hood, we have combined too many concepts into a single tool, which has made the software fragile and hard to test. We believe in the UNIX philosophy: produce a set of tools or libraries that each do one thing well. In this RFC, we define the smaller pieces in detail so that v2 of lage can achieve higher quality and greater flexibility. We also believe that additional implementations of caching and execution could bring important features into lage itself, such as distributed builds and work stealing.
Definitions of terms
target
A unit of work to be executed by an execution engine. One example is a single script of a package inside the monorepo; another is a task function executed as part of a sharding strategy.
shard
A slice of work to be done; example: a slice of the total number of tests to be run
pipeline
A concept inside the lage task runner that schedules tasks to be executed in parallel
execution engine
The portion of functionality that takes a target graph and runs the appropriate command line scripts, optionally coordinated across multiple machines
remote cache
Storage of output assets from a target
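To make the terms target and shard concrete, here is a minimal sketch. The `Target` shape, the `package#task` id format, and the `shard` helper are assumptions made for illustration; they are not taken from the lage codebase.

```typescript
// Hypothetical shape of a target: one script of one package in the monorepo.
interface Target {
  id: string; // assumed format: "<package>#<task>"
  packageName: string;
  task: string;
  dependencies: string[]; // ids of targets that must finish first
}

// Splits `items` into `count` shards of near-equal size (round-robin).
function shard<T>(items: T[], count: number): T[][] {
  if (count < 1 || Math.floor(count) !== count) {
    throw new Error("count must be a positive integer");
  }
  const shards: T[][] = [];
  for (let i = 0; i < count; i++) shards.push([]);
  items.forEach((item, index) => shards[index % count].push(item));
  return shards;
}

const testTarget: Target = {
  id: "pkg-a#test",
  packageName: "pkg-a",
  task: "test",
  dependencies: ["pkg-a#build"],
};

// Five test files split into two slices of work.
const shards = shard(["a.test.ts", "b.test.ts", "c.test.ts", "d.test.ts", "e.test.ts"], 2);
// shards[0] holds a, c, e; shards[1] holds b, d
```

Each shard could then be wrapped in its own target, so that an execution engine can schedule the slices independently.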
Goals & non-goals
Goals
divide up the current functionality of lage into smaller, reusable libraries
create unit tests for each of the smaller libraries
keep E2E tests at the lage package
reduce API surface by removal of experimental features, like distributed workers / redis
Non-goals
creating a plugin architecture (plan for >2.0)
creating a flexible execution engine
Upstream dependencies
backfill
workspace-tools
Downstream dependencies
consumers of the tool lage
potential new dependencies as libraries
Detailed Design
Reference this graphic for a breakdown of the packages:
We will be dividing lage into these libraries:
lage
This package is how most users will install and use the tool. It is a thin layer on top of @lage-run/cli and @lage-run/config: it triggers the loading of the config and uses the CLI library to initiate the commands.
@lage-run/cli
Parses command line args, provides default CLI args, and loads and runs commands. The interfaces from this package should drive the CLI documentation.
The yargs library provides a parser that understands the various ways users can supply CLI arguments. However, the lage CLI can take multiple commands, which are not executed sequentially. Due to this complexity, lage needs to take charge of interpreting commands and options into the graph that gets generated for the run command. For other commands, like cache and info, lage will parse and execute a single command function. Because of the flexibility needed, lage uses the low-level yargs-parser library instead of yargs.
Dependencies:
@lage-run/logger
yargs-parser
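The split between the run command and the single-function commands could look roughly like the sketch below. It operates on already-parsed arguments in the shape that yargs-parser produces (positional arguments in `_`, flags as named properties), so it does not import the library. The `dispatch` function, the `CommandResult` type, and the assumption that run is the default command are ours, not part of this RFC.

```typescript
// Shape modelled on yargs-parser output: positionals in `_`, flags as properties.
interface ParsedArgs {
  _: string[];
  [flag: string]: unknown;
}

// Hypothetical result type: which command to execute and its positional arguments.
interface CommandResult {
  command: string;
  args: string[];
}

// Commands that map to a single command function; anything else is a task list for `run`.
const singleCommands = ["cache", "info"];

function dispatch(parsed: ParsedArgs): CommandResult {
  const positionals = parsed._;
  const first = positionals[0];
  if (first !== undefined && singleCommands.indexOf(first) !== -1) {
    return { command: first, args: positionals.slice(1) };
  }
  // Assumption: `lage run build test` and `lage build test` both mean "run these tasks".
  const tasks = first === "run" ? positionals.slice(1) : positionals.slice();
  return { command: "run", args: tasks };
}

// Roughly what `lage build test --scope pkg-a` would look like after parsing.
const runResult = dispatch({ _: ["build", "test"], scope: "pkg-a" });
// Roughly what `lage cache --clear` would look like after parsing.
const cacheResult = dispatch({ _: ["cache"], clear: true });
```

For the run command, every entry in `args` is a task name that feeds into the generated graph, which is why the tasks are not executed in the order they were typed.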
@lage-run/config
Parses the configuration file, expanding the shorthand syntax of the pipeline. A portion of the code will convert the lage config into the configuration expected by the upstream dependencies.
Dependencies
@lage-run/logger
backfill-config
(any other config / config types packages)
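As an illustration of what expanding the shorthand could involve, the sketch below turns a pipeline such as `build: ["^build"]` into structured dependency specs. The type names, the `dependsOn` field, and the interpretation of `^task` and `package#task` are assumptions for this example; the actual config types may differ.

```typescript
// Hypothetical config shape: a task maps to a dependency list or a fuller object.
type PipelineShorthand = Record<string, string[] | { dependsOn: string[] }>;

interface DependencySpec {
  kind: "same-package" | "upstream-packages" | "specific-package";
  task: string;
  packageName?: string;
}

// Assumed shorthand: "^task" = the task in dependency packages,
// "pkg#task" = one specific package's task, "task" = same package.
function parseDependency(spec: string): DependencySpec {
  if (spec.charAt(0) === "^") {
    return { kind: "upstream-packages", task: spec.slice(1) };
  }
  const hash = spec.indexOf("#");
  if (hash !== -1) {
    return {
      kind: "specific-package",
      packageName: spec.slice(0, hash),
      task: spec.slice(hash + 1),
    };
  }
  return { kind: "same-package", task: spec };
}

function expandPipeline(pipeline: PipelineShorthand): Record<string, DependencySpec[]> {
  const expanded: Record<string, DependencySpec[]> = {};
  Object.keys(pipeline).forEach((task) => {
    const entry = pipeline[task];
    const deps = Array.isArray(entry) ? entry : entry.dependsOn;
    expanded[task] = deps.map(parseDependency);
  });
  return expanded;
}

const expanded = expandPipeline({
  build: ["^build"],
  test: ["build"],
  bundle: { dependsOn: ["build", "tools#build"] },
});
```

Keeping this expansion inside the config package means the later stages only ever see the fully expanded form.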
@lage-run/pipeline
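Pipeline class
Its job is to allow the caller to add tasks along with their dependencies. It will use the package dependency graph to determine the exact target nodes to be created.
ctor dependencies:
workspaces
methods:
addTargetDefinition(id: string, definition: TargetDefinition | TargetDefinitionFactory)
TargetDefinition interface (formerly TargetConfig)
TargetDefinitionFactory interface (formerly TargetConfigFactory)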
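A rough sketch of such a class follows, assuming it is constructed from the workspace's package graph and exposes an `addTargetDefinition(id, definition)` method. The fields of `TargetDefinition`, the factory signature, the `PackageGraph` and `TargetNode` types, and the `generateTargetGraph` method are all hypothetical and only serve to show how target nodes could be derived from the package dependency graph.

```typescript
// Hypothetical shapes for illustration only.
interface TargetDefinition {
  dependsOn: string[]; // "^task" = task in dependency packages, "task" = same package
}
type TargetDefinitionFactory = (packageName: string) => TargetDefinition;

// Package name -> names of the workspace packages it depends on.
type PackageGraph = Record<string, string[]>;

interface TargetNode {
  id: string; // assumed format: "<package>#<task>"
  dependencies: string[];
}

class Pipeline {
  private workspaces: PackageGraph;
  private definitions: Record<string, TargetDefinition | TargetDefinitionFactory> = {};

  constructor(workspaces: PackageGraph) {
    this.workspaces = workspaces;
  }

  addTargetDefinition(id: string, definition: TargetDefinition | TargetDefinitionFactory): void {
    this.definitions[id] = definition;
  }

  // Creates one node per (package, task) pair and resolves each dependency
  // against the package graph.
  generateTargetGraph(): TargetNode[] {
    const nodes: TargetNode[] = [];
    Object.keys(this.workspaces).forEach((pkg) => {
      Object.keys(this.definitions).forEach((task) => {
        const entry = this.definitions[task];
        const definition = typeof entry === "function" ? entry(pkg) : entry;
        const dependencies: string[] = [];
        definition.dependsOn.forEach((dep) => {
          if (dep.charAt(0) === "^") {
            const upstreamTask = dep.slice(1);
            this.workspaces[pkg].forEach((upstream) => {
              dependencies.push(upstream + "#" + upstreamTask);
            });
          } else {
            dependencies.push(pkg + "#" + dep);
          }
        });
        nodes.push({ id: pkg + "#" + task, dependencies });
      });
    });
    return nodes;
  }
}

const pipeline = new Pipeline({ app: ["lib"], lib: [] });
pipeline.addTargetDefinition("build", { dependsOn: ["^build"] });
pipeline.addTargetDefinition("test", () => ({ dependsOn: ["build"] }));
const graph = pipeline.generateTargetGraph();
// app#build depends on lib#build; app#test depends on app#build
```

The sketch treats the `id` argument as a task name; whether ids may also name a specific package's target is left open here.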
@lage-run/runner
The runner should be called from the top-level run command. The runner included with lage by default should be optimized for a single machine with multiple cores. We will port the current runner (which uses p-graph) to @lage-run/runner.
This runner converts the pipeline into a run graph suitable for the p-graph runner.
Likely in a future update (another RFC), we will move to a work-stealing algorithm, where idle resources pick up work as it becomes available.
Dependencies
p-graph
p-profiler
@lage-run/logger
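To illustrate the kind of scheduling a single-machine runner performs, here is a minimal dependency-aware scheduler with a concurrency limit. It is a simplified stand-in written for this example and does not use or reproduce the p-graph API; `RunnableTarget` and `runGraph` are hypothetical names.

```typescript
interface RunnableTarget {
  id: string;
  dependencies: string[];
  run: () => Promise<void>;
}

// Runs every target after its dependencies, with at most `concurrency` running at once.
// Resolves with the target ids in completion order.
function runGraph(targets: RunnableTarget[], concurrency: number): Promise<string[]> {
  if (concurrency < 1) {
    return Promise.reject(new Error("concurrency must be at least 1"));
  }
  const completed: string[] = [];
  const started: Record<string, boolean> = {};
  let running = 0;
  let failed = false;

  return new Promise<string[]>((resolve, reject) => {
    const fail = (error: unknown): void => {
      failed = true;
      reject(error);
    };
    const schedule = (): void => {
      if (failed) return;
      if (completed.length === targets.length) {
        resolve(completed);
        return;
      }
      targets.forEach((target) => {
        const ready = target.dependencies.every((dep) => completed.indexOf(dep) !== -1);
        if (started[target.id] || !ready || running >= concurrency) return;
        started[target.id] = true;
        running++;
        target.run().then(() => {
          running--;
          completed.push(target.id);
          schedule();
        }, fail);
      });
      // Nothing is running and nothing could be started: the graph cannot make progress.
      if (running === 0) {
        fail(new Error("unresolvable dependencies (cycle or unknown target id)"));
      }
    };
    schedule();
  });
}

const order: string[] = [];
const makeTarget = (id: string, dependencies: string[]): RunnableTarget => ({
  id,
  dependencies,
  run: () => Promise.resolve().then(() => { order.push(id); }),
});

const done = runGraph(
  [makeTarget("app#build", ["lib#build"]), makeTarget("lib#build", []), makeTarget("lib#test", ["lib#build"])],
  2
);
```

A work-stealing design would replace the central `schedule` loop with workers that pull ready targets themselves; that is outside the scope of this sketch.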
@lage-run/logger
This is a flexible logger. Currently the loggers are "sinks" that take in some well-known structures of information (from tasks, or from lage). The CLI or config specifies how to report the logs as they are recorded (streamed or buffered).
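The difference between streamed and buffered reporting can be sketched as follows. The `LogEntry` and `Sink` shapes, the `flush` step, and the factory function names are assumptions for illustration, not the logger's actual interface.

```typescript
// Hypothetical structured log entry, coming either from lage itself or from a task.
interface LogEntry {
  source: "lage" | "task";
  targetId?: string;
  message: string;
}

interface Sink {
  write(entry: LogEntry): void;
  flush(targetId: string): void; // called when a target finishes
}

type Emit = (line: string) => void;

function format(entry: LogEntry): string {
  return entry.targetId ? "[" + entry.targetId + "] " + entry.message : entry.message;
}

// Streamed: emit each entry as soon as it is recorded.
function createStreamedSink(emit: Emit): Sink {
  return {
    write: (entry) => emit(format(entry)),
    flush: () => {},
  };
}

// Buffered: hold a target's entries until that target finishes.
function createBufferedSink(emit: Emit): Sink {
  const buffers: Record<string, string[]> = {};
  return {
    write: (entry) => {
      if (entry.targetId === undefined) {
        emit(format(entry));
        return;
      }
      const buffer = buffers[entry.targetId] || [];
      buffer.push(format(entry));
      buffers[entry.targetId] = buffer;
    },
    flush: (targetId) => {
      (buffers[targetId] || []).forEach((line) => emit(line));
      delete buffers[targetId];
    },
  };
}

class Logger {
  private sinks: Sink[];
  constructor(sinks: Sink[]) {
    this.sinks = sinks;
  }
  log(entry: LogEntry): void {
    this.sinks.forEach((sink) => sink.write(entry));
  }
  done(targetId: string): void {
    this.sinks.forEach((sink) => sink.flush(targetId));
  }
}

const streamed: string[] = [];
const buffered: string[] = [];
const logger = new Logger([
  createStreamedSink((line) => streamed.push(line)),
  createBufferedSink((line) => buffered.push(line)),
]);
logger.log({ source: "task", targetId: "a#build", message: "compiling" });
logger.log({ source: "task", targetId: "b#build", message: "compiling" });
logger.log({ source: "task", targetId: "a#build", message: "done" });
logger.done("a#build");
// streamed has all three lines in arrival order; buffered has only the two a#build lines
```

With the buffered sink, output from targets running in parallel stays grouped per target instead of being interleaved.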
Test Plan
All of the new @lage-run/* packages are to be highly covered with unit tests (we should aim for near 100% code coverage)
We will rely on the existing E2E tests at the lage package to provide the same behavioral coverage that exists today
Performance, Resilience, Monitoring
No performance degradation is expected here. Since we are dividing the codebase into smaller pieces, we also expect the codebase to be much more easily testable, which should in turn help with quality. As for resilience and telemetry monitoring, we do not collect this data, for privacy reasons.
We may in the future provide a way for users to easily report issues with lage, but not in this RFC. Such a mechanism could include a user-directed submission of lage logs, etc.
Security & Privacy
No impact. We are not changing dependencies on external packages, and we are not collecting or exposing any data about the users' repositories.
Accessibility
No impacts
Note: lage's output is text in a terminal, and it is not currently very friendly to screen readers. However, we may want to publish a version of lage that is more inclusive via a frontend other than a CLI.
World Readiness
Nothing for v2.
Execution Plan
The following lists where the work will be done:
| tag | version | branch |
| --- | --- | --- |
| latest | 1.x, @lage-run/* | master |
| stable | 0.x | stable |
We will use the monorepo to start creating the new @lage-run/* packages. We will begin by creating all the necessary libraries, and will also add a lage2 binary to the existing lage package to test out the functionality while lage 1.x is still the latest.