wtlin1228 / boar-hat

Learning in Boar Hat with Hawk 🐽

Monorepo tools deep diving #7

Closed wtlin1228 closed 1 week ago

wtlin1228 commented 3 weeks ago

I've always wanted to learn more about monorepo tools like Nx and Turborepo. Let's do it!

  1. Cache & Remote Cache
  2. Task Graph & Scheduling
  3. Package Graph
wtlin1228 commented 2 weeks ago

Package Graph in Turbo

1. Get the workspace globs

packages:
  - "apps/*"
  - "packages/*"
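To make the glob step concrete, here is a minimal sketch of how patterns such as "apps/*" could select workspace directories. It only handles the "prefix/*" form shown above, and the function names and the example directory paths are hypothetical, not taken from Turbo.

```rust
/// Returns true when `dir` is a direct child of the prefix in a
/// pattern of the form "prefix/*". Other glob syntax is not handled;
/// a pattern without a trailing "/*" must match the directory exactly.
fn matches_workspace_glob(pattern: &str, dir: &str) -> bool {
    match pattern.strip_suffix("/*") {
        Some(prefix) => {
            let child = dir
                .strip_prefix(prefix)
                .and_then(|rest| rest.strip_prefix('/'));
            match child {
                // Exactly one non-empty path segment after the prefix.
                Some(name) => !name.is_empty() && !name.contains('/'),
                None => false,
            }
        }
        None => pattern == dir,
    }
}

/// Keeps the directories that match at least one workspace pattern.
fn select_workspace_dirs<'a>(patterns: &[&str], dirs: &[&'a str]) -> Vec<&'a str> {
    dirs.iter()
        .copied()
        .filter(|dir| patterns.iter().any(|pattern| matches_workspace_glob(pattern, dir)))
        .collect()
}
```

With the two patterns above, a directory like apps/hawk would be selected, while apps/hawk/src or a top-level docs directory would not.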

2. Walk the directories matched by the globs and collect every package.json

3. Add each app and package into the graph

Graph {
    Ty: "Directed",
    node_count: 5,
    edge_count: 1,
    edges: (1, 0),
    node weights: {
        0: Root,
        1: Workspace(
            Root,
        ),
        2: Workspace(
            Other(
                "@cytm/typescript-config",
            ),
        ),
        3: Workspace(
            Other(
                "hawk",
            ),
        ),
        4: Workspace(
            Other(
                "kirby",
            ),
        ),
    },
}

4. Connect internal dependencies

Graph {
    Ty: "Directed",
    node_count: 5,
    edge_count: 5,
    edges: (1, 0), (4, 2), (2, 0), (1, 0), (3, 2),
    node weights: {
        0: Root,
        1: Workspace(
            Root,
        ),
        2: Workspace(
            Other(
                "@cytm/typescript-config",
            ),
        ),
        3: Workspace(
            Other(
                "hawk",
            ),
        ),
        4: Workspace(
            Other(
                "kirby",
            ),
        ),
    },
}
flowchart TD
    root[Workspace Root] --> Root 
    kirby --> tsconfig[@cytm/typescript-config]
    hawk --> tsconfig
    tsconfig --> Root
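Steps 3 and 4 can be sketched with the standard library alone. This is a simplified model, not Turbo's code: the real graph is the indexed structure shown in the dumps, while this sketch just returns a list of (dependent, dependency) edges. The `Node` type and `build_package_graph` are hypothetical names. Following what the dump shows, a package with no internal dependencies gets an edge to Root, and dependencies that are not workspace packages are ignored.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, PartialEq, Eq)]
enum Node {
    Root,
    Workspace(String),
}

/// `packages` maps each workspace package name to the dependency names
/// listed in its package.json. Returns (dependent, dependency) edges.
fn build_package_graph(packages: &BTreeMap<&str, Vec<&str>>) -> Vec<(Node, Node)> {
    let mut edges = Vec::new();
    for (name, deps) in packages {
        // Only dependencies that are themselves workspace packages count
        // as internal; external ones are skipped.
        let internal: Vec<&str> = deps
            .iter()
            .copied()
            .filter(|dep| packages.contains_key(dep))
            .collect();
        if internal.is_empty() {
            // Leaf packages point at Root, as in the dump above.
            edges.push((Node::Workspace(name.to_string()), Node::Root));
        }
        for dep in internal {
            edges.push((
                Node::Workspace(name.to_string()),
                Node::Workspace(dep.to_string()),
            ));
        }
    }
    edges
}
```

Feeding it hawk and kirby, each depending on @cytm/typescript-config, yields the same three package edges as the flowchart.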
wtlin1228 commented 2 weeks ago

Task Graph in Turbo:

When we run turbo run build

1. Find all the tasks

Given:

Add tasks to traversal_queue using the following rules:

  1. Try to find the build task defined in each package's turbo.json
  2. If it is not there, fall back to the root workspace's turbo.json and look for the <package_name>:build task
  3. Otherwise, if the root workspace's turbo.json defines a generic build task, use that definition for <package_name>:build

At this point, traversal_queue contains kirby:build, hawk:build and @cytm/typescript-config:build.
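The lookup order can be sketched as a chain of fallbacks. `TurboConfigs`, `TaskSource`, `resolve_task`, and `seed_traversal_queue` are hypothetical names for illustration; the sketch assumes each turbo.json has already been reduced to a set of task keys and uses the `<package_name>:build` notation from these notes.

```rust
use std::collections::{HashMap, HashSet};

/// Where a task definition was found.
#[derive(Debug, PartialEq, Eq)]
enum TaskSource {
    PackageConfig,
    RootPackageTask,
    RootGenericTask,
}

struct TurboConfigs {
    /// Package name -> task names defined in that package's turbo.json.
    package_tasks: HashMap<String, HashSet<String>>,
    /// Task keys in the root turbo.json: "build" or "<package_name>:build".
    root_tasks: HashSet<String>,
}

fn resolve_task(configs: &TurboConfigs, package: &str, task: &str) -> Option<TaskSource> {
    // Rule 1: the package's own turbo.json.
    let in_package = configs
        .package_tasks
        .get(package)
        .map_or(false, |tasks| tasks.contains(task));
    if in_package {
        return Some(TaskSource::PackageConfig);
    }
    // Rule 2: a package-specific key in the root turbo.json.
    if configs.root_tasks.contains(&format!("{}:{}", package, task)) {
        return Some(TaskSource::RootPackageTask);
    }
    // Rule 3: the generic task in the root turbo.json.
    if configs.root_tasks.contains(task) {
        return Some(TaskSource::RootGenericTask);
    }
    None
}

/// Queues "<package_name>:<task>" for every package where the task resolves.
fn seed_traversal_queue(configs: &TurboConfigs, packages: &[&str], task: &str) -> Vec<String> {
    packages
        .iter()
        .filter(|package| resolve_task(configs, package, task).is_some())
        .map(|package| format!("{}:{}", package, task))
        .collect()
}
```

With only a generic build task in the root config, all three packages resolve through rule 3 and end up in the queue.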

2. Construct the task graph

Traverse the traversal_queue and, for each task, look up its package's dependencies in the Package Graph constructed earlier.

flowchart TD
    root[Workspace Root] --> Root 
    kirby --> tsconfig[@cytm/typescript-config]
    hawk --> tsconfig
    tsconfig --> Root

The task graph:

Graph {
    Ty: "Directed",
    node_count: 4,
    edge_count: 3,
    edges: (1, 2), (3, 2), (2, 0),
    node weights: {
        0: Root,
        1: Task(
            TaskId {
                package: "kirby",
                task: "build",
            },
        ),
        2: Task(
            TaskId {
                package: "@cytm/typescript-config",
                task: "build",
            },
        ),
        3: Task(
            TaskId {
                package: "hawk",
                task: "build",
            },
        ),
    },
}
flowchart TD
    kirby[kirby:build] --> tsconfig[@cytm/typescript-config:build]
    hawk[hawk:build] --> tsconfig
    tsconfig --> Root
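A rough sketch of how the task graph can be derived from the package graph. It assumes every task depends on the same task in each of its package's internal dependencies, which matches the dump above but is only one possible configuration. `TaskNode`, `task_node`, and `build_task_graph` are hypothetical names.

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, PartialEq, Eq)]
enum TaskNode {
    Root,
    Task { package: String, task: String },
}

fn task_node(package: &str, task: &str) -> TaskNode {
    TaskNode::Task {
        package: package.to_string(),
        task: task.to_string(),
    }
}

/// `package_deps` maps each package to its internal dependencies.
/// Returns (dependent task, dependency task) edges for a single task name.
fn build_task_graph(
    package_deps: &BTreeMap<&str, Vec<&str>>,
    task: &str,
) -> Vec<(TaskNode, TaskNode)> {
    let mut edges = Vec::new();
    for (package, deps) in package_deps {
        if deps.is_empty() {
            // A task with nothing to wait for hangs off Root.
            edges.push((task_node(package, task), TaskNode::Root));
        }
        for dep in deps {
            edges.push((task_node(package, task), task_node(dep, task)));
        }
    }
    edges
}
```

For the kirby, hawk, and @cytm/typescript-config example, this produces the three edges drawn in the flowchart.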
wtlin1228 commented 1 week ago

Scheduling in Turbo:

Turbo uses a Walker and a Visitor to schedule tasks in topological order.


Survey:

  1. $ turbo run build
  2. cli::run()
  3. command::run(base: CommandBase, telemetry: CommandEventBuilder)
    1. RunBuilder::new(base: CommandBase)
      • create a process manager for spawning and managing child processes
    2. RunBuilder::build(signal_handler: &SignalHandler, telemetry: CommandEventBuilder)
      • build package dependency graph
      • build engine
      • create run_cache
    3. Run::run(ui_sender: Option<UISender>, is_watch: bool)
      • calculate file hashes PackageInputsHashes::calculate_file_hashes(...)
        • use something like git hash-object <file path> to hash each file
        • then use twox_hash::XxHash64 to get the package hash
      • create visitor
      • visit visitor
        • create an mpsc channel and give the sender directly to engine.execute(...)
        • Engine::execute(options: ExecutionOptions, visitor: mpsc::Sender<Message<VisitorData, VisitorResult>>)
          • use Walker to walk the graph in topological order and send each task to Visitor::visit
        • create a factory ExecContextFactory
        • await on the node_stream (receive side of the mpsc channel)
          • read the task cache
          • create an exec_context using factory
          • spawn an async task to run exec_context.execute()
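The walker/visitor split can be sketched with std threads and std::sync::mpsc standing in for the async runtime. This is a simplified model under that assumption, not Turbo's implementation: `Message`, `walk`, and `run_all` are hypothetical names, and "executing" a task here just reports completion. The walker releases a task only after all of its dependencies have reported back.

```rust
use std::collections::HashMap;
use std::sync::mpsc;
use std::thread;

/// Sent by the walker: the task to run plus a channel to report completion.
struct Message {
    task: String,
    done: mpsc::Sender<String>,
}

/// Walks `deps` (task -> tasks it depends on) in topological order.
fn walk(deps: &HashMap<String, Vec<String>>, visitor: mpsc::Sender<Message>) {
    // Number of unfinished dependencies per task.
    let mut pending: HashMap<&str, usize> =
        deps.iter().map(|(t, d)| (t.as_str(), d.len())).collect();
    let (done_tx, done_rx) = mpsc::channel::<String>();
    let mut ready: Vec<&str> = pending
        .iter()
        .filter(|(_, n)| **n == 0)
        .map(|(t, _)| *t)
        .collect();
    let mut in_flight = 0usize;
    loop {
        ready.sort(); // deterministic order among tasks that are ready together
        for task in ready.drain(..) {
            let message = Message { task: task.to_string(), done: done_tx.clone() };
            visitor.send(message).expect("visitor hung up");
            in_flight += 1;
        }
        if in_flight == 0 {
            break; // everything finished (or the rest can never become ready)
        }
        let finished = done_rx.recv().expect("walker keeps a sender alive");
        in_flight -= 1;
        for (task, task_deps) in deps {
            if task_deps.contains(&finished) {
                let count = pending.get_mut(task.as_str()).expect("task is in the map");
                *count -= 1;
                if *count == 0 {
                    ready.push(task.as_str());
                }
            }
        }
    }
}

/// Visitor side: receives tasks, runs each on its own thread, and
/// returns the order in which tasks were handed over by the walker.
fn run_all(deps: HashMap<String, Vec<String>>) -> Vec<String> {
    let (tx, rx) = mpsc::channel::<Message>();
    let walker = thread::spawn(move || walk(&deps, tx));
    let mut order = Vec::new();
    let mut workers = Vec::new();
    for Message { task, done } in rx {
        order.push(task.clone());
        workers.push(thread::spawn(move || {
            // A real visitor would check the cache and run the command here.
            done.send(task).ok();
        }));
    }
    for worker in workers {
        worker.join().unwrap();
    }
    walker.join().unwrap();
    order
}
```

For the example task graph, @cytm/typescript-config:build is dispatched first, and hawk:build and kirby:build are released only after it reports completion.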
wtlin1228 commented 1 week ago

Cache in Turbo:

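Turbo hashes each file the way git does (something like git hash-object) and then combines the file hashes into a package hash with xxhash64 (twox_hash::XxHash64).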
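A minimal sketch of the two hashing layers, using only the standard library. `git_blob_preimage` builds the bytes that git hash-object feeds into SHA-1 (a "blob <len>\0" header followed by the content); computing the SHA-1 itself is left out. `package_hash` shows one way to fold per-file hashes into a single value. It is generic over any `std::hash::Hasher + Default`, and `DefaultHasher` is only a stand-in here, not the xxhash64 implementation. Both function names, and the exact way paths and hashes are fed to the hasher, are assumptions.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Bytes that git hashes for a file: "blob <len>\0" followed by the content.
fn git_blob_preimage(content: &[u8]) -> Vec<u8> {
    let mut buf = format!("blob {}\0", content.len()).into_bytes();
    buf.extend_from_slice(content);
    buf
}

/// Combines (path, file hash) pairs into one hex digest.
/// Pairs are sorted first so the result does not depend on input order.
fn package_hash<H: Hasher + Default>(file_hashes: &[(&str, &str)]) -> String {
    let mut sorted = file_hashes.to_vec();
    sorted.sort();
    let mut hasher = H::default();
    for (path, hash) in sorted {
        hasher.write(path.as_bytes());
        hasher.write(&[0]); // separator so adjacent fields cannot run together
        hasher.write(hash.as_bytes());
        hasher.write(&[0]);
    }
    format!("{:016x}", hasher.finish())
}

/// Convenience wrapper using the stand-in hasher.
fn package_hash_stand_in(file_hashes: &[(&str, &str)]) -> String {
    package_hash::<DefaultHasher>(file_hashes)
}
```

The useful property for caching is that the package hash changes whenever any file hash changes and stays the same otherwise, so it can serve as a cache key.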