vercel / turborepo

Build system optimized for JavaScript and TypeScript, written in Rust
https://turbo.build/repo/docs
MIT License

Ability to configure concurrency in turbo.json #1863

Closed: osdiab closed this issue 8 months ago

osdiab commented 2 years ago

Describe the feature you'd like to request

I have a single command that doesn't work if I run it in parallel, but I would still like other pipelines to depend on it.

Describe the solution you'd like

In turbo.json, allow me to set a default concurrency value for a given pipeline.

Describe alternatives you've considered

I am just running the command manually for now in a package.json script, but that's not very elegant.

TURBO-1139

mehulkar commented 2 years ago

> In turbo.json, allow me to set a default concurrency value for a given pipeline.

Guessing you mean for a given task here?

There is a `--concurrency` flag, and `--concurrency=1` forces tasks to run serially, but I think it's all or nothing: the limit applies to every task in the run. https://turborepo.org/docs/reference/command-line-reference#--concurrency

osdiab commented 2 years ago

I know there's a concurrency flag when you run the CLI, but what I'm saying is that one task (is that the word?) literally never works when I run it concurrently. I do understand that turbo's main benefit is running things concurrently, but it is also beneficial for expressing dependencies, so it would simplify my dev setup if I could just always tell turbo to run a particular task serially. I mean something like this:

```json
{
  "pipeline": {
    "setup": {
      "cache": false,
      "concurrency": 1
    },
    "dev": {
      "dependsOn": ["setup"],
      "outputs": []
    }
  }
}
```
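A minimal sketch of how a task runner could honor a per-task `concurrency` value like the one proposed above, on top of the global `--concurrency` limit. This is not Turborepo's implementation: `TaskSlots`, `tryAcquire`, and `release` are hypothetical names, and the sketch assumes the cap is keyed by task name (so `setup` is limited across all workspaces).

```javascript
// Hypothetical sketch, not Turborepo code: a global concurrency limit plus
// optional per-task caps, keyed by task name (e.g. "setup") across workspaces.
class TaskSlots {
  constructor(globalLimit, perTaskLimits = {}) {
    this.globalLimit = globalLimit; // plays the role of `--concurrency=N`
    this.perTaskLimits = new Map(Object.entries(perTaskLimits)); // e.g. { setup: 1 }
    this.runningTotal = 0;
    this.runningByTask = new Map();
  }

  // Reserves a slot and returns true if a `taskName` task may start now.
  tryAcquire(taskName) {
    const running = this.runningByTask.get(taskName) ?? 0;
    const taskLimit = this.perTaskLimits.get(taskName) ?? Infinity;
    if (this.runningTotal >= this.globalLimit || running >= taskLimit) {
      return false;
    }
    this.runningTotal += 1;
    this.runningByTask.set(taskName, running + 1);
    return true;
  }

  // Frees the slot once a `taskName` task finishes.
  release(taskName) {
    const running = this.runningByTask.get(taskName) ?? 0;
    if (running === 0) {
      throw new Error(`release() without tryAcquire(): ${taskName}`);
    }
    this.runningByTask.set(taskName, running - 1);
    this.runningTotal -= 1;
  }
}

// Global limit of 10, but only one "setup" task at a time.
const slots = new TaskSlots(10, { setup: 1 });
console.log(slots.tryAcquire("setup")); // true
console.log(slots.tryAcquire("setup")); // false: "setup" is capped at 1
console.log(slots.tryAcquire("dev")); // true: only the global limit applies
slots.release("setup");
console.log(slots.tryAcquire("setup")); // true
```

A scheduler would call `tryAcquire` before starting a ready task and `release` when it exits; a task that gets `false` simply stays queued while other tasks keep running.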
scamden commented 1 year ago

Same here. Our use case is tests that mess with a database and unfortunately have to run serially at the moment.

jrr commented 1 year ago

Similarly to @scamden, I'm trying to express that a couple of workspaces' test commands should not run simultaneously, in order to avoid database contention.

I have another idea for how it could work. Maybe something like this?

```json
"mutuallyExclusive": ["foo#test", "bar#test"]
```

I'd still like to crank up concurrency globally, and have a bunch of other test suites running simultaneously. I just need to carve out specific exceptions.
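A minimal sketch of the `mutuallyExclusive` idea, assuming each group is a list of `package#task` IDs: tasks that share a group never overlap, while everything else is limited only by the global concurrency. `ExclusiveGroups`, `tryStart`, and `finish` are hypothetical names, not a Turborepo API.

```javascript
// Hypothetical sketch of "mutuallyExclusive", not a Turborepo API:
// tasks that share a group never overlap; all other tasks are unaffected.
class ExclusiveGroups {
  constructor(groups) {
    // groups: e.g. [["foo#test", "bar#test"]]; a task may appear in several groups.
    this.groupIdsByTask = new Map();
    groups.forEach((group, groupId) => {
      for (const taskId of group) {
        const ids = this.groupIdsByTask.get(taskId) ?? [];
        ids.push(groupId);
        this.groupIdsByTask.set(taskId, ids);
      }
    });
    this.busyGroupIds = new Set();
  }

  // Returns true (and locks the task's groups) if no member of them is running.
  tryStart(taskId) {
    const ids = this.groupIdsByTask.get(taskId) ?? [];
    if (ids.some((id) => this.busyGroupIds.has(id))) {
      return false;
    }
    ids.forEach((id) => this.busyGroupIds.add(id));
    return true;
  }

  // Unlocks the task's groups when it finishes.
  finish(taskId) {
    for (const id of this.groupIdsByTask.get(taskId) ?? []) {
      this.busyGroupIds.delete(id);
    }
  }
}

const exclusive = new ExclusiveGroups([["foo#test", "bar#test"]]);
console.log(exclusive.tryStart("foo#test")); // true
console.log(exclusive.tryStart("bar#test")); // false: foo#test holds the group
console.log(exclusive.tryStart("baz#test")); // true: not in any group
exclusive.finish("foo#test");
console.log(exclusive.tryStart("bar#test")); // true
```

Unlike a per-task concurrency cap, this leaves `baz#test` and every other test suite free to run alongside whichever database-bound task currently holds the group.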

mehulkar commented 1 year ago

These are both good suggestions! I will bring it up with the team. Thank you for the use case!

mehulkar commented 1 year ago

Hey @osdiab, we talked about this at our team meeting today. The general thought is that it's an interesting feature and we may consider it at some point, but it's not a high priority right now. Some workarounds you may consider to make this work for you:

scamden commented 1 year ago

I was hoping to use turbo as the top-level CLI, but without this feature we are forced to run some tasks via turbo and some via pnpm (since running something with root package.json produces unreadable multiply nested output without colors). It really feels like turbo.json should be a declarative source of truth for the tasks across the monorepo. It's really disappointing to hear this isn't a priority.

mehulkar commented 1 year ago

@scamden thanks for flagging the need, we're definitely tracking this and are taking it into consideration.

> since running something with root package.json produces unreadable multiply nested output without colors

We're also looking at this in https://github.com/vercel/turbo/issues/219, for what it's worth. It may not solve the underlying task-running problem this issue is about, but if it helps your use case, it's closer on our roadmap (currently scheduled for the 1.10 release).

robaca commented 1 year ago

It would also be great to use indirect measures for concurrency, like available resources (cpu, memory, custom resources).

For example, we have tasks that are very lightweight, but others that need gigabytes of memory. Running all these tasks with the same concurrency level would lead either to idle runner instances or to memory issues.

A per-task concurrency limit alone is not enough, then, because a mixture of different tasks runs at the same time: while the next heavyweight task might not be able to run at a given point in time, a smaller task from another workspace might.

One possible solution could be specifying resource usage per task and something like selectable resource profiles for the different environments.
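A minimal sketch of the resource-based idea, assuming each task declares an estimated memory cost (`memoryMb` is a hypothetical field) and each environment has a budget (the `profiles` values are made up). Any queued task that still fits in the remaining budget starts, so a small task from another workspace can run while a heavyweight one waits. `pickTasksToStart` is a hypothetical name, not turbo behavior.

```javascript
// Hypothetical sketch of resource-aware scheduling, not Turborepo behavior.
// Walks the queue in order and starts every task that still fits in the
// memory budget, skipping (not blocking on) tasks that are too large right now.
function pickTasksToStart(queue, usedMb, budgetMb) {
  const started = [];
  let used = usedMb;
  for (const task of queue) {
    if (used + task.memoryMb <= budgetMb) {
      started.push(task.id);
      used += task.memoryMb;
    }
  }
  return { started, usedMb: used };
}

// Illustrative "resource profiles" per environment (budgets in MB).
const profiles = { laptop: 6144, ci: 16384 };

const queue = [
  { id: "web#build", memoryMb: 4096 },
  { id: "docs#lint", memoryMb: 256 },
  { id: "api#build", memoryMb: 3072 },
];

// 3072 MB is already in use on the laptop profile: only docs#lint fits.
const result = pickTasksToStart(queue, 3072, profiles.laptop);
console.log(result.started, result.usedMb); // [ 'docs#lint' ] 3328
```

This greedy skip-ahead can starve a large task if small ones keep arriving; a real scheduler would probably need aging or a reservation for the head of the queue.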

ruettenm commented 1 year ago

I just had a chat with Carsten. Not sure if this would be possible, but it would be really advanced and great if turbo measured the resources of all tasks over time (and saved that information in the remote cache?!). With this information, turbo could optimize the execution of all tasks and, e.g., not run very resource-heavy tasks in parallel ;-)
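One way turbo could "measure the resources of all tasks over time", sketched under loud assumptions: keep an exponentially weighted moving average (EMA) of each task's observed peak memory and use it as the estimate for the next run. `updateEstimate` and `alpha` are hypothetical; how peaks are measured and whether they would live in the remote cache are left open.

```javascript
// Hypothetical sketch, not a turbo feature: learn a per-task memory estimate
// as an exponentially weighted moving average of observed peak usage.
function updateEstimate(estimates, taskId, observedPeakMb, alpha = 0.3) {
  const previous = estimates.get(taskId);
  // The first sample is taken as-is; later samples are blended in, with
  // `alpha` controlling how quickly older runs are forgotten.
  const next =
    previous === undefined
      ? observedPeakMb
      : alpha * observedPeakMb + (1 - alpha) * previous;
  estimates.set(taskId, next);
  return next;
}

const estimates = new Map();
updateEstimate(estimates, "web#build", 4000);
updateEstimate(estimates, "web#build", 5000);
// 0.3 * 5000 + 0.7 * 4000 = 4300
console.log(Math.round(estimates.get("web#build"))); // 4300
```

The resulting estimate could serve as the per-task cost in a resource-aware scheduler like the one robaca describes.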

Irvenae commented 1 year ago

+1 for one of the last two options. Right now we cannot use Turborepo optimally, i.e. run everything in one command. We need to split the run into multiple invocations that filter out a specific package, because it uses too much memory. Also, different jobs use different amounts of memory, so we need to call turbo multiple times to do this more efficiently.
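A sketch of the split-invocation workaround described above, as a hypothetical `planInvocations` helper that only builds the argument lists: heavy packages get their own `turbo run` with `--concurrency=1`, and the rest run afterwards with `--filter=!<pkg>` exclusions. Package names are placeholders, and the sketch assumes turbo's `--filter` exclusion syntax with a leading `!`.

```javascript
// Hypothetical helper for the split-invocation workaround (not a turbo API).
// Returns the argument lists for `turbo`, in the order they should run.
function planInvocations(task, heavyPackages, concurrency = "100%") {
  if (heavyPackages.length === 0) {
    // Nothing to isolate: a single normal run.
    return [["run", task, `--concurrency=${concurrency}`]];
  }
  const heavyRun = [
    "run",
    task,
    ...heavyPackages.map((pkg) => `--filter=${pkg}`),
    "--concurrency=1", // one memory-hungry task at a time
  ];
  const restRun = [
    "run",
    task,
    // A leading "!" excludes a package from the run.
    ...heavyPackages.map((pkg) => `--filter=!${pkg}`),
    `--concurrency=${concurrency}`,
  ];
  return [heavyRun, restRun];
}

for (const args of planInvocations("build", ["@acme/heavy-app"])) {
  console.log(["turbo", ...args].join(" "));
}
// turbo run build --filter=@acme/heavy-app --concurrency=1
// turbo run build --filter=!@acme/heavy-app --concurrency=100%
```

Run the plans one after another (for example with `spawnSync("turbo", args, { stdio: "inherit" })` from `node:child_process`, stopping on a non-zero `status`), and quote the `!` filter when typing it in a shell. If other packages depend on a heavy package via `^build`, turbo will likely schedule that dependency again in the second run, where it should be a cache hit.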

This is really a shame...

alvarlagerlof commented 1 year ago

Even a global key in a root config (same level as globalEnv) would do for us.

turbo.json:

```jsonc
{
  // ... pipeline, globalEnv ...
  "concurrency": 12
}
```
NullVoxPopuli commented 1 year ago

In monorepos with 3x as many tasks as available CPUs, turbo running "lint" across the monorepo can make a laptop a smidge sluggish for a few seconds.

I only have 8 cores on this laptop 😅


This ends up leaving no cores available for the operating system.
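A sketch of computing a laptop-friendly value, assuming the two forms `--concurrency` accepts (an integer, or a percentage of logical CPUs): the hypothetical `resolveConcurrency` helper caps the result so one core stays free. Rounding percentages down is an assumption; turbo's own rounding may differ.

```javascript
// Hypothetical helper (not turbo's implementation): resolve a --concurrency
// style value ("6" or "75%") against the logical CPU count, keeping
// `reservedCores` free for the operating system.
function resolveConcurrency(value, cpuCount, reservedCores = 1) {
  const text = String(value).trim();
  let requested;
  const percent = /^(\d+(?:\.\d+)?)%$/.exec(text);
  if (percent) {
    // Assumption: round a percentage down to a whole number of tasks.
    requested = Math.floor((cpuCount * Number(percent[1])) / 100);
  } else if (/^\d+$/.test(text)) {
    requested = Number(text);
  } else {
    throw new Error(`invalid concurrency value: ${value}`);
  }
  // Cap at the cores left after the reservation, but always allow one task.
  return Math.max(1, Math.min(requested, cpuCount - reservedCores));
}

// On an 8-core laptop, turbo's default of 10 and "100%" both resolve to 7.
console.log(resolveConcurrency(10, 8)); // 7
console.log(resolveConcurrency("100%", 8)); // 7
console.log(resolveConcurrency("50%", 8)); // 4
```

In practice the CPU count would come from `os.availableParallelism()` (Node 18.14+) or `os.cpus().length`, and the result would be passed as `--concurrency=<n>`, for example from a wrapper script.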

pkerschbaum commented 1 year ago

At @hokify we solved this by creating a wrapper for the turbo CLI, which we use everywhere; this is quite easy to do in a pnpm monorepo.
Here is the commit where I added the wrapper to one of my OSS projects: https://github.com/pkerschbaum/pkerschbaum-homepage/commit/0815c80bb4fa7e62ca2a55e2ce1928b2b8e2fdcc

The most important part is the wrapper .mjs file itself:

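```javascript
#!/usr/bin/env node
/**
 * this module is a simple wrapper for "turbo" which
 * - if no explicit "concurrency" is given, sets a default of 100% (to utilize all logical processors, see https://turbo.build/repo/docs/reference/command-line-reference/run#--concurrency)
 * - and sets some default CLI arguments (e.g. "--env-mode=strict")
 */
import { spawn } from 'node:child_process';
import { argv } from 'node:process';

const [_execPath, _jsFilePath, ...commandLineArguments] = argv;

// Arguments after a bare "--" are passed through to the task scripts,
// so turbo's own flags must be inserted before the separator.
const separatorIndex = commandLineArguments.indexOf('--');
const turboArguments =
  separatorIndex === -1 ? [...commandLineArguments] : commandLineArguments.slice(0, separatorIndex);
const passThroughArguments = separatorIndex === -1 ? [] : commandLineArguments.slice(separatorIndex);

turboArguments.push(
  '--no-update-notifier',
  '--env-mode=strict',
  '--framework-inference=false',
);

// Respect an explicit "--concurrency=N" or "--concurrency N" from the caller.
if (!turboArguments.some((arg) => arg.startsWith('--concurrency'))) {
  turboArguments.push('--concurrency=100%');
}

const child = spawn('turbo', [...turboArguments, ...passThroughArguments], {
  cwd: process.cwd(),
  stdio: 'inherit',
  env: process.env,
  // set shell to true for windows (https://stackoverflow.com/a/54515183)
  shell: process.platform === 'win32',
});

// Without this listener, a missing "turbo" binary would crash the wrapper with an unhandled 'error' event.
child.on('error', (error) => {
  console.error(error);
  process.exitCode = 1;
});

// Forward turbo's exit code; a null code means turbo was killed by a signal, so report failure.
child.on('exit', (code) => {
  process.exitCode = code ?? 1;
});
```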
kevinpastor commented 8 months ago

We need this to avoid running multiple build jobs in parallel, because doing so sometimes leads to Node running out of memory.