vercel / turborepo

Build system optimized for JavaScript and TypeScript, written in Rust
https://turbo.build/repo/docs
MIT License

Support executing multiple dependent long-running tasks in parallel #1497

Open robaca opened 2 years ago

robaca commented 2 years ago

Describe the feature you'd like to request

In our monorepo, we have multiple services that to some degree depend on each other. For local development, we have to start a mock service first, then a second one, and then all others in any order.

For local development and testing, it would be great to be able to start all these services, or only one service and its dependencies, via turborepo in one step.

Describe the solution you'd like

It would be cool if turbo could be configured to wait not for a process to exit, but for some string or regexp to appear on stdout or stderr, and to keep all processes running in the background until Turbo itself is terminated via Ctrl+C.

We could then just configure our server tasks like this:

    "mock-service#server": {
      "dependsOn": ["build"],
      "longRunning": {
        "waitFor": {
          "stdout": "Service started successfully",
          "timeout": "60s"
        }
      },
      "cache": false
    },
    "my-service#server": {
      "dependsOn": ["build", "mock-service#server"],
      "longRunning": true, 
      "cache": false
    },
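A minimal TypeScript sketch of how the proposed waitFor check could decide that a service is ready from its output. ReadinessWatcher is hypothetical (turbo has no such option); it assumes the "60s" timeout has already been converted to milliseconds and that the caller passes in the current time. It keeps the trailing partial line between chunks, so a message split across two stdout chunks still matches.

```typescript
// Hypothetical readiness check for the proposed "waitFor" option (not a real turbo feature).
type ReadyState = "waiting" | "ready" | "timedOut";

class ReadinessWatcher {
  private pending = ""; // trailing partial line carried over between chunks
  private state: ReadyState = "waiting";
  private readonly pattern: RegExp;
  private readonly timeoutMs: number;
  private readonly startedAt: number;

  // Use a pattern without the /g flag: RegExp.test is stateful with /g.
  constructor(pattern: RegExp, timeoutMs: number, startedAt: number) {
    this.pattern = pattern;
    this.timeoutMs = timeoutMs;
    this.startedAt = startedAt;
  }

  // Feed a chunk of stdout/stderr text received at time `now` (ms).
  push(chunk: string, now: number): ReadyState {
    if (this.tick(now) !== "waiting") return this.state;
    const lines = (this.pending + chunk).split(/\r?\n/);
    // The last piece is an incomplete line (possibly ""); keep it for the next chunk.
    this.pending = lines.pop() ?? "";
    // Test complete lines and the partial one, so a split message still matches.
    if (lines.concat(this.pending).some((line) => this.pattern.test(line))) {
      this.state = "ready";
    }
    return this.state;
  }

  // Call periodically so that a process that prints nothing still times out.
  tick(now: number): ReadyState {
    if (this.state === "waiting" && now - this.startedAt > this.timeoutMs) {
      this.state = "timedOut";
    }
    return this.state;
  }
}
```

Once the state leaves "waiting" it stays there, so dependents would be released exactly once.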

For dependencies, it would be necessary to be able to specify an explicit package, as runtime dependencies do not necessarily match build-time dependencies. I'm not sure if this is already possible, but at least it's not documented.

A leaf task should not need to configure a waiting condition, but having one for every background task would make it possible to print an informative message once all background tasks have started successfully and everything is up and running.

If a task is found to depend on a task that has longRunning: true (and therefore no waiting condition), turbo could refuse to run and report a misconfiguration.
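A sketch of that misconfiguration check, written against a hypothetical TypeScript shape of the proposed config (longRunning and waitFor are not real turbo options). findUnsatisfiableDeps is an illustrative name; it looks each dependsOn entry up by its exact key, so an entry like "build" that is not a key of the object is simply ignored.

```typescript
// Hypothetical shape of the proposed config; not part of turbo's real schema.
interface WaitFor {
  stdout?: string;
  timeout?: string;
}

interface TaskConfig {
  dependsOn?: string[];
  longRunning?: boolean | { waitFor: WaitFor };
  cache?: boolean;
}

// Returns one message per dependency that could never be satisfied: a task that
// runs forever (longRunning: true) and has no readiness condition to wait for.
function findUnsatisfiableDeps(pipeline: Record<string, TaskConfig>): string[] {
  const errors: string[] = [];
  for (const task of Object.keys(pipeline)) {
    for (const dep of pipeline[task].dependsOn ?? []) {
      const depConfig = pipeline[dep];
      if (depConfig?.longRunning === true) {
        errors.push(`${task} depends on ${dep}, which is long-running but has no waitFor condition`);
      }
    }
  }
  return errors;
}
```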

Nice-to-have on top:

Describe alternatives you've considered

Create a command-line tool that starts a process, puts it in the background, and exits by itself when some message appears on stdout or stderr. I'm not sure whether it's then possible to stop those background processes if the turbo command is interrupted via Ctrl+C.

jlarmstrongiv commented 2 years ago

Reminds me of https://www.npmjs.com/package/wait-on

edwardwilson commented 2 years ago

I have a similar requirement. In my monorepo, I have an app for which I need multiple long-running processes started at the beginning. The order doesn't really matter to me; I just have three long-running processes that all need to start.

I have a Vite build with the watch flag, another process that needs to run, and a custom web server process. Each can start independently and in parallel.

This can be achieved with npm scripts and packages that enable parallel processes, but turborepo does not seem to support this via its pipelines. Turborepo waits for the first process to complete, which never happens due to its long-running nature.

grabbou commented 2 years ago

That would be so cool. Imagine having multiple packages that all need to spin up (run start-dev) in order for you to start working.

weyert commented 2 years ago

Yeah, this would be useful. I sometimes want to run some integration tests and make sure the backend services are running before they start.

grabbou commented 2 years ago

Not to spam anymore in the thread, but for anyone looking for a nice alternative on top of turborepo: configure Visual Studio Code with a special "task" that spins up all processes in a shared terminal session. Here's my gist that spins up two processes, useful for a development workflow: https://gist.github.com/grabbou/1e19049ebc6127b269f4230bfaed5170

spigelli commented 1 year ago

I've been looking for something like this for a while now. For me this addresses my docker compose problem.

I don't like to keep my docker compose services running in the background:

  1. Ports conflict with my other projects when I forget to docker-compose down
  2. Some things are process intensive

Additionally, things start to get messy when you're integrating a few OSS tools that each depend on multiple services. Imagine, for example, that you're building some sort of WebRTC app and integrating the following tools, which are Docker services:

So what I've considered in the past is making small npm wrappers as "apps" for different associated services, since that's where they would be if they were run from source.

hymair commented 1 year ago

In my case, I want to run these in sequence, and I can't find a proper way to do it currently: dev is a long-running task, so dependsOn never resolves.

  1. run backend#dev
  2. run app#gen-types
  3. run app#dev
  4. run web#dev

VanCoding commented 1 year ago

Hi guys

I want to throw my protocol into the mix as a possible solution to this as well.

It's a very simple way for task runners (turborepo, Nx, you name it) to talk to non-terminating watch processes, tell them when to rebuild, and get the results of those builds.

The idea is that we settle on one protocol that can then be implemented by a lot of task runners and build tools. I'd really like to get your feedback on this! I hope to get the discussion started here.

sschneider-ihre-pvs commented 1 year ago

The protocol looks like an XState state machine.

eboody commented 1 year ago

Could you provide an example of how I would get started using your protocol to manage tsup --watch or tsc -w, so that a file-watching server restarts when some other package has a file change?

kirill-konshin commented 1 year ago

Usually this kind of orchestration is needed for (at a bare minimum) a TS library plus a website: the library in watch mode has to produce some output at least once, and only then can the website pick up that output and keep watching it.

See my old article, section Starting/watching. The easiest approach so far is to run a regular build (not watch) task, which can exit successfully, and then run the watch task without the initial output.

Unfortunately, this adds overhead until https://github.com/webpack/webpack/issues/4991 is fixed (TS was previously prone to this too: bug 12996). UPDATE: TypeScript 3.4 introduced a new incremental option, which produces a build and a cache, so no matter how often you restart watchers, they will be ready much faster. Unfortunately, it's not yet supported by TS Loader for Webpack, and it still does not eliminate the need to pre-build libraries.

In any case, the build+watch approach for the library may cause the website to be built twice, first after the build and then after the first output of watch, if the website's rebuild looks at timestamps rather than contents.

dobesv commented 1 year ago

If you have interdependent processes you want to run, I think just putting a "wait" task in front of them makes sense. You can use wait-on to wait for the port to open.

This doesn't seem to require special support from turborepo to work.

kirill-konshin commented 1 year ago

True, but it's an extra tool, and it looks awkward to have all such scripts look like this: "wait-on blabla && real-thing". It would be much more convenient if the turbo config could have a "waitOn" property for such tasks. The property could either call an npm script that exits once the condition is met, or take a list of files/globs/URLs.

It would be a nice advantage over Nx, which has the same problem right now.

I've implemented the wait approach in my Next Redux Wrapper repo. It works, but it does not look as sleek as Turbo configs can look:

https://github.com/kirill-konshin/next-redux-wrapper/blob/3a14963a0c8a3ebf39baa331b6a467bbc6cb8ee5/packages/wrapper/package.json#L21
https://github.com/kirill-konshin/next-redux-wrapper/blob/3a14963a0c8a3ebf39baa331b6a467bbc6cb8ee5/packages/demo-redux-toolkit/package.json#L9

dobesv commented 1 year ago

Hmm, true. Not sure how far turbo wants to go down the road of being a dev job runner, but it could be handy to have all that stuff integrated, especially if it could watch files, rebuild, and restart the dev processes.

kirill-konshin commented 1 year ago

I'd say that it could call the wait-on package directly, passing args. Or just call a certain npm wait script, like in my case. It should just be a gate that delays the dev process start; that's all that is needed. The actual watching and restarting is the dev processes' own responsibility. This approach would clearly separate concerns and provide a good-looking, user-friendly tool.

dobesv commented 1 year ago

Interesting. What do you use to get your dev process to restart that isn't some clutter on the command line like wait-on?

Regards,

Dobes

kirill-konshin commented 1 year ago

For example, Next.js dev restarts by itself when it detects changes in next.config.js; for other cases I use nodemon and so on. The idea is that the dev script should be self-aware and know how to restart; that's not turbo's responsibility.

But knowing that a dev script depends on something in order to run is turbo's responsibility, since it manages all that kind of stuff.

dobesv commented 1 year ago

Hmm, I'm not sure about that. Turbo knows about running things to build things; orchestrating dev servers seems a bit extra. Especially if you are already using nodemon, you might as well throw in wait-on in a similar way.

kirill-konshin commented 1 year ago

Turbo manages dependencies between scripts. That's the essence. Dev scripts can do whatever they want (restart, die); that's a lifecycle thing. Waiting for files to be able to launch dev script without error is a dependency problem.

kirill-konshin commented 1 year ago

In my case, the wait scripts watch other packages' files, which is awkward :) because for all non-dev scripts turbo takes care of that.

E.g. if package A depends on package B, the build script of B will be scheduled to run before the build script of A; turbo takes care of that. But somehow this does not apply to dev scripts. That's what I mean by managing dependencies.

dobesv commented 1 year ago

Waiting for files to be able to launch dev script without error is a dependency problem.

I think if you are waiting for files to be built, you can do that with turbo as it is, if the files are generated by a process that exits. Are the files you are waiting for generated by a process that does not exit?

kirill-konshin commented 1 year ago

Consider the following: package A is a library built with TSC that has build and dev scripts (build and watch). Package B depends on package A; for simplicity, B has only a dev script.

So, to run dev in all packages in a fresh repo, package A must emit JS files first; otherwise B#dev will just fail.

There are 2 tasks that can emit these files: A#dev and A#build. Persistent tasks can't depend on other persistent tasks, so we can only make dev depend on build, but this leads to double processing:

  1. A#build emits files
  2. *#dev starts
  3. A#dev emits the same files again, which is double effort since the files were already there
  4. B#dev rebuilds because it is triggered by step 3, so there is double processing here as well

Solution

We can introduce a wait script in package A (using wait-on or whatever) and make the dev scripts depend on it:

# package-a/package.json
{
  "scripts": {
    "build": "tsc ...",
    "dev": "yarn build --watch",
    "wait": "wait-on lib/index.js"
  }
}

# package-b/package.json
{
  "scripts": {
    "dev": "next dev"
  }
}

# turbo.json
{
  "$schema": "https://turbo.build/schema.json",
  "pipeline": {
    "dev": {
      "cache": false,
      "persistent": true,
      "dependsOn": ["^wait"]
    },
    "wait": {
      "cache": false
    }
  }
}

If you want to be even more explicit, you can override A#dev to not have any dependencies:

{
  "$schema": "https://turbo.build/schema.json",
  "pipeline": {
    "dev": {
      "cache": false,
      "persistent": true,
      "dependsOn": ["^wait"] // <--- by default all dev tasks depend on wait
    },
    "A#dev": {
      "cache": false,
      "persistent": true,
      "dependsOn": [] // <--- except for package A, dev has no dependencies
    },
    "wait": {
      "cache": false
    }
  }
}

In both cases, turbo run dev results in:

  1. A#wait and A#dev start in parallel
  2. A#dev produces output files
  3. A#wait exits
  4. B#dev starts

As expected.
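To make the ordering above concrete, here is a toy TypeScript simulation of the expanded task graph. It is not turbo's scheduler; it assumes a persistent task never exits, a non-persistent task with waitsFor exits once that file exists, and a task may only start after all of its dependencies have exited. The file path a/lib/index.js and the simulate helper are illustrative.

```typescript
// Toy model of the schedule described above; NOT how turbo's scheduler works.
interface SimTask {
  deps: string[]; // tasks that must have exited before this one may start
  persistent?: boolean; // never exits (e.g. a watch-mode dev server)
  emits?: string; // file this task writes once it is running
  waitsFor?: string; // a non-persistent task exits once this file exists
}

// Runs the model until nothing changes and returns an event log.
// Tasks that could never start are reported on a final "stuck:" line.
function simulate(tasks: Record<string, SimTask>): string[] {
  const log: string[] = [];
  const started = new Set<string>();
  const exited = new Set<string>();
  const files = new Set<string>();
  const names = Object.keys(tasks);
  let progress = true;
  while (progress) {
    progress = false;
    // Start every task whose dependencies have all exited.
    for (const name of names) {
      if (!started.has(name) && tasks[name].deps.every((d) => exited.has(d))) {
        started.add(name);
        log.push(`start ${name}`);
        progress = true;
      }
    }
    // Let running tasks emit their files, then let non-persistent tasks exit.
    for (const name of names) {
      const t = tasks[name];
      if (!started.has(name)) continue;
      if (t.emits !== undefined && !files.has(t.emits)) {
        files.add(t.emits);
        log.push(`${name} emits ${t.emits}`);
        progress = true;
      }
      const canExit = t.waitsFor === undefined || files.has(t.waitsFor);
      if (!t.persistent && !exited.has(name) && canExit) {
        exited.add(name);
        log.push(`${name} exits`);
        progress = true;
      }
    }
  }
  const stuck = names.filter((n) => !started.has(n));
  if (stuck.length > 0) log.push(`stuck: ${stuck.join(", ")}`);
  return log;
}
```

With A#dev and A#wait free to start and B#dev depending on A#wait, the log follows steps 1 to 4. If A#dev is instead made to depend on A#wait, only A#wait starts and the rest is reported as stuck, because the file it waits for would come from A#dev.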

I believe this solution is worth documenting, to clearly explain the proper way to run development tasks.

P.S. Since we can define outputs for tasks that do exit, maybe we could also define them for tasks that don't exit and treat their appearance as permission to move on, which would allow non-exiting tasks to be used as dependencies? Similar to waiting, but more in line with turbo naming.

justinwaite commented 1 year ago

@kirill-konshin Thanks for this! I was able to simplify it for my use case by setting dependsOn in dev to "^wait":

{
  "$schema": "https://turbo.build/schema.json",
  "pipeline": {
    "dev": {
      "cache": false,
      "persistent": true,
      "dependsOn": ["^wait"]
    },
    "wait": {
      "cache": false
    }
  }
}

With that, I didn't have to do anything special for "Package A".

kirill-konshin commented 1 year ago

@justinwaite I'm glad my findings were useful for you. The only thing I'd like to highlight is that if wait depends on the initial output of dev, your pipeline can get stuck: the wait task will wait for files, and dev won't run until wait releases.

justinwaite commented 1 year ago

I think I see what you're saying. If Package B depends on A, and you try to run dev on B without running dev on A, then this would get stuck, right? But if you're always running them together, then I don't see how this could get stuck waiting.

kirill-konshin commented 1 year ago

@justinwaite if you configure it like in your example, all dev tasks will depend on wait, so A#dev will depend on A#wait, which means A#dev can only run after A#wait, but A#wait waits for files that will be emitted by A#dev.

justinwaite commented 1 year ago

I think you might be mistaken here. I have it set to ^wait, which says "only wait on workspace dependencies' wait script, not my own".

From the docs:

The ^ symbol explicitly declares that the task has a dependency on a task in a workspace it depends on

{
  "$schema": "https://turbo.build/schema.json",
  "pipeline": {
    "build": {
      // "A workspace's `build` command depends on its dependencies'
      // and devDependencies' `build` commands being completed first"
      "dependsOn": ["^build"]
    }
  }
}

And I can confirm from my own repo that Package A does not run or wait on the wait script, since it has no workspace dependencies.

Edit: For more clarification, in order to get into the state that you are describing, you would have to do:

    "dev": {
      "cache": false,
      "persistent": true,
      "dependsOn": ["wait", "^wait"]
    },
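A simplified TypeScript model of how a dependsOn entry expands for a single package's task, matching the behaviour described above: ^wait refers to the wait task of each workspace dependency, while a bare wait refers to the package's own. expandDependsOn is a hypothetical helper, not turbo's code, and it expands only one level; the full graph comes from applying it to each resulting task in turn.

```typescript
// Simplified, hypothetical model of how one task's dependsOn entries expand.
// workspaceDeps maps each package to the workspace packages it depends on.
function expandDependsOn(
  pkg: string,
  dependsOn: string[],
  workspaceDeps: Record<string, string[]>,
): string[] {
  const result: string[] = [];
  for (const entry of dependsOn) {
    if (entry.startsWith("^")) {
      // "^task": that task in each workspace this package depends on (never in pkg itself).
      const task = entry.slice(1);
      for (const dep of workspaceDeps[pkg] ?? []) result.push(`${dep}#${task}`);
    } else if (entry.includes("#")) {
      // "pkg#task": an explicit task in a specific package.
      result.push(entry);
    } else {
      // "task": the same package's own task.
      result.push(`${pkg}#${entry}`);
    }
  }
  return result;
}
```

With A having no workspace dependencies and B depending on A, ["^wait"] gives A#dev no dependencies and B#dev a dependency on A#wait, while ["wait", "^wait"] makes A#dev depend on its own A#wait, which is the self-blocking setup described in the Edit.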

kirill-konshin commented 1 year ago

@justinwaite you're right, I overlooked the ^. Also, you've defined wait: {cache: false} with no dependencies. Works perfectly, great catch.

I have edited my original solution to be more concise.

tannerbaum commented 1 year ago

I also wanted to request such a feature. Without going into specifics, we start up a little Apollo Server to modify/generate types from an external API.

When I have to do that before my test script, for example, my turbo run looks something like turbo run start:server test stop:server. Without wait-on this would be impossible (test waits on the server, stop waits on test), but with it, as you can see, it leads to cluttered turbo runs or package.json scripts.

A solution like described in the original issue would be a huge improvement.

imsanchez commented 1 year ago

I'd like to share, in case anyone comes here looking for a simple way to run persistent tasks in parallel, that the --filter argument worked for me.

This is what my dev script looks like:

dotenv -- turbo run dev --filter=backend --filter=ui --filter=frontend

And the 3 persistent tasks run in parallel. Although they execute in that order, they are not dependent on each other.

mehulkar commented 1 year ago

Thanks for all the discussion here. In the interest of cleanup, I'm marking this as a duplicate of #986.

Netail commented 4 months ago

@mehulkar Not really a duplicate, IMO. This issue is more about allowing packages in dependsOn to run as persistent tasks.

E.g. package-A needs to be transpiled for app-A, but we want to keep watching for changes in package-A while also running app-A once package-A has been transpiled for the first time.

cbou commented 3 months ago

Yes, @Netail is right. Please reopen this issue!

mehulkar commented 3 months ago

cc @chris-olszewski @NicholasLYang reopened this, FYI

Netail commented 3 months ago

I think if nothing changes in the output files for x seconds, it should give the dependents a signal to start their run command.
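A TypeScript sketch of that quiet-period idea, assuming the watcher records a timestamp (in ms) every time an output file changes. quietSince is a hypothetical helper: it returns the first moment at which the outputs had been unchanged for quietMs, i.e. when dependents could be released, or undefined if that has not happened yet.

```typescript
// Sketch of a "quiet period" readiness rule (hypothetical, not a turbo feature).
// changeTimes: timestamps (ms) at which output files changed; now: current time (ms).
function quietSince(changeTimes: number[], quietMs: number, now: number): number | undefined {
  const sorted = changeTimes.slice().sort((a, b) => a - b);
  for (let i = 0; i < sorted.length; i++) {
    const readyAt = sorted[i] + quietMs;
    // The next change, if any, must not arrive before the quiet window ends.
    const next = i + 1 < sorted.length ? sorted[i + 1] : Infinity;
    if (readyAt <= next) {
      // Quiet gap found; it only counts if it has already fully elapsed.
      return readyAt <= now ? readyAt : undefined;
    }
  }
  return undefined; // no changes recorded yet, so nothing has been emitted
}
```

Only the first quiet gap matters: once dependents are released, later rebuilds in watch mode are their own concern.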

blikblum commented 3 months ago

wireit listens to stdout and stderr, checking for a pattern match to decide when it's ready: https://github.com/google/wireit?tab=readme-ov-file#service-readiness. That seems like a better approach.

FYI, my use case for this feature is Firebase projects. I need to populate the database (Firestore) after starting the emulators. If I try to run the populate script right after the emulators start, it does not work; I need to wait for the services to be ready.

Currently, to detect when the emulators are ready, I use wait-on coupled with a Firebase Functions 'ping' endpoint.

Having a feature similar to wireit's would help streamline my build setup.

Netail commented 3 months ago

Good one. For TypeScript, that would be "Watching for file changes."
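A small TypeScript sketch of such a matcher, assuming tsc --watch ends every compilation with a line like "[10:15:34 AM] Found 0 errors. Watching for file changes." The regex and helper names are illustrative; a tool could release dependents on the first finished compilation, or only on one with zero errors.

```typescript
// Assumes tsc --watch prints a line like
// "[10:15:34 AM] Found 0 errors. Watching for file changes." after each compilation.
const TSC_WATCH_DONE = /Found (\d+) errors?\. Watching for file changes\./;

// Returns the error count of a finished compilation, or null for any other line.
function parseTscWatchLine(line: string): number | null {
  const match = TSC_WATCH_DONE.exec(line);
  return match ? Number(match[1]) : null;
}

// Decide whether a line should release dependents.
function isReady(line: string, requireZeroErrors: boolean): boolean {
  const errors = parseTscWatchLine(line);
  if (errors === null) return false; // not an end-of-compilation line
  return requireZeroErrors ? errors === 0 : true;
}
```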

matart15 commented 2 months ago

  1. run backend#dev
  2. run app#gen-types
  3. run app#dev
  4. run web#dev

I'm trying to do the same.

I tried the 2nd option (dependsOn): https://github.com/vercel/turborepo/discussions/1347#discussioncomment-2908264

But it gets stuck on

  1. run backend#dev

because backend#dev runs in watch mode.