Scripts are run unnecessarily depending on how turbo is configured

dcherman commented 2 years ago

What version of Turborepo are you using?

1.1.9

What package manager are you using / does the bug impact?

Yarn v1

What operating system are you using?

Mac

Describe the Bug

When running a command like turbo run test, I would expect it to build the graph of tasks to run based on packages that actually define a test task. Instead, it seems like it assumes that all packages contain that task which results in needlessly running all tasks that would be dependencies.

In the case of the reproducer below, only the bar package contains a test script with a dependency on foo. The build task in baz was executed despite the output of it not being required for executing the requested tasks.

Expected Behavior

Only packages that actually contain the tasks to run should be considered when building the task graph.

To Reproduce

Clone https://github.com/dcherman/turborepo-bad-script-reproducer
yarn install
yarn reproduce

Since only bar has a test script, I expected to see the build task from foo and bar run followed by the test task from bar. The build task of baz runs despite it having no test task and it not being a dependency of either of the other packages.

Here's an image of the graph that's generated from running turbo run test --graph

Note that the both the foo and baz tasks contain a node for the test task despite not actually defining a test task in package.json.

gsoltis commented 2 years ago

I can see the issue here, and I think I'm maybe 99% convinced the behavior should change. I am a little worried about anyone using "synthetic" tasks to trigger upstream packages to do something, but I think we have other ways of handling that (--include-dependencies / --filter=<pkg>... if it is desired behavior.

mantljosh commented 2 years ago

I just ran into this as well and found it surprising and it is causing issues for my use case:

I'm currently migrating a very large monorepo over to use turborepo, and trying to update our scripts one package at a time, and my pipeline contains a top level script turbo:dev - but I cannot run turbo run turbo:dev without filters because it will attempt to run all the dependency scripts on all of my packages (even ones that don't yet have a turbo:dev) which ultimately fails because those scripts are misconfigured.

For now I need to pass a really long filter enumerating out ever package that has been converted to avoid this issue

michaelkplai commented 2 years ago

I think this discussion may also be related #1106.

I also recently ran into this issue. The section of the documentation that mentions that missing tasks are gracefully ignored led me to assume missing task dependencies would also be ignored. See https://turborepo.org/docs/core-concepts/pipelines#tasks-that-are-in-the-pipeline-but-not-in-some-packagejson.

Since ignoring missing task dependencies would be a breaking change it may make sense to introduce this as a new cli flag or config option rather than overwrite the default behavior.

CLI Option turbo run build --ignore-missing-task-deps

turbo.json Option ”ignoreMissingTaskDeps”: true.

I’d be open to a more terse option name. However, --if-present could be a little confusing since the npm implementation is more about suppressing errors for missing scripts and has no relation to dependencies. See https://docs.npmjs.com/cli/v8/commands/npm-run-script#if-present.

I’d be happy to look into contributing a PR if that helps move this issue along.

finn-orsini commented 2 years ago

Joining the discussion here from a similar issue that was marked as duplicate.

If I understand correctly, the current behavior can be described as:

Turbo will attempt to run tasks that do not exist, and "gracefully ignore" them.
Turbo will attempt to run dependencies of tasks that do not exist.

and these two behaviors repeat themselves all the way down the pipeline dependency graph until no more potential dependencies are found.

This issue appears to be discussing the second bullet point, i.e. should Turbo try to run dependencies of tasks that do not exist?

Although similar, #1135 was more related to the first bullet point, i.e. what does "gracefully ignoring" a task mean with respect to the --dry-run output? Currently the missing tasks are filtered from the overall task list if they don't exist, but they still appear as dependencies of existing tasks.

I'm not completely convinced these are the same issue - but maybe there are some implementation details that would result in the decision here addressing both?

gsoltis commented 2 years ago

@finn-orsini you're right, they are technically distinct but related.

I think we're going to move forward with a PR to prune packages that don't have the requested task, which will have the result of causing the output from --dry-run match what is actually executed.

However, given that the above is technically a breaking change (workaround: define a no-op task), it may take a little while to land. In the meantime, I'll put up a PR to include the non-existent tasks in the --dry-run output, so that whichever way the above change goes, at least --dry-run will be more informative.

zanona commented 2 years ago

Would it be possible to have the expected behaviour under a flag i.e:--if-present and in the next major default to it instead?

TxHawks commented 2 years ago

I'd really like to see an intermediary step too. I was actually forced to rename some of my pipline tasks because of this behavior, which is far from ideal

gsoltis commented 2 years ago

I will look into putting this behind a flag.

ObliviousHarmony commented 2 years ago

I actually think this behavior makes a lot of sense. At its core, Turborepo seems to be a task orchestration tool with comprehensive support for caching to avoid unnecessary task execution. I see a pipeline as a declaration of task dependencies and caching more than a specific list of package.json scripts to call. I think it's counterintuitive that Turborepo would decide what I meant when I said I wanted to execute a dependent task first. Besides, isn't the point of the cache that this shouldn't matter after the first execution?

I am currently working on migrating from Nx to Turborepo in a multi-language monorepo. There are cases where "build" has no meaning to a project, but unless the dependencies are built, it will not work. Considering that these are also paired with "test" pipelines that depend on "build", it seems intuitive that it would build the dependencies all the way down.

I recognize that explicit "{package}#{task}" pipelines can resolve this but it seems unnecessary to add a bunch of copy/pasted pipelines to tell Turborepo I want it to do something I feel like I already told it to do. Having to use --filter calls also feels like an unnecessary workaround. It's weird to have to run turbo run build --filter={package}... on every package to get it to actually build dependencies when I already made "build" dependent on "^build". While the discussion here includes a proposal for an option to exhibit the current behavior, having to pass this to every turbo call in our repository is excessive.

While I understand that my opinion seems to be the minority, I think there's an ideological question here about what we mean when we declare a task dependency. Approaching the problem from that angle, a better solution could be found in the configuration of the pipelines themselves. My first thought was that we could add a new pipeline configuration option to declare dependency execution intent but this doesn't feel expressive enough.

My proposal would be to use a token to tell Turborepo what we mean when we say that a task depends on another. If we want to maintain backward compatibility, a token like ^= could say that we want to stop traversing down the graph when we meet a dependency that does not have a task. If we are fine with breaking compatibility, a token like ^~ could say that we're doing a fuzzy dependency match and want to execute a task all the way down. I like this solution because you can mix and match how a specific dependent task should behave.

mantljosh commented 2 years ago

@ObliviousHarmony from what you describe it sounds like your test task should explicitly "dependsOn": ["build", "^build"] if building dependencies are required for tests to run (regardless of whether or not the package itself needs to build).

ObliviousHarmony commented 2 years ago

it sounds like your test task should explicitly "dependsOn": ["build", "^build"] if building dependencies are required for tests to run

I thought about this too @mantljosh, but, it falls apart with transitive dependencies. Even if my "test" task builds the direct dependencies of my package, when one of those also has no "build" but has dependencies that do, we reach the same conclusion. In my case, there are packages pulled into the main app which are not built themselves but have JS dependencies that are.

gsoltis commented 2 years ago

Discussed a bit w/ @jaredpalmer and he pointed out that it's reasonable for pure-js packages to not have a build step, but still require build steps from their dependencies. Given that's a scenario that we want to support, I don't think we can proceed with this change as a default.

Perhaps we could augment the --filter syntax to include matching on having a particular task? So you could run turbo run test --filter=#has-task or something like that. Or perhaps this could be configured per-task in turbo.json?

ObliviousHarmony commented 2 years ago

@gsoltis Augmenting the --filter syntax likely wouldn't be sufficient since you would want to be able to run turbo run test --filter={package} and have it exhibit the desired behavior too (unless you're suggesting --filter={package}#has-task?) What do you think of my token proposal above?

dcherman commented 2 years ago

@ObliviousHarmony It seems like surprising behavior that Turborepo generates the task pipeline as if every project contains a build task even if one or more project don't actually contain one.

Although this is absolutely a breaking change, you could accomplish this by adding a build task to your project that does nothing but exit 0. At that point, your project actually has a build task and it's no longer surprising that it'd be included in the pipeline.

FWIW, I think the proposal to add --if-present to the current version with a default to false and potentially change it to default to true in the next major version is a great idea.

ObliviousHarmony commented 2 years ago

It seems like surprising behavior that Turborepo generates the task pipeline as if every project contains a build task even if one or more project don't actually contain one.

I don't think either behavior is objectively right. The difference of opinion here doesn't really matter, I admitted already I am probably in the minority here 😃

FWIW, I think the proposal to add --if-present to the current version with a default to false and potentially change it to default to true in the next major version is a great idea.

Honestly, I would just like to settle on a solution that doesn't require either workflow to remember to pass an argument to do the right thing. These are radically different behaviors and seem like something better handled at the configuration level.

What did you think of my proposal?

dcherman commented 2 years ago

@ObliviousHarmony The problem isn't solely that the dependencies don't have the task to run, so I'm not sure that your token proposal would actually solve the problem described in the original bug. To fix that one, we need to filter the initial set of projects down to ones that actually have the task being run. I see your point about transitive dependencies though and wonder if we'd end up needing a combination of both solutions.

FWIW, that if-present flag could be supported in config pretty easily if that's desirable - just move the flag to an ifPresent boolean property in the config file. This could follow the same precedence rules as other options so that environment variables, config, and flags are all supported.

ObliviousHarmony commented 2 years ago

so I'm not sure that your token proposal would actually solve the problem described in the original bug

Ah @dcherman, I came here from the open pull request and so I misunderstood the original bug it seems. No, my proposal wouldn't solve that.

I 100% agree that running the "build" task on projects that don't have a "test" task doesn't make any sense. There is still a case for running a "build" task on a project with no script but that seems like the edge case. Either a configuration option or the filter augmentation proposed by @gsoltis would work just fine. One benefit of the filtering approach is that it creates a base for the hash syntax to add modifiers to filtering.

I'd suggest an approach of applying that filtering to the nodes selected for entry into the graph and then having transitive dependencies execute tasks with the original behavior. A token probably doesn't make much sense here, I can't imagine any cases where you would want to run a "test" command but not build the dependencies of that task. The option/filter/whatever can cover the entry "build" edge case just fine.

As an aside, I've added basically nothing to this issue's discussion 🗡️

gsoltis commented 2 years ago

One concern I have with using the configuration approach is the ability to override it. Currently, tasks in turbo.json map 1:1 to the name of the script being run, so it is not feasible to, for instance, have two copies of "test", with and without the option to include targets that don't have a "test" script. This means that regardless of whether or not we augment the config file, we must have a flag to control behavior per-execution.

mantljosh commented 2 years ago

This is an annoying problem because like other's have mentioned, both approaches have benefits and downsides.

I think there may be a safe incremental step to take, where we only do the "ignoring" based off the top level of the execution graph. What I mean is that if I run turbo run test, it should immediately filter out any packages that do not have a test script and ignore them (and their dependencies) entirely.

georeith commented 2 years ago

This would break my use case:

I have tests that run from source, however I want the cache of those tests to miss if one of their dependencies changes regardless of whether that package itself has tests.

However I would say that ^tests is not necessarily the right thing here either as I don't care if my topological dependencies test configuration changes.

Really what I want to express is that I want to use the input hash of build but I don't actually want to run build. There's no way of currently doing this I believe?

dcherman commented 2 years ago

Until this is addressed, here's how I've been working around this:

This script generates the set of filter flags needed to run any given task

targetScript="$1"
mapfile -t < <(find packages -name "package.json" | grep -Ev "(node_modules|bower_components)")
jq -r --arg targetScript "$targetScript" '[inputs] | map(select(.scripts[$targetScript] != null)) | map("--filter=" + .name) | join(" ")' "${MAPFILE[@]}"

Which you can then pass to CLI using expansion:

turbo run build ${FILTER_FLAGS[*]}

It just supports a single script since that's our use case, but modifying it for multi task support should be pretty easy if needed.

tom-sherman commented 2 years ago

Discussed a bit w/ @jaredpalmer and he pointed out that it's reasonable for pure-js packages to not have a build step, but still require build steps from their dependencies.

@gsoltis In this scenario wouldn't it be reasonable to expect that you have to explicitly define dependsOn: [^build]?

tom-sherman commented 2 years ago

I have a slightly more minimal reproduction: https://github.com/tom-sherman/turborepo-dependency-bug

In this repo i have a graph as follows

graph TD
foo#test --> foo#build --> __root__
bar#build --> __root__

My pipeline configuration looks like this:

{
  "pipeline": {
    "build": {
      "dependsOn": ["^build"],
      "outputs": ["dist/**", ".next/**"]
    },
    "test": {
      "dependsOn": ["build"],
      "outputs": []
    }
  }
}

When I turbo run test I expect foo#build and foo#test to run, however bar#build also runs unexpectedly.

bar#build is not an ancestor of any test task so I don't expect it to run.

This would not be the case if bar was a node_modules dependency of foo but in this example it's not, they are two sibling packages. In the scenario where I want bar#build to run I believe it's expected to have to make the dependency between foo#test and it explicit somehow.

gsoltis commented 2 years ago

@tom-sherman turbo run test is attempting to run foo#test and bar#test. Since test depends on build, foo#build and bar#build are considered necessary. In this case, it's obvious we don't actually want bar#build because we're running tests, and bar does not have any tests.

The story is different with the build example. Currently, a package that has no build step of its own, but requires its dependencies to be built, looks very similar to the above case with test. I.e. filtering out the non-existent task first, before resolving dependencies, would result in that package's dependencies not being built, even with a dependency on ^build.

So the question is how we want to handle these two scenarios. In my opinion, as a default, doing extra work is probably preferable to not doing enough work, especially if the extra work is easily cache-able. So I think the current behavior should remain as a default. But, we can absolutely look at ways of providing configuration hooks (cli flags, env vars, turbo.json keys, etc.) to allow users to specify when they want to opt in to the desired behavior in your test example.

As an additional note, there are two flavors of skipping undefined tasks, assuming the behavior has been opted-in to:

always skip them. Anytime you encounter an undefined task, consider it "done" and stop the dependency graph traversal
only filter the tasks that have been specified and are not in the entrypoint packages. In this case, If you run turbo run test, and a package is missing a test script, it and its dependencies won't get run. But, if a dependency of a package with a test script is missing a build script, we would not stop the traversal, and instead continue with building its dependencies.

I think the latter is more likely to work as expected. Tasks that really want to stop traversal can already opt in to that behavior on a per-package-task basis by defining a package-task with an empty dependsOn.

tom-sherman commented 2 years ago

@gsoltis I don't understand the build example, can you please create a minimal reproduction? Really struggling to get my head around how it's the same issue as what I've described and also matching my mental model up with what you're describing

gsoltis commented 2 years ago

I'll see if I can get a repo together, that's probably a good idea for this discussion.

But in general, turbo works by taking the user input for which tasks to run (build, test, etc.) and which packages to run them in ("entry points"), and then filling in the graph for what needs to be done to run those tasks in the entry point packages.

The essence of this discussion is what to do when you encounter a node in that graph that doesn't define the task we're looking for. Do you stop traversing and consider it done? Or do you pretend that the script exists and continue traversing to what would be its dependencies if it did exist. So a package missing a test script and a task missing a build script face the same decision: do we run the dependencies, even though the script is missing. The trouble comes from the difference in expectations for the two cases. For the build case, I think consensus is "yes, run the dependencies", and for test, it looks like we would prefer not to run the dependencies, as there are no tests to run. But without turbo knowing the difference between test and build, it has no way to distinguish the two cases.

Some other distinctions that we can make for this behavior are "is this package one of the entry point packages" and "is this task one of the requested tasks" vs packages and tasks that are transitively included as dependencies.

gsoltis commented 2 years ago

@tom-sherman Sorry for the delay, but I have put together a sample that I believe demonstrates the issue: https://github.com/gsoltis/sample-monorepo

There are two top-level applications, only one has test script. There is also a library that has no build script, but depends on another library that does have a build script.

arturenault commented 2 years ago

Any updates here? This breaks my use case too.

mantljosh commented 2 years ago

While I think there are some difficulties that have been raised (i.e. a plain-js package that still needs its dependencies to be built even if it has no build script), I think there is an improvement that can be safely made where if I run turbo run test then it will:

First filter the list of packages to only ones that have a test script
Then run the build in the same way that is already run today (with potentially unnecessary steps being run)

This would at least let us avoid the case where turbo run test ends up building the world, even for things that will never be used by the test scripts.

gsoltis commented 2 years ago

@arturenault can you describe what specifically is broken? Is turbo doing extra work, or is some task being run incorrectly?

My understanding of the current state of the world is that in the worst case, turbo does unnecessary work, and the difficulty is in distinguishing what exactly is unnecessary. As you can see in the thread, there are a variety of opinions on what the expected default behavior is, so we are currently defaulting to possibly doing more, rather than possibly doing less than expected.

gsoltis commented 2 years ago

@mantljosh are you proposing to special-case test? Or somehow mark it in turbo.json to opt in to that behavior? We need some way to distinguish it from the build case.

FWIW, I think a marker of some sort in turbo.json might be the way to solve this, and we could even default to setting that marker for test in create-turbo. I'm not sure I would support special-casing specific task names inside turbo though.

arturenault commented 2 years ago

@gsoltis My case did cause actual issues. We use serverless framework for some of our projects which packages things differently for different environments. So those projects have both a build and a build:${ENV} script. Both depend only on ^build.

We have an integration-tests project which has no lifecycle scripts, but depends on api, which is a serverless framework project.

So when I run, e.g., turbo run deploy:test (depends on build:test):

api:build:test gets run as a dependency of api:deploy:test.
api:build gets run as a dependency of the (nonexistent) integration-tests:deploy:test.

This causes two builds to (potentially) happen at the same time, one for the test environment and one for the (default) dev environment. This causes build failures because both builds try to write to the same files.

I suspect it could cause the build to deploy to the wrong environment, but I'm not sure if turbo reads files directly from the filesystem or if it checks the dependency tree to fetch outputs from the right dependent task.

For now I've been able to work around this by excluding integration-tests from all my deployment scripts. But it's definitely possible that my setup isn't ideal! We're migrating from lerna and I've just been adapting our existing commands.

gsoltis commented 2 years ago

@arturenault Making sure I understand:

deploy:${ENV} depends on build:${ENV}, that's why api:deploy:test causes api:build:test to run.

integration-tests:deploy:test doesn't exist, but depends on build:test, which depends on ^build, so api:build is also triggered. That's where the environment specifier is getting lost?

It does seem like there's the potential for this setup to break if you have two top-level applications and one depends on the other. For instance, if a hypothetical web depends on api, trying to deploy both web:deploy:test and api:deploy:test this way will result in the same behavior: building both api:build:test and api:build.

In terms of what you're trying to accomplish right now, when you run turbo run deploy:test, are you trying to deploy more than one package? While this gets sorted out, perhaps there's a way to use --filter to get at the right set of packages?

This also does highlight one current sharp edge: package-task combos that read or write to the same location need to have a dependency defined between them, or else they could get scheduled concurrently. It's something that's on our roadmap to detect automatically. I'm not sure that would fully solve your specific case, but it would at least fail in a more consistent state.

mantljosh commented 2 years ago

@mantljosh are you proposing to special-case test? Or somehow mark it in turbo.json to opt in to that behavior? We need some way to distinguish it from the build case.

@gsoltis no there would be no special casing here - just when running turbo run some-script the first step should be to determine which packages even have a some-script, and only compute the tasks to run on those.

gsoltis commented 2 years ago

@mantljosh got it. We unfortunately can't make that the default behavior because some packages still need their dependencies to be built before they can consider themselves to have been built, even if they have no build script.

georeith commented 2 years ago

Although this is absolutely a breaking change, you could accomplish this by adding a build task to your project that does nothing but exit 0. At that point, your project actually has a build task and it's no longer surprising that it'd be included in the pipeline.

This to me seems like the most sensible answer, pipelines would work how the docs explain them (IMO much more intuitively) and edge cases can explicitly opt-in.

Sadly there is not anyway to opt-out of this currently and this is a pretty critical issue when say you have 30 packages and want to run a script in 2 of them that depends on those packages being built.

I think most of us would rather not wait for a breaking change given the above, even if I feel like the breaking change is the more pure solution. So would like to +1 adding a configuration option to make turborepo filter packages by the existence of said script first.

LionC commented 1 year ago

Just to add another resulting problem to this (talking to @anthonyshew in Discord about this right now):

This will also lead to turbo reporting circular dependencies and thus not executing anything, even though with the actual tasks and dependencies given, there is no circular dependency and it would be executable just fine.

For an example, see this reproduction repository: https://github.com/LionC/turbo-circular-dependency-example

Here, with all tasks that are present in the workspaces, the graph should look like this

But instead it reports a circular dependency

 ERROR  run failed: error preparing engine: Invalid task dependency graph:
cyclic dependency detected:
    schema#build,___ROOT___#codegen,___ROOT___#build

dobesv commented 1 year ago

I think this issue is causing us a lot of unnecessary rebuilds. Sure you can say the jobs are cacheable but keep in mind that they won't be cached if they depend on tasks that are being "virtually" re-run, even though those tasks don't exist. So this problem adds a lot of a imaginary dependencies and causes a lot of unexpected rebuilds.

Some use cases that I have which are relevant here:

I have a build script that needs another tool to be built in order to run - if the script is defined for that package - and which depends on some tasks from its dependencies. Currently this just means every package will seem to require that tool to be built and those tasks from their dependencies to be built as well, unless I specificallly define a default version of the task with no dependsOn and then write a package-specific version of the task for every package that actually uses the tool. e.g. the tool is generate-json-schema-from-types and the script build:json-schema. I only want that task to depend on generate-json-schema-from-types and ^build:types if it exists for a package.

I have a generic build-all script in each package called build, which runs the specific scripts for that package. Turbo also can run the specific tasks for that package. It would be nice if I could have a sort of "virtual" task in turbo which generates dependencies but does NOT run the task.

A couple solution ideas:

When defining a task, have a way to write a task spec that only applies to packages which define the task. This would be equivalent to writing out all the @xxx/yyy#build:json-schema jobs individually for those packages, but leaves the "default" version with blank dependsOn, inputs, outputs, etc.. So for example the selection syntax could be something like ":has-script(build:json-schema)#build:json-schema": { "dependsOn": ["^build:types", ...] } or maybe simpler "#build:json-schema" : { "dependsOn": ["^build:types", ...] } Other packages would have no definition for that task, so no dependsOn and no synthetic dependencies.

Have an option on a task to mark it as "virtual" which means it will never try to run the actual script, it just is used for dependency purposes.

connorjs commented 1 year ago

I came here to request the original behavior, but I totally agree with the existing default after reading this conversation (thanks all!). Specifically this comment.

In my opinion, as a default, doing extra work is probably preferable to not doing enough work, especially if the extra work is easily cache-able.

Following this discussion, I chose to use "outputMode": "new-only" in my "build" task to simplify the output of the “target” tasks (examples: test, start, preview), because the "build" tasks in other packages will often hit the cache.

This does remove some nice things on re-run (namely: omitting bundle stats), but I often have HTML visualizations for bundle stats I can open anyway.

RIP21 commented 1 year ago

For me, this is very annoying as I have e2e tests. And I want to run the e2e suite for BE in one workflow ignoring everything related to frontend e2e tests. So I don't need to build any frontend dependencies nor do I want a Next.js build to be built which takes forever compared to libraries that are instant/cached mostly. Also, since the environment is not prepared for Next.js to build it breaks the pipeline. But since test:e2e (non-existent in Next.js package) depends on build and build^ it still gets unexpectedly triggered for no good reason. I'll have to ignore it as well with ! to avoid that one explicitly, I guess. But I'm quite surprised it's still discussed here in length when it's an expected/default behavior for all build/monorepo tools I used (Maven, Gradle, Rush.js, Lerna (I guess, used it long time ago))

tannerbaum commented 1 year ago

So I don't need to build any frontend dependencies nor do I want a Next.js build to be built ... But since test:e2e (non-existent in Next.js package) depends on build and build^ it still gets unexpectedly triggered for no good reason... But I'm quite surprised it's still discussed here in length when it's an expected/default behavior for all build/monorepo tools I used (Maven, Gradle, Rush.js, Lerna (I guess, used it long time ago))

I think this issue is a bit complicated because several topics are being discussed, but to me this is a great summary of my problem, which as @michaelkplai outlines and provides a great solution is not clear in the docs.

It doesn't have to be the default behavior, but adding a --ignore-missing-task-deps config and clarifying that they aren't ignored in the docs would be a huge improvement.

Edit: I was saying the discussion within the issue had gotten complicated due to parallel discussions like how dry-mode handles these tasks.

RIP21 commented 1 year ago

Unsure what's complex here. But if task is absent for the package, then you don't need to run its dependent tasks. Simple as that.

In other words: If test:e2e task is dependent on own build and dependencies build, then you run own build and build of the dependencies ONLY if test:e2e task is present for the package. If it's not, then you don't need to run its build and upstream build tasks as artifacts they generate won't be needed/won't be used.

For me it's so obvious I don't understand what to discuss here. People may opt into current weird behavior via flag, but default should be the one I described above.

franzwilhelm commented 1 year ago

1000x thumbs up to what @RIP21 is saying here. Super simple

dobesv commented 1 year ago

if task is absent for the package, then you don't need to run its dependent tasks. Simple as that.

I do agree this should probably be the default.

I also want tasks that were always "virtual" and not only don't have to exist on the package, but won't be run even if they do.

And tasks that behave the way they do now.

michaelkplai commented 1 year ago

We are currently working around this issue using workspace configurations.

If a task doesn't exist in a workspace we overwrite the dependsOn scripts.

{
  "extends": ["//"],
  "pipeline": {
    "missing-task": {
      "dependsOn": []
    }
  }
}

Would still prefer a more permanent solution in the form of an global option to ignore missing tasks.

mehulkar commented 11 months ago

Hey all, it's clear from the discussion here that we need a more powerful way to express how these "Transit Nodes" behave. In the meantime, I've added some docs about how it works today and why.

It's clear that both behaviors are defensible, but to be clear: we will not be changing the default behavior before a 2.0 release, since it's a very fundamentally breaking change to the Task Graph construction.

We most likely need a more expressive dependsOn configuration (as opposed to expanding --filter or --ignore-missing-task-deps as suggested earlier in this issue). My reasoning is that in most use cases a pipeline has a "correct" definition of what task dependencies should be, rather than a pipeline that needs to run both ways (although you may well define one task entrypoint one way, and another task entrypoint another way).

closesimple-wl commented 11 months ago

@mehulkar there's a small mistake in "For example, let's say you've configured your build task to depend on ^test:". That text looks correct, but it doesn't match the example code below. Love the clear documentation though; thank you. 👍

mehulkar commented 11 months ago

Thanks @closesimple-wl , fixed in https://github.com/vercel/turbo/commit/1fa0fe1238912db05ed12ba26475ec41f6f8e563!

rjohnsonbade commented 10 months ago

Just adding to this thread as I spent some time this morning figuring out why tasks were running unexpectedly, and it was due to this issue.

I agree with this comment, https://github.com/vercel/turbo/issues/937#issuecomment-1682029335.

The most obvious behaviour is that if a task does not exist for a workspace, then the dependencies will not be run. I understand this is a breaking change, but strongly believe that should be the default behaviour.

I understand that this functionality is required for "synthetic" tasks, however my initial impression of this use case (as documented here), was that it felt a little hacky, and not very obvious.

I think a more explicit way of configuring these "synthetic tasks"; and changing the default functionality for dependencies of non-existent tasks, would make turborepo more intuitive overall.

dobesv commented 6 months ago

I would be willing to work on this if I had some direction from the maintainers what solution they are willing to accept.

Maybe a couple boolean fields?

"requiresScript": true - if the script doesn't exist in the package, don't process any dependencies (treat as {dependsOn: [], inputs: [], outputs: [].
"runScript": false - if false, don't run the script, even if it exists. Defaults to true

To get the current behavior, both could be absent or false. It could be possible to say that in the future requiresScript will be true by default and put a warning if no value is provided, so people who want to keep the current behavior should set "requiresScript": false.

I do feel like there should be a better name for these fields, but struggling a bit to find one.

vercel / turborepo