gradle / gradle

Adaptable, fast automation for all
https://gradle.org
Apache License 2.0

Add lifecycle event provider for the start of execution phase to `FlowProviders` #29064

Open joshfriend opened 6 months ago

joshfriend commented 6 months ago

Expected Behavior

I'd like to be able to run a dataflow action at the start of the execution phase. Intended use would be to analyze the user's build configuration and environment and report any errors by failing the build early and reporting to a telemetry endpoint. This cannot be done during configuration phase because I don't want environment variables, properties or shell commands collected/run by this analysis to be captured by the configuration cache.

Current Behavior (optional)

FlowProviders currently only provides getBuildWorkResult() which causes a dataflow action to execute at the end of a build. If no flow provider is given, the docs say that the dataflow action will be executed at an undefined point:

Warning: If you’re not using a lifecycle event provider as an input to the dataflow action, then the exact timing when the action is executed is not defined and may change in the next version of Gradle.

The current implementation executes dataflow actions at the end of the build when no lifecycle event provider is used.
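For readers unfamiliar with the dataflow API, the following is a minimal sketch of the only hook available today: an action wired to `getBuildWorkResult()`, which therefore runs once the build work has finished. The names `BuildOutcomeReporter` and `BuildOutcomePlugin` are hypothetical, the flow API is assumed to still be incubating, and the snippet is assumed to live in `settings.gradle.kts`, where the Kotlin DSL supplies the implicit receiver for the configuration lambda.

```kotlin
// settings.gradle.kts (sketch; class names are hypothetical)
import org.gradle.api.Plugin
import org.gradle.api.flow.FlowAction
import org.gradle.api.flow.FlowParameters
import org.gradle.api.flow.FlowProviders
import org.gradle.api.flow.FlowScope
import org.gradle.api.initialization.Settings
import org.gradle.api.provider.Property
import org.gradle.api.tasks.Input
import javax.inject.Inject

// The dataflow action: it only sees the values passed in through its parameters.
abstract class BuildOutcomeReporter : FlowAction<BuildOutcomeReporter.Parameters> {
    interface Parameters : FlowParameters {
        @get:Input
        val failed: Property<Boolean>
    }

    override fun execute(parameters: Parameters) {
        println(if (parameters.failed.get()) "Build failed" else "Build succeeded")
    }
}

abstract class BuildOutcomePlugin : Plugin<Settings> {
    @get:Inject
    protected abstract val flowScope: FlowScope

    @get:Inject
    protected abstract val flowProviders: FlowProviders

    override fun apply(settings: Settings) {
        flowScope.always(BuildOutcomeReporter::class.java) {
            // Using buildWorkResult as an input ties the action to the end of the build.
            parameters.failed.set(flowProviders.buildWorkResult.map { it.failure.isPresent })
        }
    }
}

apply<BuildOutcomePlugin>()
```

The request in this issue amounts to a second provider on `FlowProviders` that could be used the same way, but that would resolve when the execution phase starts rather than when the build work ends.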

Context

We had previously implemented this as a regular task, but it is difficult to make the task always run for every build command. Adding the environment scan task as a prerequisite for all other tasks in every project eagerly creates all tasks during configuration phase and bloats the cache and slows configuration down. It is also incompatible with isolated projects.

mlopatkin commented 6 months ago

This feature request is in the backlog of the relevant team and is prioritized by them.


This makes a lot of sense. There are two possible ways to implement this: execute immediately (at configuration time if not running from the configuration cache), or postpone until the execution phase actually starts. For the latter, we may expose some information about the configuration, as the build-finished provider does for the build result.

alllex commented 5 months ago

With Isolated Projects, the boundary between the "configuration phase" and "execution phase" can become blurry. At some point, a task in a project can start executing before all projects have been configured because the isolation is guaranteed.

How would such a change affect build configuration analysis? More generally, do you expect to analyze each project in the build before it starts executing tasks or do you want to analyze "build level" information?

joshfriend commented 5 months ago

More generally, do you expect to analyze each project in the build before it starts executing tasks or do you want to analyze "build level" information?

We only want to analyze build level info. This has included info about how many commits behind the main branch a user is working, or running launchctl limit maxfiles to see if developers have applied the workarounds we have for macOS file descriptor limit issues. If we run these during settings or configuration phase, they are captured by configuration cache and we see a much higher rate of CC invalidation (mostly from the git info changing). The previous implementation of making a task to collect/send this info and having all other tasks depend on it just bloated the configuration cache by eagerly configuring all tasks and causing cross-project configuration.

FlowProviders seems to be the only current solution to these issues but it can only run actions at the very end of the build. If it had an option to run at the start of a build, that would be awesome! My request for an after configuration hook was simply to be able to avoid capturing irrelevant info in the configuration cache.

bamboo commented 3 months ago

Hi Josh,

The previous implementation of making a task to collect/send this info and having all other tasks depend on it just bloated the configuration cache by eagerly configuring all tasks and causing cross-project configuration.

Since Gradle 8.9, it should be possible to write a settings plugin using a combination of lazy APIs to achieve the desired result:

// settings.gradle.kts (or settings plugin)

// Register validation task once in the root project
gradle.rootProject {
  tasks.register("validateUserEnvironment") {
    doLast {
      require(System.getenv("ANSWER") == "42") {
        "Build is running in the wrong universe!"
      }
    }
  }
}

// Make every configured task in every project depend on the validation task
gradle.lifecycle.beforeProject {
  tasks.configureEach {
    if (name != "validateUserEnvironment") {
      dependsOn(":validateUserEnvironment")
    }
  }
}

// Some toy tasks to test lazy configuration with
gradle.rootProject {
  tasks {
    register("a") {
      println("$path is being configured")
    }
    register("b") {
      println("$path is being configured")
    }
  }
}

With that setup, only the necessary tasks should be configured, with no cross-project configuration violations, and the configuration cache should be properly reused across invocations:

First run should store the configuration cache entry

➜ $ gradle a -Dorg.gradle.unsafe.isolated-projects=true
Isolated projects is an incubating feature.
Calculating task graph as no cached configuration is available for tasks: a
:a is being configured
> Task :validateUserEnvironment FAILED

FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':validateUserEnvironment'.
> Build is running in the wrong universe!

* Try:
> Run with --stacktrace option to get the stack trace.
> Run with --info or --debug option to get more log output.
> Run with --scan to get full insights.
> Get more help at https://help.gradle.org.

BUILD FAILED in 446ms
1 actionable task: 1 executed
Configuration cache entry stored.

Subsequent runs (valid or not) should properly reuse the configuration cache entry

➜ $ ANSWER=42 gradle a -Dorg.gradle.unsafe.isolated-projects=true
Isolated projects is an incubating feature.
Reusing configuration cache.

BUILD SUCCESSFUL in 448ms
1 actionable task: 1 executed
Configuration cache entry reused.

➜ $ gradle a -Dorg.gradle.unsafe.isolated-projects=true
Isolated projects is an incubating feature.
Reusing configuration cache.
> Task :validateUserEnvironment FAILED

FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':validateUserEnvironment'.
> Build is running in the wrong universe!

* Try:
> Run with --stacktrace option to get the stack trace.
> Run with --info or --debug option to get more log output.
> Run with --scan to get full insights.
> Get more help at https://help.gradle.org.

BUILD FAILED in 517ms
1 actionable task: 1 executed
Configuration cache entry reused.

Would a solution like that fit your particular scenario?

joshfriend commented 3 months ago

Would a solution like that fit your particular scenario?

This approach seems to work, though it is slightly clunky. The task registered in gradle.rootProject cannot be passed directly to the subsequent dependsOn calls inside of gradle.lifecycle.beforeProject, so we have to reference the task by name.

Declaring this work as a task and making every other task depend on it is still somewhat of a hacky solution. What I really wanted to do was be able to run this immediately during initialization without capturing extra stuff in the configuration cache, but that isn't possible by any mechanism yet, so doing it as a task is currently the most feasible.

A FlowProviders for the start of execution would still be a more ideal solution (complicated of course by isolated projects blurring that line), or some API that allows async work to happen during initialization/configuration which is not able to change build setup and can therefore be excluded from configuration cache.

bamboo commented 3 months ago

A FlowProviders for the start of execution would still be a more ideal solution

I hear you.

Would you need the validation to happen before any task starts executing or would it be ok for the validation action to run concurrently with tasks?

joshfriend commented 3 months ago

I'd prefer to do these checks before any tasks execute since they detect build environment issues. Isolated projects would hopefully make task execution begin sooner in the timeline, but I am still concerned about possible performance implications of making every task in the build depend on this validateUserEnvironment task. It worked out alright previously at the expense of eagerly creating tasks during configuration, but this could change in the future.

alllex commented 3 months ago

From the previous comments:

We only want to analyze build level info. If we run these during settings or configuration phase, they are captured by configuration cache and we see a much higher rate of CC invalidation

@joshfriend, your intention seems to be to fail as early as possible in case of validation problems. There is nothing specific you want to check that requires the validation logic to be executed just before the tasks, is that correct?

If this is the case, then the use case is to run some work as early as possible at configuration time without affecting the CC inputs. You could use an eagerly evaluated ValueSource for this. The value source can be evaluated even during settings evaluation, and its execution logic, by definition, does not contribute to the CC inputs. The value source also supports running external processes out of the box.

By eagerly evaluated I mean something like this:

// settings.gradle.kts
val validatingValueSource = settings.providers.of(MyValidatingValueSource::class) {}
val validationResult: Boolean = validatingValueSource.get() // <--- force the execution of the validation
// the boolean result becomes a CC input, but it will always be `true` for good builds, not affecting the CC hit rate
if (!validationResult) { abortTheBuildSomehow() }
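
The snippet above leaves `MyValidatingValueSource` and `abortTheBuildSomehow()` undefined. As a rough sketch of how they might be filled in, the version below reuses the `ANSWER` environment check from the earlier task-based example and aborts by throwing a `GradleException`. Both choices are assumptions made for illustration, not something confirmed in this thread.

```kotlin
// settings.gradle.kts (sketch; the check and the way the build is aborted are assumptions)
import org.gradle.api.GradleException
import org.gradle.api.provider.ValueSource
import org.gradle.api.provider.ValueSourceParameters

abstract class MyValidatingValueSource : ValueSource<Boolean, ValueSourceParameters.None> {
    // Reads made inside obtain() are not recorded as configuration cache inputs;
    // only the returned value is.
    override fun obtain(): Boolean =
        System.getenv("ANSWER") == "42"
}

// Calling get() forces the value source to run while the settings script is evaluated.
val environmentIsValid: Boolean = providers.of(MyValidatingValueSource::class) {}.get()

if (!environmentIsValid) {
    throw GradleException("Build is running in the wrong universe!")
}
```

Because only the boolean result is tracked, a cached configuration would be expected to stay valid as long as the check keeps returning `true`, even if the underlying environment details change.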

joshfriend commented 3 months ago

There is nothing specific you want to check that requires the validation logic to be executed just before the tasks, is that correct?

Yes. The ValueSource suggestion is neat, let me try that!