gradle / gradle

Adaptable, fast automation for all
https://gradle.org
Apache License 2.0
16.75k stars 4.69k forks source link

Document which `Provider`s are resolved at store time when configuration cache is used #26363

Open hungvietnguyen opened 1 year ago

hungvietnguyen commented 1 year ago

Current Behavior

See below

Expected Behavior

See below

Context (optional)

No response

Steps to Reproduce

  1. Open the attached project: ProviderResolvedTooEarly.zip

It has the following build script:

abstract class CustomTask : DefaultTask() {

    @get:InputFiles
    abstract val inputFiles: ListProperty<File>

    @get:OutputFile
    abstract val outputFile: RegularFileProperty

    @TaskAction
    fun run() {
        check(inputFiles.get().isNotEmpty()) { "inputFiles is empty!" }
        outputFile.get().asFile.writeText("Hello")
    }
}

val fooTask by tasks.registering(CustomTask::class) {
    inputFiles.set(project.provider { listOf(project.file("fooInputFile.txt")) })
    outputFile.set(project.layout.buildDirectory.file("fooOutputFile.txt"))
}

val barTask by tasks.registering(CustomTask::class) {
    // Set fooTask's outputFile as barTask's inputFiles
    val regularFiles: Provider<List<File>> = fooTask.map { listOf(it.outputFile.asFile.get()) }
    inputFiles.set(regularFiles.map {
        println("############### Calling File::isFile ###############")
        it.filter(File::isFile)
    })

    outputFile.set(project.layout.buildDirectory.file("barOutputFile.txt"))
}
  1. Run rm -rf .gradle/configuration-cache && ./gradlew cleanFooTask && ./gradlew barTask -i --configuration-cache. The build will fail with:
    Execution failed for task ':barTask'.
    > inputFiles is empty!

The reason is that File::isFile is called during task graph creation, i.e. inputFiles are resolved before the task execution phase.

  1. Run the build again without configuration cache: ./gradlew cleanFooTask && ./gradlew barTask -i. The build will succeed (inputFiles are resolved during task execution phase).

Gradle version

8.3

Build scan URL (optional)

No response

Your Environment (optional)

No response

hungvietnguyen commented 1 year ago

Here are a couple of variations in constructing the Provider and their outcomes:

jbartok commented 1 year ago

Thank you for your interest in Gradle!

This issue needs a decision from the team responsible for that area. They have been informed. Response time may vary.


The 3rd option, the one that succeeds, IS the recommended way of doing it.


Configuration team please decide if this just needs better documentation, or if the error message needs improvement, or ...

hungvietnguyen commented 1 year ago

I think resolving Providers during task graph creation (whether with configuration cache or not) has several issues:

It would be nice if we had some clear documentation when a Provider will be resolved (not may be resolved) and why.

mlopatkin commented 1 year ago

Thank you for providing a valid report.

The issue is in the backlog of the relevant team and is prioritized by them.


val regularFiles: Provider<List> = fooTask.flatMap { it.outputFile.asFile }.map { listOf(it) } => FAIL

The option 2 should also work, the fact that asFile works differently is surprising. I filed #26437 to track this issue. Probably related to the optimized MappingProvider code path.

  • It will be incorrect if the Provider's resolved value is the outcome of another task's execution, as shown in the above example.

We're missing checks for some improper provider resolutions (e.g. a task @get:OutputFile val myOutput: RegularFileProperty can be queried at any time, but myOutput.map { it }.get() only works after the task completes, see #26340), so it is hard to figure out what should and what shouldn't work. I suspect that your sample 1 should fail with exception when calling get() because the outputs are not ready yet.

In most cases, calling Provider.get() inside of map should be replaced with a flatMap, it gives Gradle more chances to optimize the execution.

  • It slows down the build as it adds work to the task graph creation phase and potentially adds more configuration cache inputs.

It is a trade-off - saving a "pending" computation into the cache has a non-trivial cost too, and, of course evaluating it adds a penalty to each cache hit build.

  • It is unnecessary: Providers don't seem to need to be resolved to create the task graph (at least in this example). (I'd be interested in seeing examples showing that the task graph requires resolving Providers.)

There are plenty of cases, most often when the providers reference the configuration-only state, e.g. something from Project, so they cannot be evaluated when running the build "from the configuration cache". It is also possible to have a provider that changes the shape of a task graph.

  • It is inconsistent if Providers are sometimes resolved during task graph creation, sometimes not.

Off the top of my head, task output properties (but not task providers per se) and most "environmental" providers made by the ProviderFactory are considered irreducible, anything else is subject to the configuration-time evaluation. But this isn't well-documented at the moment.

It would be nice if we had some clear documentation when a Provider will be resolved (not may be resolved) and why.

Indeed, we definitely need to work on reducing confusion in this area. I'll rename this issue to track further documentation and/or API improvements in the area.

hungvietnguyen commented 1 year ago

Thanks for your detailed response!

Could you clarify whether my original report (using Provider.map) is still a bug in Gradle that needs to be fixed? If yes, should we keep the previous bug title and file a separate bug for improving documentation? (The new bug title hides the fact that this is still a bug.)

mlopatkin commented 3 months ago

Could you clarify whether my original report (using Provider.map) is still a bug in Gradle that needs to be fixed?

Sorry for the late response, it felt through cracks somehow. Indeed, it looks like a bug, as the same code works without CC. I'd say because dependencies propagate through map and because the value of mapped provider is not obtainable without running the task, CC should give such providers the same treatment. I filed #29639 to track that work.