micronaut-projects / micronaut-gradle-plugin

A Gradle Plugin for Micronaut
Apache License 2.0
65 stars 43 forks source link

Fixed cache misses caused by inspectRuntimeClasspath tasks. #918

Closed ribafish closed 5 months ago

ribafish commented 6 months ago

Fixed cache misses caused by inspectRuntimeClasspath tasks.

The issue was that when the runtimeClasspath input was defined just as @InputFiles, the files had bytewise differences because of timestamps. When using @Classpath annotation, the files contents are used for cache fingerpriting - one caveat is that with @Classpath, the order of files also matters, that's why I also sorted the files before adding them to the runtimeClasspath.

These issues were affecting the cacheability of all users of these plugins, including the micronaut-data project.

ribafish commented 5 months ago

Actually I think this solution doesn't work either. The initial javadoc comment says it all: the only thing this task cares about is the name of the files on classpath. It doesn't care about the content of the files.

Therefore, I think a better way to fix it would be to make the getRuntimeClasspath method @Internal, then have an @Input property which is actually computed from the list of files.

In order to get the names of files on classpath, that classpath and its dependencies need to be resolved. With the current implementation that happens in execution time. We can't simply reference the files themselves, as this would have the same problem as explained in the PR description because file metadata (namely timestamp) will change. If we were to have it be a String input with a list of filenames we'd need to resolve the dependencies in configuration time.

As such, I see only two solutions:

  1. Have the input defined as @Classpath - current solution
  2. Make the input accept a list of (unresolved) dependencies and search those for snakeyaml. We could get those by Configuration.getAllDependencies(), which doesn't resolve the configuration, with the downside of not getting transitive dependencies.
melix commented 5 months ago

n order to get the names of files on classpath, that classpath and its dependencies need to be resolved. With the current implementation that happens in execution time.

Yes, but we should change that to happen at configuration time, using a new @Input which uses the configuration resolved component api (which should be compatible with config cache too).

ribafish commented 5 months ago

n order to get the names of files on classpath, that classpath and its dependencies need to be resolved. With the current implementation that happens in execution time.

Yes, but we should change that to happen at configuration time, using a new @Input which uses the configuration resolved component api (which should be compatible with config cache too).

Why would we want to have dependency resolution happen in configuration time? What are we trying to solve with this suggestion?

melix commented 5 months ago

Why would we want to have dependency resolution happen in configuration time? What are we trying to solve with this suggestion?

Correctness. With runtime classpath normalization, the task will be out of date if any if the files change. We don't care that they change, we only need the file names.

ribafish commented 5 months ago

Why would we want to have dependency resolution happen in configuration time? What are we trying to solve with this suggestion?

Correctness. With runtime classpath normalization, the task will be out of date if any if the files change. We don't care that they change, we only need the file names.

That wouldn't make the build incorrect though.

By forcing configuration time dependency resolution you will do that on every build of every project that uses the plugin, even if the task is not executed. Having the input remain a classpath will only re-execute the task when there's a change in the classpath, which is comparatively quite rare.

melix commented 5 months ago

By forcing configuration time dependency resolution you will do that on every build of every project that uses the plugin, even if the task is not executed

That is not correct. It would be done only if the task is on the task graph, thanks, hopefully, to lazy task configuration. If a task is on the task graph, then it needs to be executed hence configured. If the task is configured and not executed, then it's an error which needs fixing.

ribafish commented 5 months ago

By forcing configuration time dependency resolution you will do that on every build of every project that uses the plugin, even if the task is not executed

That is not correct. It would be done only if the task is on the task graph, thanks, hopefully, to lazy task configuration. If a task is on the task graph, then it needs to be executed hence configured. If the task is configured and not executed, then it's an error which needs fixing.

I worded that badly - dependency resolution in configuration time will run every time the task is on the task graph, even if it doesn't need to be executed due to UP-TO-DATE checks or being retrieved FROM-CACHE. The task itself is attached to all Test tasks, so anytime any test is executed, the task will be on the task graph. Most of the time, the classpath wouldn't be changed, but the classpath would still be resolved in config time, even if the task execution could be/ is avoided.

melix commented 5 months ago

I don't see how this changes anything, but I may be missing something. Today we have:

    @InputFiles
    @PathSensitive(PathSensitivity.NAME_ONLY)
    public abstract ConfigurableFileCollection getRuntimeClasspath();

In order to know if the task is up-to-date, or cached, getRuntimeClasspath() has to be called, because it's an input and because it references the runtimeClasspath configuration, it will be resolved at configuration time. This will not change if you use @Classpath.

Using runtime classpath snapshotting will not prevent the task from being configured if it's in the task graph.

ribafish commented 5 months ago

@melix, I updated the PR as requested.

On a side-note, I was thinking that perhaps the best solution for this task would be to make it not cacheable. The execution is super fast since it just goes over the filenames.

For reference, this is a real execution on ge.micronaut.io - a lot of the time (relatively speaking) is spent dealing with remote cache and so all the tasks together take 14s.

melix commented 5 months ago

Interesting numbers, indeed since most of the time will be spent locally dealing with dependency resolution, it's not worth caching.