Open bercik opened 4 years ago
Long story short, those tasks don't cache very well, however, if you can get a local mirror of node and npm you should be able to get the same experience for nodeSetup
/npmSetup
.
And if you're using webpack or similar you can set up a task for that with inputs/outputs and cache it, which should help speed things up
Let's have a look to these tasks (I add nodeSetup
to the list):
nodeSetup
download Node.js. It uses for that a Gradle repository that automatically caches downloaded artifacts (exactly the same way as it does for Maven artifacts). So it is downloaded only once, but when the task is not up-to-date, it unpacks the archive (which is not very different in terms of performance as using a cache).npmSetup
uses npmInstall
to install a specific version of npm. See below.npmInstall
: as of version 5 (as far as I remember), it caches the downloaded packages to avoid downloading them each time they are installed. This task is also already cached, not by Gradle directly, but by npm.All those tasks are not directly cacheable by Gradle, but they rely on tools that use cache.
I know this have some limitations. For instance, if the build runs inside a container whose file system is not persistent, the cache will not be reused for next builds by default. This will be also the case for Gradle, but this can be fixed by changing its cache configuration. It would be necessary to do the same thing for also for npm and it is much simpler if all the cache is managed by Gradle. Once the Gradle cache is configured, everything can be cached.
We can assume (it would be interesting to test it though), that npmInstall
for instance would be even faster if it was fully cached by Gradle. Indeed, npm install
does not only extract the package contents. It also executes some post install scripts (for instance a node-gyp
compile). Caching the node_modules
directory would enable us to skip this whole process. But I think it could cause multiple issues:
node_modules
directory? I don't really know whether there are some conventions regarding this behavior. If it's not true, this means that each npmInstall
user would be in charge of configuring perfectly the task output to ensure the cached result is perfectly right.node_modules
directory does not contain exactly the same thing in the different platforms. For instance, node-gyp
packages contain some platform-specific binaries. We should probably add the platform as a cache key to ensure the node_modules
directory will not be reused across different platforms.I think we should make some experiments to ensure that make them cacheable really makes sense and does not create some unwanted side effects.
nodeSetup
and npmSetup
is going to be interesting to cache as they both access the same directories and create symbolic links (which are going to be replaced with the file they point to if we use the Gradle cache).
Luckily even if you run on an entire clean machine nodeSetup
and npmSetup
aren't that heavy, especially if you got distBaseUrl
and .npmrc
pointing to local mirrors or ones close to you.
Thanks for replying.
You are right, nodeSetup
and npmSetup
don't seem to heavy, but the npmInstall
does. We did the following in our project:
We've added custom npmCi task that caches node_modules folder:
task npmCi(type: NpmTask, dependsOn: [npmSetup]) {
if (System.env['CI'] != null && System.env['CI'] == "TRUE") {
args = ['ci']
} else {
args = ['install']
}
inputs.file(file("${project.projectDir}/frontend/package.json")).withPathSensitivity(PathSensitivity.RELATIVE)
inputs.file(file("${project.projectDir}/frontend/package-lock.json")).withPathSensitivity(PathSensitivity.RELATIVE)
outputs.dir "${project.projectDir}/frontend/node_modules"
outputs.cacheIf { true }
}
.gitlab-ci
we are running the task as follows:
./gradlew npmCi --build-cache
cache:
paths:
- .gradle
What do you think of that solution? It seems to be working, but I'm a little afraid about the possible problems you've mentioned above.
Hi,
I've come up with the following code and it works for me:
npmInstall {
inputs.files(project.files('package.json'))
outputs.dir("${project.buildDir}/node_modules/")
}
task buildUI(type: NpmTask, dependsOn: npmInstall) {
inputs.files(project.files('src'))
outputs.dir("${project.buildDir}/dist/")
args = ['run', 'build']
}
build.dependsOn buildUI
I'm running the gradle build
and it's running npm install
only when the package.json
is changed and it runs npm build
when the src
folder is changed.
@nukesz normally you don't need to configure the npmInstall
as you did. The default configuration already contains those inputs and outputs.
I also would advice you to add the node_modules
directory as an input of your buildUi
task since a change in your project library will probably change the build output. I would also add the package.json
file since you rely on its scripts when running run build
. You can also use the NpxTask
to avoid needing to declare the build
script in the package.json
file.
@nukesz, you are not talking about the same thing as @bercik. This issue is about caching, not just task execution avoidance. Task execution avoidance is the fact of skipping the task execution if the inputs did not change since last build and the outputs are still present and did not change. Caching is one step further. As Gradle knows which are the input and outputs of the task, it can save the output in a cache (which can be shared between multiple computers) and reuse the same output stored in the cache instead of running the task when the inputs are identical.
@bercik is your experiment successful? Does it work as expected? Do you use only local cache or do you share the cache between multiple hosts?
I would recommend merging npmInstall and build tasks. For example by using in package.json
:
"build": "npm ci && ng build
This gives a single task in Gradle. So if the result (the compiled webapp) is already available, the build server can jump over setting up the node_modules directory. Which can save a lot of time.
And npm install
should not be used, as it is not reproducible. Rather use npm ci
.
@remmeier if the compiled web app is already available then both npmInstall
(which can be configured to use npm ci
) and your build task would be skipped as your build task is already up-to-date
.
Though in your case there's a caveat with npm ci
, which is that it'll delete your node_modules
before running.
does npm ci
not do up-to-date checking? Actually in our project we make use of yarn, but apply the same techniques. This works well. Was hoping for NPM to get to the same level in that regard. With our single task yarn setup we manage to save around 30-60 second with every build per frontend project (which adds up, in particular for multi-project builds).
(there is a second new npm-based project on my side, i probably have some more insights soon in that regard)
I did some experiments in another issue to know which one of Gradle up-to-date checking or npm install
up-to-date checking and it appeared that Gradle was much faster than npm install
(see here for Linux and here for Windows). Of course, it probably also depends on the project size (mainly the number of dependencies). But this was not really the result we were expecting.
I did not do this experiment with npm ci
nor with yarn
, that would be interesting to do it.
And it sounds like npm ci
always starts by removing the node_modules
directory. So it does not seem to do any up-to-date checking attempt.
Here is the first line that is printed by npm ci
npm WARN prepare removing existing node_modules/ before installation
I will have a look later and file a bug report if it really does not to up-to-date checking. I really like yarn & Gradle...
npmInstall and gradle cache restore will heavily depend on where the data is coming from. yarn is able to cache it locally, so it can be fast. The same for gradle. A close-by Gradle remote cache is also not so bad. But the public NPM registry should be avoided at all cost I think for anybody in need of r robust/fast builds. Your Gradle cache results look good for sure.
the one things really good at yarn is that yarn install
is super fast if all is up-to-date (around a second).
the one things really good at yarn is that
yarn install
is super fast if all is up-to-date (around a second).
I don't think realistically it can ever beat gradle doing up-to-date
checks on your compile webapp task unless you either have a massive app or no dependencies.
I did this experiment again using an Angular project with a multiple dependencies (902 packages at node_modules
root) on an Ubuntu 20.04 laptop.
Note that I used the version 3 of the plugin (which will be released soon) because I discovered an issue in the version 2.2.3 regarding the yarn
task outputs declaration (the task does not run when I remove the node_modules
directory, whereas it should).
Command | Command execution time | Gradle up-to-date check |
---|---|---|
yarn |
1 second | 1 second |
npm install |
8 seconds | 1 second |
npm ci |
40 seconds | 1 second |
We can see that yarn
's up-to-date checking is fast, but not faster that Gradle. For npm install
and npm ci
, Gradle is clearly faster.
I could investigate the issue I encountered, and it appears that the up-to-date checking of the yarn task works as expected, I probably did an error in my experiment.
But I discovered that the node_modules
directory is declared twice as output, once as a directory and once as a file tree (unless you use the node_modules output filter). It is useless and slows down the Gradle's up-to-date checking. It takes 3 seconds instead of 1 (with the version 3 of the plugin).
I created the issue #96 and fixed it. It now also takes 1 second with the version 2.2.3 and the fix.
What about Yarn tasks ( YarnSetupTask, YarnInstallTask, YarnTask)? Can build cache be enabled for them?
Your own ad-hoc YarnTasks
can absolutely be cached, where I work all our npm run dist
takes nearly a minute each (and we got 4 in one project)
They cache extremely well, especially when paired with a remote build-cache
The problem is that yarnSetup
, npmSetup
and nodeSetup
reference absolute paths, busting the cache. Just don't do that. gradle shouldn't be installing those things anyway.
To fix caching, add this to your root build.gradle
:
plugins {
id "com.github.node-gradle.node" // specify version!
}
import com.github.gradle.node.task.NodeSetupTask
import com.github.gradle.node.yarn.task.YarnSetupTask
import com.github.gradle.node.npm.task.NpmSetupTask
afterEvaluate {
// disable all setup tasks because they bust the cache
allprojects {
[YarnSetupTask.class, NpmSetupTask.class, NodeSetupTask.class].forEach({ setupTask ->
tasks.withType(setupTask).configureEach({ task ->
task.configure {
onlyIf { false }
}
})
})
}
}
There's some other workarounds in https://github.com/gradle/gradle/issues/3525 as well, but with symbolic links being resolved to their file instead of being stored I'm not sure we ever could support the build cache in those tasks.
So if you're using download = false
you'll get the same effect as the snippet above (but lose out on perfectly reproducible builds)
Bumping the thread, since the question I've run into seems to be related.
On Gradle 7.4, npmSetup
task shows the following deprecation warning:
Cannot access output property 'npmDir' of task ':npmSetup' (see --info log for details). Accessing unreadable inputs or outputs has been deprecated. This will fail with an error in Gradle 8.0. Declare the task as untracked
by using Task.doNotTrackState(). Consult the upgrading guide for further information: https://docs.gradle.org/7.4/userguide/upgrading_version_7.html#declare_unreadable_input_output
This is shown only on Windows - on Linux, even in WSL, there's no problem. I guess the source of the error is the symbolic links in bin
, since they can't be opened by the ordinary Windows programs too. Does this mean that npmSetup
on Windows will always be effectively uncacheable?
@Cerber-Ursi I'm unable to reproduce that issue using the example project, with or without download enabled, with or without a npmVersion
set
https://github.com/node-gradle/gradle-node-plugin/blob/main/examples/simple-node/npm/build.gradle
Can you open a separate issue and provide a standalone project that reproduces the issue?
in my experience since my last update, it's better to never cache npmInstall
and npmSetup
. they are too large for gradle to cache efficiently anyway. however, the yarn-state or yarn-integrity files or similar should be specified as outputs, and if you want to be extra disciplined, node_modules must be set to read only. with yarn, use yarn 2+ and plug'n'play if you want caching.
Interesting. I've tried to do this again and wasn't able to reproduce the original problem either, including in the real project where it was initially noticed. Will try again later, not sure what exactly has changed in between.
EDIT: I made a mistake, the information below is incorrect.
~I tried to enable caching by specifying outputs.cacheIf { true }
~
tasks.npmInstall {
outputs.cacheIf { true }
}
~I think it's relevant that I also have a custom NpmTask~
val npmRunBuild by tasks.registering(NpmTask::class) {
npmCommand.set(listOf("run", "build"))
inputs.files("package.json", "webpack.config.js")
.withPathSensitivity(RELATIVE)
inputs.dir("src/main")
.withPathSensitivity(RELATIVE)
inputs.dir(layout.projectDirectory.dir("node_modules"))
.withPathSensitivity(RELATIVE)
outputs.dir("dist")
}
~If I enable additional build cache logging (add org.gradle.caching.debug=true
to gradle.properties
) and run ./gradlew clean && ./gradlew assemble
I can see this warning on the npmInstall task~
> Task :npmInstall
[...]
Non-cacheable because Gradle does not know how file 'node_modules/.package-lock.json' was created (output property 'nodeModulesPackageLock'). Task output caching requires exclusive access to output paths to guarantee correctness (i.e. multiple tasks are not allowed to produce output in the same location). [OVERLAPPING_OUTPUTS]
~Specifying the package-lock file as an output seems to help:~
tasks.npmInstall {
outputs.file(layout.projectDirectory.file("node_modules/.package-lock.json"))
.withPropertyName("nodeModulesPackageLock")
outputs.cacheIf { true }
}
~I can then run~
~1. ./gradlew assemble > assemble1.log
~
~2. ./gradlew clean -q
~
~3. ./gradlew assemble > assemble2.log
~
~In assemble2.log
I can see that npmInstall is cached.~
> Task :npmInstall FROM-CACHE
tasks.npmInstall {
outputs.file(layout.projectDirectory.file("node_modules/.package-lock.json"))
.withPropertyName("nodeModulesPackageLock")
outputs.cacheIf { true }
}
I don't think this does what you think it does. Unless npmInstall
contains node_modules/
as an output directory...
I don't think this does what you think it does.
You're right! It wasn't caching as I expected, I misunderstood what was going on.
Hi 👋
I've add an example test of the behaviour I'd like to see in https://github.com/node-gradle/gradle-node-plugin/pull/301
Currently I see warning messages from Gradle that explain why npmInstall
can't be cached:
Non-cacheable because Gradle does not know how file 'package-lock.json' was created (output property 'packageLockFileAsOutput'). Task output caching requires exclusive access to output paths to guarantee correctness (i.e. multiple tasks are not allowed to produce output in the same location). [OVERLAPPING_OUTPUTS]
Non-cacheable because Gradle does not know how file 'node_modules/.package-lock.json' was created (output property 'nodeModulesPackageLock'). Task output caching requires exclusive access to output paths to guarantee correctness (i.e. multiple tasks are not allowed to produce output in the same location). [OVERLAPPING_OUTPUTS]
From a quick look removing these properties as outputs should work
packageLockFileAsOutput
conflicts with the input property, packageLockFileAsInput
nodeModulesPackageLock
should be tracked by the getNodeModulesDirectory
outputDo these changes seem reasonable? I'm not familiar with NPM development so I would appreciate any insights.
Conceptually you've got the right idea @adam-enko, but there's a few things that makes npm install
cache extremely badly or outright broken
Let's start with the easy one, Gradle currently breaks symlink and this makes node_modules
unhappy, this is currently being worked on and might get resolved by the linked PR in https://github.com/gradle/gradle/issues/3525
Next one is harder because I don't have the data any longer and never thought to create benchmarks, but when your node_modules
becomes "moderately large" you may find that Gradle spends minutes doing the up-to-date check
This is especially funny if it turns out to be out-of-date and you then have to wait for npm install
to do the same up-to-date check
These might be fine locally, but if you have a remote build cache then publishing a 1-2GB node_modules
is likely to cause issues
And even at 100-200MB, the cost of pulling it from the cache will likely be bad, especially if you have a npm mirror close to you
And with a remote build cache, what happens if you pull a node_modules
using native components built for the wrong platform?
Adding the platform to the cache key by default would help this, but unless your node_modules
has no native components then I suspect caching it won't make much sense for the above reasons
So to list the problems I remember:
npm install
already does this check and is pretty fast at this pointnode_modules
can easily become largenode_modules
, this used to upset me but I can't find any modern tools doing this so this might not be relevant any longernode_modules
platform dependent, this used to upset me because a lot of these didn't build on FreeBSDnpm install
on a moderately up-to-date node_modules
can be very quickWith these in mind, most people who provided feedback (and most people polled) preferred the solution where Gradle mostly leaves everything to npm
and just keeps track of whether it has run successfully with the current package.json
(and lockfiles) and only expecting node_modules/.package-lock.json
as output
See the fastNpmInstall
for more information https://github.com/node-gradle/gradle-node-plugin/blob/16ecc0fa492b65dfa46a559f432ade45ce500b8e/src/main/kotlin/com/github/gradle/node/NodeExtension.kt#L127-L134
So to wrap this up, do you really want to cache node_modules
or are you ok with recreating it initially and having it be reported up-to-date like other tasks?
i.e. do you want it cached or are you interested in fast builds that work well with the remote cache but with the downside that your CI needs to do an npm install
?
This doesn't mean there's no use for this, it's just that all evidence I know of suggests that people think they want this but they really might want something else
There might be a legitimate use-case for this, or you might want to look into your build and consider the setup If you're an open-source project under a permissive license (like the MIT, BSD, Apache) I'll happily take a look at setup, if you're GPL I'll grudgingly take a look at the setup, if you're AGPL or commercial feel free to send me an email and we can discuss options
Thanks for the detailed response @deepy! It's a lot more clear to me now.
I'm working on Dokka - which uses node-gradle in one subproject https://github.com/Kotlin/dokka/blob/e502b2c1b1aa168bb3cd10d4d5f3ab883daffabd/dokka-subprojects/plugin-base-frontend/build.gradle.kts
The main issue is that npmInstall
will run even though there haven't been any changes, which can take 1+ minute on CI even if there are no changes https://ge.jetbrains.com/s/7z6wysb7gfgf6/timeline#xhxfjcjtknpfe
I have made some changes in https://github.com/Kotlin/dokka/pull/3479 that I expect will help.
Regarding the caching of NpmInstallTask
, I can think of two options:
NpmInstallTask
should be annotated with @DisableCachingByDefault
, and update the KDoc to add the reasons you explained so clearly.
Perhaps go even further: the risk of having platform-dependent files in node_modules is too likely, and the resulting issues too subtle, and has a high risk of poor performance, and NpmInstallTask
should be marked with doNotTrackState()
?
Take my proposal PR https://github.com/node-gradle/gradle-node-plugin/pull/301 further, so that if someone wants to make NpmInstallTask
cacheable they can.
There's a risk that someone, like myself(!), might enable caching without fully understanding the risks and costs.
Has @LocalState
been considered for node_modules/
? It looks like a good fit.
https://docs.gradle.org/8.5/userguide/custom_tasks.html#sec:storing_incremental_task_state
However, if the state files are non-relocatable, then they can’t be shared via the build cache. Indeed, when the task is loaded from cache, any such state files must be cleaned up to prevent stale state from confusing the tool during the next execution. Gradle can ensure such stale files are removed if they are declared via
task.localState.register()
or if a property is marked with the@LocalState
annotation.
@adam-enko That's an interesting proposal, but is not the directory too big for that? IIRC the default artifact size limit for caching is 100MB.
Haven't had the time to look into this properly yet, but on Windows I get a 218MB node_modules
from the plugin-base-frontend
which I guess isn't too bad
But running npm install
locally with no caches whatsoever takes 15 seconds on my machine, which is desktop equipped with an ok Ryzen CPU, a NVMe SSD, a gigabit connection, and way too much memory so not exactly a very useful comparison
Though I'm curious on where the bottleneck is, because you might see even worse performance with the remote cache
Good news is we can easily test that by creating a snapshot and comparing the builds Do you have a maven repository you could use for the snapshot? If not I can probably create a public one on my private nexus instance
But before that, can you add the --production
flag to npmInstall
or set the environment variable NODE_ENV
to production
and do a comparison?
That's apparently --omit=dev
now
Currently npmInstall and npmSetup tasks are not cacheable by gradle build cache, thus when running CI pipelines on different machines those tasks needs to fully run every time which consumes a lot of time.
Please consider making those tasks cacheable. Instructions can be found here: https://docs.gradle.org/current/userguide/build_cache.html#enable_caching_of_non_cacheable_tasks