paketo-buildpacks / npm-install

A Cloud Native Buildpack for npm
Apache License 2.0
10 stars 17 forks source link

Symlink node_modules/.cache to tempdir #661

Closed c0d1ngm0nk3y closed 1 month ago

c0d1ngm0nk3y commented 7 months ago

Fixes https://github.com/paketo-buildpacks/builder-jammy-base/issues/375

Summary

The folder node_modules/.cache is written during runtime (e.g. by eslint) this does not work

So this pr always creates the .cache folder as a symlink to the tempdir and the helper will create the needed dir for the link to be valid.

Use Cases

As brought up in https://github.com/paketo-buildpacks/builder-jammy-base/issues/375 already a very simple react app will not be able to be built with buildpacks since it contains eslint.

Details

Note: The pr looks larger than it actually is. Most of the changes come from the integration folder where a simple react was added.

After creating the mode_modules, the .cache folder is removed and a symlink to /tmp/node_modules_cache is added instead. This makes sure that the run user has the permission to add files in that directory. Since the build and the run user have different uids per default, this was not the case before.

In addition /tmp/node_modules_cache has to be created at launch time since the symlink would be broken otherwise. For this, the existing helper setup_symlinks also creates this folder.

Checklist

c0d1ngm0nk3y commented 7 months ago

@paketo-buildpacks/nodejs-maintainers Before I polish the pr (e.g. adding tests), could someone have a look if this is a valid way to go? At least it works :)

c0d1ngm0nk3y commented 7 months ago

@paketo-buildpacks/nodejs-maintainers Could someone have a look?

c0d1ngm0nk3y commented 4 months ago

@paketo-buildpacks/nodejs-maintainers Could you provide some feedback? One specific question: Why are ALL integration tests run with this ubi builder from paketocommunity? This forces every test to use the extension, right? Doesn't feel right that there is no test without extension anymore tbh.

mhdawson commented 4 months ago

This forces every test to use the extension, right? Doesn't feel right that there is no test without extension anymore tbh.

I don't think this is the case, you can see that the integration tests are run twice. Once with jammy and once with ubi. Those with ubi use the extenstion and those with jammy don't

@github-actions Test Pull Request / Integration Tests with Builders (paketocommunity/builder-ubi-buildpackless-base:latest) (pull_request) Successful in 13m Details @github-actions Test Pull Request / Integration Tests with Builders (paketobuildpacks/builder-jammy-buildpackless-full) (pull_request) Successful in 9m Details

c0d1ngm0nk3y commented 4 months ago

I don't think this is the case, you can see that the integration tests are run twice. Once with jammy and once with ubi.

From my understanding, that is only partially correct. All tests have to use the extension in order to also run with the ubi builder, right? Like here

Tbh, I do not really now what this ubi is about, so I might be wrong with that.

The second point, is it really worth/needed to run all tests with ubi?

pacostas commented 4 months ago

The ubi extension is added conditionally https://github.com/paketo-buildpacks/npm-install/blob/c2719baacdacdaaab330953d477e61f67b8b0aa4/integration/init_test.go#L90-L94 , so in case the builder is not the ubi one, the extension falls back to an empty string and as a result is not included during the build process. It could be refactored to be more clear by adding an array instead of having the names of the buildpacks or extensions. This way, someone could easily hypothesize that the array might have zero or more buildpacks or extensions, thus there might be no extensions or buildpacks during the build process.

mhdawson commented 2 months ago

I bit of info I think on the cache option - https://docs.npmjs.com/cli/v10/using-npm/config#cache, not much.

Would be interesting to see if can be set in .npmrc

mhdawson commented 2 months ago

Ok so the cache option I mentioned in the last post seems to be unrelated.

There is a cache option for eslint which is --cache-location and I've not been able to find a corresponding environment variable.

Unfortunately some of the dependencies seem to set that directly:

cnb@8fa6e49119d5:/workspace$ grep -r "cache-location" *
node_modules/eslint/lib/options.js: * @property {string} cacheFile Path to the cache file. Deprecated: use --cache-location
node_modules/eslint/lib/options.js:                description: "Path to the cache file. Deprecated: use --cache-location"
node_modules/eslint/lib/options.js:                option: "cache-location",
node_modules/file-entry-cache/package.json:    "eslint": "eslint --cache --cache-location=node_modules/.cache/ 'cache.js' 'test/**/*.js' 'perf.js'",
node_modules/fill-range/package.json:    "lint": "eslint --cache --cache-location node_modules/.cache/.eslintcache --report-unused-disable-directives --ignore-path .gitignore .",
node_modules/flat-cache/package.json:    "eslint": "eslint --cache --cache-location=node_modules/.cache/ ./src/**/*.js ./test/**/*.js",
node_modules/picomatch/package.json:    "lint": "eslint --cache --cache-location node_modules/.cache/.eslintcache --report-unused-disable-directives --ignore-path .gitignore .",

After some googling, I also found this package which seems to be widely used for finding the right place to write cache files - https://www.npmjs.com/package/find-cache-dir. It seems to use node_modules/.cache by default and also has an environment variable that can be used to override but I don't think that helps us with the modules that specify the cache location as shown above.

mhdawson commented 2 months ago

I tried chaning the instances I showed above manually to /workspace, but none of those were the cause.

Looking through other packages in node_modules results in find a bunch more places in bable, webpack etc that default to using the node_modules/.cache as a default location.

Sooo, I've not convinced myself that the use of node_modules/.cache as a default is widespread enough that we should ensure it is writable by since its referenced in https://www.npmjs.com/package/find-cache-dir but also because it seems to be hard coded into packages like webpack and babel.

@c0d1ngm0nk3y my only remaining question is if there is a risk that by removing the cache we might slow down startup of some applications because they benefit from data in the cache created during the build phase? If that might be the case then we might want an environment variable that can be used to turn the new behaviour off?

c0d1ngm0nk3y commented 2 months ago

@c0d1ngm0nk3y my only remaining question is if there is a risk that by removing the cache we might slow down startup of some applications because they benefit from data in the cache created during the build phase? If that might be the case then we might want an environment variable that can be used to turn the new behaviour off?

I cannot rule this out completely. What would you call this environment variable? Something like BP_KEEP_BUILD_CACHE? What would be the default? The problem would be that it has to be either or. So either you freeze the cache at build time or you make it writable at runtime.

Update: I would probably lean to BP_KEEP_NODE_BUILD_CACHE and let the default be false. What I am not sure about is that this depends on the specific node_module, but this setting would be for all. WDYT?

mhdawson commented 2 months ago

I agree with letting the default be false as I hope there will be no issues but think having an escape hatch if there are is a good idead.

I'm good with BP_KEEP_NODE_BUILD_CACHE.

In terms of this depends on the specific node_module based on my googling/investigation node_modules/.cache is used but a number of modules and is as close to a standard place for putting module cache files into as exists so I'm not too worried about that unless that's not what you meant.

c0d1ngm0nk3y commented 1 month ago

...unless that's not what you meant...

Not quite. Let's have you have two modules a and b...

You could not use BP_KEEP_NODE_BUILD_CACHE since b would crash, so you would loose the speedup from a

BUT

So I will go on with your suggestion.

c0d1ngm0nk3y commented 1 month ago

@mhdawson Is this the kind of use case you had in mind?

https://github.com/paketo-buildpacks/node-run-script/issues/126

mhdawson commented 1 month ago

@c0d1ngm0nk3y that does look like an example where not maintaining the cache could lead to slowdowns.

mhdawson commented 1 month ago

Just to add on

You could not use BP_KEEP_NODE_BUILD_CACHE since b would crash, so you would loose the speedup from a

Agree with your points of it not being a blocking concern. Also regressing existing applications is not a concern as there cannot be any working ones (since a would crash/not work).

mhdawson commented 1 month ago

@c0d1ngm0nk3y I'm ready to land but it tells me that the brach is out of date with the base and is not giving me the option to rebase in the UI. Could you rebase and then I'll land once the tests complete?

Thanks for all of your work, explanation and tweaks along the way on this.