Closed mhdawson closed 4 years ago
But at the moment that would just be npm (as everything else is baked into the node binary)? On Windows we do ship just node.exe, e.g., https://nodejs.org/dist/v10.1.0/win-x64/node.exe, but you need to go through "All download options" to find it.
Binary size was also one of the concerns with building full-icu: https://github.com/nodejs/node/issues/19214
What about Yarn on the official Docker images? Would that be a candidate for removal in a deployment image too? (If there are any considerations to split the images accordingly)
Why torture people with too many options to choose from? Bandwidth is cheap and disk space even cheaper.
@bnoordhuis we still have people raising the concern that our size is creeping upwards. If our users agree that size does not matter then I'd agree with you. I think it was one of the recent user-feedback sessions where this came up. We could consider something like a survey if we need more concrete data on this front.
@bnoordhuis the context I have is that disk space isn't super cheap for one of the newer and stronger use cases for Node.js at present: Serverless. The increasing binary size (largely due to bundling npm, as far as I understand) lengthens the startup time of cold starts.
@bnb I don't understand your comment. If by "serverless" you mean something like AWS Lambda, then the size of our binaries is irrelevant because that's not what AWS uses.
we still have people raising the concern that our size is creeping upwards
@mhdawson Okay, but what people and why is that their concern?
@boneskull can you chime in here in terms of concerns from the tooling side? I think we heard some concern about the size of the downloads/extracted size on disk from the tooling user group.
It crosses over a bit into “embedding Node.js” territory, but the issue was:
node executable, and including npm, is not needed (for the Titanium use case; presumably since all dependencies are bundled anyway)
So if you are distributing Node because your app is built with it, and you don’t expect end users to have Node installed, and you want to use native modules in your app... downloading and installing extra cruft just for the executable is evidently a poor experience for end users
downloading and installing extra cruft just for the executable is evidently a poor experience for end users
I have no problem with a node+npm and node-only tarball, we already do that for Windows.
However, it sounds like @mhdawson is proposing a different kind of differentiation.
My description was
I think today those would map to a node-only tarball and a node+npm tarball, except that there might be things like the inspector that we might only include in the node+npm tarball.
I was referring to this:
we could use some of the size advantage gained by removing development components from the deployment kit to offset additions on the diagnostics side (node-report, etc.)
That's differentiation, isn't it? I interpreted your comment as saying we bundle it with the deployment kit but not the dev kit.
The Diagnostic WG has consensus that node-report should be part of node core, so the plan is to move it into the node binary. So at that point, the specific example would no longer apply. Having said that the concept of including tools that are needed in production still applies, so I guess that is a bit different than node executable only. Possibly also an argument to move those things into the binary itself though.
Possibly also an argument to move those things into the binary itself though.
Agreed.
This sounds very similar to what some other languages do. For example, Java has the JDK for development and JRE for just the runtime. Similarly, .NET has the .NET Core SDK for building apps, and the .NET Core Runtime if you just want to run them. However, one of the main differences with those languages is that they're compiled, so there's a significant difference in disk space usage between the two (SDK contains the compilers, runtime doesn't). The difference would be less significant with Node.js.
While the difference in total size might be smaller, from what I understand even just removing npm would cut the size in half.
I just wanted to comment with an addition to this discussion:
I'd personally really love to see an image without including npm. There are a suite of use cases, some of which are being driven in the project already (see: https://github.com/nodejs/docker-node/issues/404).
It would be absolutely fantastic to hear from platform builders and engineers who have use cases for this. I do know of a few myself (Serverless, PaaS, IoT) but getting details from the people working on them – and any others that would benefit from this – will be invaluable to help drive this.
Disk space is cheap, but can be limited. For example, Linux modules for IoT often have (only) 4 GB of flash memory for everything. So keeping Node as small as possible is 'IoT friendly'. Let the IoT customer compose the Node stack as wanted. Don't force an unnecessarily large minimum image. Minimalism finds many uses, as demonstrated by Alpine Linux which surged ahead in a world replete with Linux alternatives, due mainly to minimal footprint.
And speaking of IoT, bundling npm in Node makes the Node upgrade much heavier. Suppose the long life IoT platform wants a Node upgrade for a security update or some other special reason, but does not otherwise need to update npm or Yarn? It would be nice if the IoT platform could update just the piece it wants, Node in this case, without being driven to a monolithic fork lift upgrade. That results in less to go wrong, with a greater success rate across what could be many thousands of devices being updated.
@mhdawson
While the difference in total size might be smaller, from what I understand even just removing npm would cut the size in half.
This seems to be a bit of an exaggeration. In 10.7.0, using the OSX bins, the npm tarball is 5MB, and the node tarball is 16MB.
Extracted, removing npm reduces the distro from 58M to 41M. (This is in raw data size, before accounting for filesystem blocksizes, which will make your du numbers look quite different.)
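As a quick check on the "cut the size in half" estimate, the extracted sizes quoted above can be turned into a percentage. This is only a sketch: `pct_saved` is a hypothetical helper name, and the inputs are the 58M and 41M figures from this comment, not fresh measurements.

```shell
# pct_saved FULL SLIM: percentage saved by going from FULL to SLIM,
# using integer arithmetic and truncating to one decimal place.
# (Hypothetical helper; both sizes must be in the same unit.)
pct_saved() {
  tenths=$(( ($1 - $2) * 1000 / $1 ))
  echo "$(( tenths / 10 )).$(( tenths % 10 ))"
}

# Extracted 10.7.0 macOS distribution: 58M with npm, 41M without.
pct_saved 58 41   # prints 29.3
```

On these numbers, removing npm saves a bit under a third of the extracted size rather than half.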
That being said, I can see why you'd be interested in having a runtime-only release. But with my Node.js hat on, I'm not sure I understand why you want it to be in an official Node.js distribution? This feels like the kind of thing that OS package managers were made for, even minimal ones like opkg?
I would also like to see npm unbundled from Node.js as it makes more sense to not preinstall it in specific cases and independently manage npm using solutions like nvm, n and others.
Also this can reduce the attack surface and other solutions like PHP and others do not come preinstalled with a dependency manager.
Also every time we change or install a Node.js version we have to upgrade npm, as an outdated version is bundled with it (one of the things which Ryan Dahl regrets - this is one reason why we have Deno now).
Slim / pure Node.js releases would be great.
Why torture people with too many options to choose from? Bandwidth is cheap and disk space even cheaper.
This is not true in all countries and cases. We have internal CI servers which use a lot of bandwidth, and in most cases we offload the npm i part to the clients for manual compilation and push the compiled package to the target remote server using rsync, as the servers just run the tests and the actual app deployment (eCommerce software, CMS, ...).
But with my Node.js hat on, I'm not sure I understand why you want it to be in an official Node.js distribution?
@iarna My understanding is that there are a few use cases:
From a project perspective:
Totally happy to be told I'm wrong on any of these, just a collection of the reasoning I've uncovered while I've been trying to investigate 🤗
@bnb You didn't really address my question... I get why minimized distros are useful, what i don't get is why it needs to be maintained by the Node.js project?
(Edited to add: There are costs associated with adding more options, both from a maintenance perspective and from a UX perspective. I'm focused on this because the benefits need to outweigh those.)
$0.02 from a "serverless" dev (we deploy to the Node hosted by AWS Lambda)
I like the direction of this conversation but I'd like to drill in on the root concern: performance. The predicate for this discussion has been generally around 'less disk space utilized equals faster'. I'd like to advocate an empirical approach that uses a common metric we can all agree should be priority: coldstart.
Disk space cost for the runtime is not really a userland concern for managed runtimes. Sure, some projects re-bundle Node and shell out to it in Lambda but those projects are deliberately swimming upstream already by packaging their own downstream.
A universal performance concern we all share however is the dreaded coldstart. Python and Node are very close in terms of coldstart performance. And the Go runtime once compiled kicks everyone's ass.
I do not know if the Node runtime coldstart is impacted by disk (presumably somewhat) but some measurement of 'here's how fast we start cold with npm' vs 'here is how fast we start without npm' would be a very useful metric to gauge the tradeoff value of multiple runtime dists. If you can approach 20ms I'm here for it! If the difference is inconsequential I think you're creating more problems than you're solving (from a Lambda dev's perspective… maybe legacy container deployments would appreciate that 10mb delta… tho idk how that would move the needle given we can now store terabytes for pennies).
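One rough way to get the kind of number asked for here is to time repeated launches of each build. The sketch below is an assumption-laden starting point, not a benchmark: `avg_start_ms` is a hypothetical helper, it relies on GNU `date` for nanosecond timestamps (`%N`), and it measures local process startup rather than a real Lambda cold start.

```shell
# avg_start_ms RUNS CMD [ARGS...]: average wall-clock milliseconds per run.
# Hypothetical helper; assumes GNU date, which supports %N (nanoseconds).
avg_start_ms() {
  runs=$1
  shift
  start=$(date +%s%N)
  i=0
  while [ "$i" -lt "$runs" ]; do
    "$@" >/dev/null 2>&1
    i=$(( i + 1 ))
  done
  end=$(date +%s%N)
  # Total nanoseconds, divided by run count, converted to milliseconds.
  echo $(( (end - start) / runs / 1000000 ))
}

# Example comparison (directory names are placeholders for two unpacked builds):
#   avg_start_ms 20 ./node-with-npm/bin/node -e 0
#   avg_start_ms 20 ./node-only/bin/node -e 0
```

Because the loop reruns the same binary, later runs hit a warm file cache, so a genuine cold-start comparison would need the cache dropped or a fresh machine between runs.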
Another consideration rife with politics I do not care to engage with: maybe we could work on making npm significantly thinner? Projects like qdd demonstrate this is very doable. A recent peek under those covers showed me a lot of room for improvement.
@brianleroux I don't believe that image size makes any difference for cold start... although in a cluster w/ 100s to 1000s of images that extra disk space is obviously non trivial. This thread shows that people in the ecosystem are interested in smaller base images... but this is something that could be punted to the docker wg imho... cloud providers are already having to do a bunch of work to get distributions running, and I'm not convinced that us releasing a slimmed tarball significantly simplifies this (it would significantly increase the artifacts we release)
There is work being done to improve cold starts... take a look at --> https://github.com/nodejs/node/pull/21405 which landed in 10.6.0
We've also had discussions about how we could pre parse / build the AST for node applications... this would have a significant impact on cold start. It's feeling like ESModules will help with this, as we can dump the binary representation of the AST before the execution phase.
Also had a broken npm install due to the infrastructure issues which broke my boxen setup as npm was broken.
In general I would like to have the option to choose my package manager for projects, or use none on servers where the projects should just run and where we need a lower attack surface.
@iarna thanks for providing more accurate numbers.
One of my interests in this discussion is if we gain the flexibility to add more to the larger package if we have 2 packages versus one. This would let us provide more value for those who don't care about the download/install size while not affecting those who do care about the size as they would be using the binary-only version. For example, today it is a challenge to add to the size (for example with full ICU) as a single package needs to balance between the needs of different users.
In terms of OS package managers, I think Node.js would need a different structure to facilitate that as currently, I'm not sure that it is all that safe to assume that removing components from specific locations is going to work between releases.
what i don't get is why it needs to be maintained by the Node.js project?
@iarna gotcha, sorry I didn't answer that directly. IMO it's similar to why you'd want the deb and rpm distributions to be maintained by the project rather than, say, Canonical or NodeSource.
Having the package-manager-less images supported, built, and distributed by the project would hopefully ensure that there is less ecosystem wonkiness around the usage. If this task is put off to third parties, you can't guarantee that those third-parties will take responsibility for resolving issues they create when distributing and maintaining them.
I'm not trying to assert that this is the same issue, but we have already seen that happen with the default builds of Node.js when you apt-get install node. This has broken the expectations of developers who use that package manager by default, which – IMO – is a poor developer experience.
By providing an official build, rather than having third-parties slice up the default binary, we can help ensure a consistent developer experience of the slim build.
Hopefully that answers your question? 😄
I added this to the release agenda to discuss this in our next meeting again.
@nodejs/tsc in our last release meeting we discussed whether it would be possible to add extra releases that do not contain e.g. a package manager, and to add a small script that would allow installing the package manager later on. The reason the scope is limited to the package manager is that it seems to be the only thing that we could remove at the moment (there could be more, and in that case that should be removed as well).
It would not influence any of the "regular" releases and would just be an addition to the currently existing ones. That way users could choose what they believe is best for them.
Since such a decision might have a bigger impact, it's something we would like the TSC to discuss. This was originally on the TSC agenda (https://github.com/nodejs/TSC/issues/571) but it was then delegated to us. As the release team, we believe it would be fine to do that as long as there's no extra overhead on our side. It would probably just require our build tools to be updated and an install script to be added for users to install a package manager. This would probably fall into the @nodejs/build area?
Depends on what the use case for this is, but if the primary driver is footprint, then you could also strip the debugging symbols out of the executable, which would save about 17% (xLinux Node12)
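To see what stripping would save on a particular build, one could compare sizes before and after on a copy of the binary. This is a sketch assuming the binutils `strip` tool is installed; `strip_savings` is a hypothetical helper, and the figure it reports will differ by platform and Node.js version.

```shell
# strip_savings BINARY: print "<bytes before> <bytes after>" for a stripped copy.
# Hypothetical helper; works on a temporary copy so the original is untouched.
strip_savings() {
  copy=$(mktemp)
  cp "$1" "$copy"
  before=$(wc -c < "$copy")
  strip "$copy"
  after=$(wc -c < "$copy")
  rm -f "$copy"
  # Unquoted on purpose: normalizes any padding some wc implementations add.
  echo $before $after
}

# Example (path is a placeholder for an unpacked release):
#   strip_savings ./bin/node
```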
I'm not sure why people who want only node from the release package can't delete everything but the node executable from the release package. This is easy in a Dockerfile, for example. Is anything but the node executable needed for deployment?
WRT comparison to other languages and their runtime distributables: in some other languages the runtime standard libs are shipped on disk as separate files, but node bundles everything it needs to run into a single self-contained executable. Take out npm and the include files, and not much is left, and it's easily seen that none of it is needed:
tar -tf ~/Downloads/node-v10.16.0-linux-x64.tar.xz|egrep -v 'npm|include'
node-v10.16.0-linux-x64/
node-v10.16.0-linux-x64/share/
node-v10.16.0-linux-x64/share/systemtap/
node-v10.16.0-linux-x64/share/systemtap/tapset/
node-v10.16.0-linux-x64/share/systemtap/tapset/node.stp
node-v10.16.0-linux-x64/share/doc/
node-v10.16.0-linux-x64/share/doc/node/
node-v10.16.0-linux-x64/share/doc/node/gdbinit
node-v10.16.0-linux-x64/share/doc/node/lldbinit
node-v10.16.0-linux-x64/share/doc/node/lldb_commands.py
node-v10.16.0-linux-x64/share/man/
node-v10.16.0-linux-x64/share/man/man1/
node-v10.16.0-linux-x64/share/man/man1/node.1
node-v10.16.0-linux-x64/LICENSE
node-v10.16.0-linux-x64/bin/
node-v10.16.0-linux-x64/bin/node
node-v10.16.0-linux-x64/bin/npx
node-v10.16.0-linux-x64/lib/
node-v10.16.0-linux-x64/lib/node_modules/
node-v10.16.0-linux-x64/CHANGELOG.md
node-v10.16.0-linux-x64/README.md
This seems achievable with minimal scripting on the consumer's side, something like this but with curl in there somewhere?
w/core % tar -xf ~/Downloads/node-v10.16.0-linux-x64.tar.xz --wildcards '*/bin/node'
w/core % ./node-v10.16.0-linux-x64/bin/node -v
v10.16.0
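Putting the two commands above together with a download step might look like the sketch below. `extract_node_only` is a hypothetical helper, not an official tool; it looks up the member name with `grep` instead of `--wildcards` so that it does not depend on GNU tar's pattern options, and the `curl` line assumes the tarball URL follows the same nodejs.org/dist layout as the other release files.

```shell
# extract_node_only TARBALL DEST: unpack only <top-level-dir>/bin/node into DEST.
# Hypothetical helper; assumes a tar that auto-detects compression when
# reading a local file, as GNU tar does.
extract_node_only() {
  tarball=$1
  dest=$2
  # Find the one member that is exactly <top-level-dir>/bin/node.
  member=$(tar -tf "$tarball" | grep '^[^/]*/bin/node$')
  [ -n "$member" ] || return 1
  mkdir -p "$dest"
  # Drop the two leading path components so the file lands at DEST/node.
  tar -xf "$tarball" -C "$dest" --strip-components=2 "$member"
}

# Example (URL layout is an assumption based on the existing dist directory):
#   curl -fsSLO https://nodejs.org/dist/v10.16.0/node-v10.16.0-linux-x64.tar.xz
#   extract_node_only node-v10.16.0-linux-x64.tar.xz ./runtime
#   ./runtime/node -v
```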
If this involves dropping even more binaries into nodejs.org/download then I'm not a fan at all. I've been on a mission to try and reduce the burden created by our commitments on downloadables. It puts a strain mainly on our resources - people and infrastructure. And it adds more complications to our build toolchains. You don't like when things break down during releases now? This is going to exacerbate those problems.
But be careful what commitments you sign us up for because we're talking about multi-year commitments without having multi-year insight into resourcing availability (people, infra, $$ -- and let's not pretend that making the Foundation carry this in the form of financial burden is a future-proof solution, please). Maybe even review the kind of pain that can be caused by the prospect of losing just one major infra provider.
More != better.
A few suggestions:
Discussion in the TSC meeting today was that we should postpone considering this further until we have a list of things to either be removed from the smaller one or added to the larger one before we move this forward.
I am closing this, since there was no activity for a while and the TSC decided to postpone it until concrete things come up.
There has been some discussion around the download and on disk sizes of our binaries (for example about npm taking up a significant portion) which got me thinking about whether it would be good to have two types of binaries available.
Having these two might allow us to be more flexible with the Development kit version where download and on disk file size might not be as important.
At the same time we could use some of the size advantage gained by removing development components from the deployment kit to offset additions on the diagnostics side (node-report, etc.) that we feel should be part of what is always available in core.
As long as the 2 options and our assumptions about their usage fit real-world usage it could be a win. The downside of course would be having to build 2 versions, although that might just be building the larger one and then stripping out components for the Deployment kit.
Thought I'd open this to see what other people think?