nodejs / node

Node.js JavaScript runtime ✨🐢🚀✨
https://nodejs.org

Use only one copy of llhttp? #44000

Closed kapouer closed 12 months ago

kapouer commented 2 years ago

What is the problem this feature will solve?

Versions of Node.js that bundle Undici (16, 18) are currently using two different copies of llhttp: the one compiled natively for the http module, and the wasm build bundled inside Undici.

What is the feature you are proposing to solve the problem?

Would it make sense to use the llhttp wasm that is bundled in Undici, instead of recompiling it?

What alternatives have you considered?

n/a; it's a refactoring.

aduh95 commented 2 years ago

Or could the bundled Undici use the native one? @nodejs/undici

mcollina commented 2 years ago

The architecture and integration of the two is radically different.

When switching to the wasm one we actually got an improvement in performance.

I think it might be possible to reach a stage where this works, but that is probably far off given we are focusing on fetch. However, things could change if people are willing to work on this.

kasicka commented 2 years ago

@mcollina This is a problem for us as distro package maintainers as well. Packaging precompiled code is a no-no for us. Is there a way this could be compiled from source during the build? How complicated would it be to use the non-wasm version, given we are okay with worse performance? Or would it be plausible to have an option to build node without undici? IIRC it's used only for fetch, which is experimental for now.

@khardix @mhdawson @sgallagher

kapouer commented 2 years ago

@kasicka you should have a look at how Debian is dealing with it: the node-undici package builds the wasm files from source there, and the nodejs package embeds the result.

mcollina commented 2 years ago

@nodejs/tsc we have a decision to make here. One of the reasons we stopped using the native http_parser was the use of process.binding() for retrieving it (https://github.com/nodejs/undici/issues/22#issuecomment-650584032). Assuming we don't want to bring process.binding() usage back, we'd need to create some custom wrapping code for undici that replaces that part. Are we all on board?

The second question is who is willing to do the work. Are there any volunteers?
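For concreteness, such wrapping code might look roughly like the sketch below. This is purely hypothetical: the module and the surface it exposes are assumptions, not existing core code.

```js
'use strict';
// Hypothetical sketch only: a core-internal wrapper that hands the native
// llhttp-based parser to the bundled undici without reviving process.binding().
// No such module exists today; internalBinding() is only in scope for
// lib/internal/* modules inside core, never for userland code.

const { HTTPParser } = internalBinding('http_parser'); // native llhttp binding

// Expose a minimal surface that the bundled undici could be adapted to consume
// instead of instantiating its own wasm build of llhttp.
module.exports = { HTTPParser };
```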

mhdawson commented 2 years ago

I have been thinking about how we might move towards a standardized approach for integrating WASM which would address some of the concerns on the distro side. Some aspects of that would include:

1) Making it easy to identify/find all WASM blobs.
2) Having good processes/documentation for how the WASM blobs are generated in a repeatable way. I think this is also good for the project, distros aside, in terms of supply chain security.
3) Making it easy for a downstream project to follow 2) and then replace the blobs from 1) with versions that they regenerate.
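To make 1) and 2) concrete, one could imagine a small manifest along these lines. This is a purely hypothetical sketch; paths and commands are illustrative assumptions, not an existing Node.js file.

```js
'use strict';
// Hypothetical manifest (no such file exists in Node.js): one entry per bundled
// WASM/generated-JS artifact, naming where it comes from and the documented,
// repeatable way to regenerate it. Paths and commands are assumptions.

module.exports = [
  {
    name: 'undici-fetch',
    artifact: 'deps/undici/undici.js',
    source: 'https://github.com/nodejs/undici',
    regenerate: 'node build/wasm.js', // undici's own wasm build script
  },
  {
    name: 'cjs-module-lexer',
    artifact: 'deps/cjs-module-lexer/lexer.js',
    source: 'https://github.com/nodejs/cjs-module-lexer',
    regenerate: 'npm run build', // assumed command; may differ upstream
  },
];
```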

@kapouer when you say the nodejs package embeds the result, is it replacing what is in the Node.js upstream by default or some other process?

mcollina commented 2 years ago

FWIW, building the wasm needed by undici is completely automated (otherwise it would be very hard to do by hand).

kapouer commented 2 years ago

@mhdawson currently the nodejs Debian package replaces deps/undici/undici.js during the build with the one built by the node-undici Debian package, using a slightly patched version of undici's build/wasm.js. However, that's suboptimal for us, since rebuilding nodejs each time undici is updated in Debian is not a good idea (especially for security fixes). It is not a big problem though, as it's rather easy to patch nodejs so that it requires the external undici-fetch.js file instead of bundling it.
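Conceptually, such a patch could amount to something like the sketch below. This is not the actual Debian patch; the external path and the fallback behaviour are assumptions for illustration only.

```js
'use strict';
// Conceptual sketch only (not the actual Debian patch). Instead of bundling
// deps/undici/undici.js into the binary, core's fetch setup could require an
// externally installed, distro-built bundle first and only fall back to the
// copy shipped in deps/. The external path below is an illustrative assumption.

const EXTERNAL_UNDICI = '/usr/share/nodejs/undici-fetch.js'; // distro-provided, rebuilt whenever undici is updated

function loadUndici() {
  try {
    return require(EXTERNAL_UNDICI);
  } catch {
    // Community builds: keep using the bundled copy.
    // (Inside core this would be a core-internal specifier; shown here only
    // as illustration, it does not resolve for userland code.)
    return require('internal/deps/undici/undici');
  }
}

module.exports = loadUndici();
```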

mhdawson commented 2 years ago

@kapouer thanks for the details.

mhdawson commented 2 years ago

as it's rather easy to patch nodejs so that it requires the external undici-fetch.js file instead of bundling it.

I'm thinking that as there are more instances like this, it may become more of an overhead. I assume having a configure time option would make it a bit easier.

Thinking out loud a standard approach might be something like:

1) Any WASM/generated JavaScript that is not created directly as part of the Node.js build should be retrieved through a common accessor method.
2) For community builds, the WASM/generated JavaScript that is required will be bundled into Node.js. We could probably come up with some infra where adding another instance would be as easy as: point to a file in deps, give it a name, and it's bundled in and accessible through that name using the accessor method.
3) In each case for 1) there should be a configure-time option which allows the bundled require to be replaced with a require for an external file, along with the path(s) for where to look for those files. The configure-time options might even be able to be autogenerated based on the name used in 2).

If the same approach was used in undici (or at least the same configuration option name), for example, then the option could simply be passed on and the same external file could be used by Node.js core and undici (although we would still have 2 copies).

I'm thinking of this along much the same lines as how we have configure-time options to allow components like openssl, c-ares, etc. to either be bundled into the node.js binary or linked against as a shared library.

We might even be able to use the same approach as for shared libraries by getting the WASM/generated JavaScript from a symbol that is either present in the binary itself or within a shared library, but that does seem a bit like putting a square peg in a round hole.
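A minimal sketch of what the accessor from 1) and the configure-time override from 3) could look like follows. Every name here (the accessor, the flag, the registry, the internal specifiers) is hypothetical, not existing core code.

```js
'use strict';
// Purely hypothetical sketch of the proposal above; none of these names exist
// in Node.js core. A configure flag such as --external-generated-dir=/some/path
// would be baked in at build time; it is simulated here with a constant.

const path = require('path');

// Filled in by the build system when the configure option is used;
// an empty string means "use the bundled copies".
const EXTERNAL_GENERATED_DIR = '';

// 2) Bundled artifacts, registered under a simple name.
// (The 'internal/deps/...' specifiers only resolve inside core; illustrative.)
const bundled = {
  'undici-fetch': () => require('internal/deps/undici/undici'),
  'cjs-module-lexer': () => require('internal/deps/cjs-module-lexer/lexer'),
};

// 1) + 3) Common accessor: return the bundled artifact unless the build was
// configured to load an externally built replacement instead.
function getGeneratedArtifact(name) {
  if (EXTERNAL_GENERATED_DIR) {
    return require(path.join(EXTERNAL_GENERATED_DIR, `${name}.js`));
  }
  const load = bundled[name];
  if (load === undefined) {
    throw new Error(`unknown generated artifact: ${name}`);
  }
  return load();
}

module.exports = { getGeneratedArtifact };
```

With an accessor like this, the Debian-style replacement described earlier could become a single configure flag rather than a carried source patch.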

GeoffreyBooth commented 2 years ago

Why is it necessary to use external versions? Could Debian's patch be integrated into Undici's version somehow?

We include a wasm binary for the https://github.com/nodejs/cjs-module-lexer dependency that’s also bundled in Node. I thought that this binary was good to run anywhere without needing recompilation. It feels like this is the model we should be shooting for, that Node’s binaries/scripts can run anywhere, rather than allowing more ways to run custom versions.

kapouer commented 2 years ago

@GeoffreyBooth there are two use cases (that ideally are selected by ./configure flags, or worse, by local patches):

  1. Node authors distribute a binary bundling many other libraries as much as possible,
  2. Distributions redistribute Node linked with shared (system-installed) libraries as much as possible.

Whatever the module is (cjs-module-lexer, undici, acorn, openssl, c-ares, uv, etc...) if it needs to be compiled to be used, then it can be treated like a library with respect to those two cases.

In any case, open-source distributions need to recompile each part at some point, either during the Node build or during the externalized dependency's build. Also, case 2 dislikes bundling an external dependency that is built and system-installed: security fixes then require a rebuild that could have been avoided by loading it dynamically (like shared libraries do).

khardix commented 2 years ago

It feels like this is the model we should be shooting for, that Node’s binaries/scripts can run anywhere, rather than allowing more ways to run custom versions.

That is commendable, as long as you trust the person making the binaries. In Fedora (and derivatives), there is a strict policy of not shipping pre-built anything. One of the reasons we do this is that there is no guarantee that a compiled binary corresponds to the source in git and does not include, e.g., an added crypto-miner or another kind of supply chain attack. Rather than take that risk, we want to rebuild everything from source that we can review, using tools we also built ourselves (so no externally prepared SDKs or similar shortcuts either).

That's at least the theory; in practice, corners are sometimes cut and exceptions granted, but we would very much like to have as few of these as possible.

kapouer commented 2 years ago

I second @khardix : it's "open-source distribution" as in: everything is built from open source code.

mhdawson commented 2 years ago

@kapouer, @khardix, @kasicka as consumers of what I've proposed in https://github.com/nodejs/node/issues/44000#issuecomment-1204434268 what are your thoughts?

kapouer commented 2 years ago

Perfect!

mhdawson commented 2 years ago

Confirmed with @kasicka and @khardix that the proposal above is reasonable from their perspective as well. Will look at experimenting to see how it might be implemented.

github-actions[bot] commented 1 year ago

There has been no activity on this feature request for 5 months and it is unlikely to be implemented. It will be closed 6 months after the last non-automated comment.

For more information on how the project manages feature requests, please consult the feature request management document.

github-actions[bot] commented 12 months ago

There has been no activity on this feature request and it is being closed. If you feel closing this issue is not the right thing to do, please leave a comment.

For more information on how the project manages feature requests, please consult the feature request management document.