svanderburg / node2nix

Generate Nix expressions to build NPM packages
MIT License
520 stars 98 forks source link

Make npm api requests in parallel #181

Open Mic92 opened 4 years ago

Mic92 commented 4 years ago

It seems that node2nix currently performs blocking requests:

diff --git a/lib/sources/NPMRegistrySource.js b/lib/sources/NPMRegistrySource.js
index 57d99be..81c627f 100644
--- a/lib/sources/NPMRegistrySource.js
+++ b/lib/sources/NPMRegistrySource.js
@@ -79,7 +79,9 @@ NPMRegistrySource.prototype.fetch = function(callback) {
               };
             }

+            console.log(`start downloading ${url}`)
             client.get(url, clientParams, function(err, data, raw, res) {
+                console.log(`finished downloading ${url}`)
                 if(err) {
                     callback(err);
                 } else if(data == undefined || data.versions === undefined) {
$ ./result/bin/node2nix
fetching local directory: ./. from .
start downloading https://registry.npmjs.org/optparse
info attempt registry request try #1 at 14.34.44
http request GET https://registry.npmjs.org/optparse
http 200 https://registry.npmjs.org/optparse
finished downloading https://registry.npmjs.org/optparse
start downloading https://registry.npmjs.org/semver
info attempt registry request try #1 at 14.34.44
http request GET https://registry.npmjs.org/semver
http 200 https://registry.npmjs.org/semver
finished downloading https://registry.npmjs.org/semver
start downloading https://registry.npmjs.org/npm-registry-client
info attempt registry request try #1 at 14.34.44
http request GET https://registry.npmjs.org/npm-registry-client
http 200 https://registry.npmjs.org/npm-registry-client
finished downloading https://registry.npmjs.org/npm-registry-client
start downloading https://registry.npmjs.org/npmconf
info attempt registry request try #1 at 14.34.44
http request GET https://registry.npmjs.org/npmconf
http 200 https://registry.npmjs.org/npmconf
finished downloading https://registry.npmjs.org/npmconf
start downloading https://registry.npmjs.org/tar
info attempt registry request try #1 at 14.34.44
http request GET https://registry.npmjs.org/tar
http 200 https://registry.npmjs.org/tar
finished downloading https://registry.npmjs.org/tar
start downloading https://registry.npmjs.org/temp
info attempt registry request try #1 at 14.34.44
http request GET https://registry.npmjs.org/temp
http 200 https://registry.npmjs.org/temp
finished downloading https://registry.npmjs.org/temp
start downloading https://registry.npmjs.org/fs.extra
info attempt registry request try #1 at 14.34.44
http request GET https://registry.npmjs.org/fs.extra
http 200 https://registry.npmjs.org/fs.extra
finished downloading https://registry.npmjs.org/fs.extra
start downloading https://registry.npmjs.org/findit

Maybe you can explain why this happens as you also seem to maintain the asynchronous abstraction behind it: https://github.com/svanderburg/slasp

Mic92 commented 4 years ago

Would agree to rewrite node2nix to use async/await builtins instead of slasp?

svanderburg commented 4 years ago

It is not necessarily blocking, but it currently only performs one requests at the time and was not made to do any downloads a parallel.

It could be done with promises/async/await, but there also other ways.

I have not done any experiments yet, but having the ability to have multiple downloads in parallel could be beneficial to the generation speed.

Mic92 commented 4 years ago

Is the api client that only performs one request at the time? The rest looks like it should be asynchronous.

raboof commented 4 years ago

nixpkgs generate.sh takes quite a while, and seems like it could potentially benefit from this. Do you have a particular direction in mind how you would like to see this implemented?

svanderburg commented 4 years ago

@raboof Although having parallel downloads might shave off a bit of processing time, I believe what we really want is the ability to partially regenerate the expression for only the package that needs to be added or changed.

I'm currently investigating options to make that possible.

Not this feature is not useful, but what I expect to happen is that if we make the regeneration process faster, people will add more packages eventually nullifying its effect.

With partial regenerations, we can have faster updates, and better commits on a per package basis.

Mic92 commented 4 years ago

@raboof Although having parallel downloads might shave off a bit of processing time, I believe what we really want is the ability to partially regenerate the expression for only the package that needs to be added or changed.

That would be really great. Also to enable backports

raboof commented 4 years ago

With partial regenerations, we can have faster updates, and better commits on a per package basis.

That sounds great. I didn't see an issue for this yet, so I filed #192.

oxalica commented 4 years ago

I tried to replace slasp.fromEach with a parallel version. (See repo)

It has speed up fetching about 2x in good network environment but makes the output non-deterministic, which are differences like A-0.2, B // { A-0.1 }, C vs A-0.1, B, C // { A-0.2 } while A is a common transparent dependency. This is related to the order of package traversal, since we currently choose the first version seen to be in outer layer and mark other version as override. Related code: https://github.com/svanderburg/node2nix/blob/bfd517c1774ddb4ae1a459b8d96f1bc653a72bd1/lib/Package.js#L148

Mic92 commented 4 years ago

Is there a way to force newest versions instead of the first version in the outer layer, assuming that newer versions are better? That should make it deterministic as well.

oxalica commented 4 years ago

@Mic92

Is there a way to force newest versions instead of the first version in the outer layer, assuming that newer versions are better?

I think the order affects not only the output layout, but also the version selection result. Consider that A requires C@^1.0 and B requires C@=1.5, assuming C has version 1.5 and 1.9. Resolving order A, B results in two package, C@1.9 for A and C@1.5 for B, while resolving order B, A results in a single package C@1.5 for both A and B.

The version resolution algorithm seems to be quite naive now.

Mic92 commented 4 years ago

@Mic92

Is there a way to force newest versions instead of the first version in the outer layer, assuming that newer versions are better?

I think the order affects not only the output layout, but also the version selection result. Consider that A requires C@^1.0 and B requires C@=1.5, assuming C has version 1.5 and 1.9. Resolving order A, B results in two package, C@1.9 for A and C@1.5 for B, while resolving order B, A results in a single package C@1.5 for both A and B.

The version resolution algorithm seems to be quite naive now.

Also there determinism could be achieved by going for the newest possible version for every constraint, but I am not sure if this is what the users want.

svanderburg commented 4 years ago

@Mic92 Always picking the newest version is not going to work -- the fact that this does not always happen is intentional. If we would choose to alter this behaviour, then compatibility with certain packages is most likely going to break.

Sadly, NPM always tries to "reuse" dependencies that fit within a certain version range in a node_modules/ folders that reside in the parent directories. For certain packages, this means that a version range will resolve to an older version than the latest. I don't like this behaviour either, but it is how NPM works and instrumental to ensure compatibility.

I had several commercial projects that would fail to deploy if I would not replicate this very odd and illogical behaviour. To discover this, I did quite a few intensive comparisons between node2nix and the vanilla NPM.

Furthermore, this is also NPM's means to break out of cyclic dependencies. If we always resolve to the latest version, then certain deployments will remain stuck in an infinite loop.

@oxalica Thanks for trying to optimize this. Although we have to give it some more thought, it is good to hear that there is a speed up possible.

LamprosPitsillos commented 9 months ago

Sorry to bump this old issue , but has there been any advancements in solving this issue ?