Open make-github-pseudonymous-again opened 2 years ago
Thanks for the suggestion! Have you checked if the npm registry already supports fetching multiple package manifests in a single request? I presume not, but it would be worth confirming. If they don't support them, it would be a great feature request for npm (or npm-registry-fetch).
Hi @raineorshine, I haven't searched for it.
Hi @zkat, @isaacs, and @wraithgar! Any idea who to contact to discuss a potential addition to npm's API?
Changes to the npm registry itself would have to go through support. The npm cli interacts with that and many other registries (i.e. github packages registry, private registries, azure artifacts). If that route were added to any of those registries the cli could then be updated to use it.
Thanks @wraithgar.
@raineorshine Shall I open a ticket?
Hi @raineorshine! I would like to open a npm support ticket to request this API addition we would need. Do you give me permission to explicitly mention in this ticket the reason for this request is to improve the running time of npm-check-updates
?
Yes, go for it!
Here is what I sent:
New API route for batched version update information fetching
Dear support employee,
A tool I use to keep my npm packages (direct) dependencies up to date is relying on many "parallel" API queries to fetch relevant information. Currently this process running time scales quite badly with the number of dependencies. There are several ways this could be improved, and some involve having a better way to query the npm registry for information. For instance:
- The tool currently makes one query per (direct) dependency. I think that being able to query the registry for a reasonable amount of packages in a single batch would be significantly more efficient.
- The tool currently uses
pacote.packument
which does not seem to allow to query for only a subset of the fields. I think that being able to specify which fields are needed in the response would help further reduce the resource consumption of these requests.- The tool currently does not make use of last-modified headers, and I am not sure how these could be used with
pacote.packument
.- The tool currently does not exploit the fact that our queries are very specific: we simply want to know whether there is a more recent version available than the one configured in the package.json while respecting optional semver constraints (^, etc).
The tool's name is npm-check-updates and a relevant issue was opened at https://github.com/raineorshine/npm-check-updates/issues/961.
For the last two points: the tool is currently unable to exploit pacote's caching because the caching strategy used has no usefulness in checking for up-to-date information: see https://github.com/raineorshine/npm-check-updates/pull/578
Kind Regards, x
Regarding the third point: could this be improved now without an API change?
Hi @raineorshine. So far I did not get any reply from NPM. However, I believe some improvement may done independently of their reply. Regarding the following remarks (copied from the message I sent to NPM):
- The tool currently does not make use of last-modified headers, and I am not sure how these could be used with
pacote.packument
.
and further
For the last two points: the tool is currently unable to exploit pacote's caching because the caching strategy used has no usefulness in checking for up-to-date information: see https://github.com/raineorshine/npm-check-updates/pull/578
Two questions:
pacote
caching strategy be mended to honor last-modified headers (or does it already honor them)?pacote
?
- Can
pacote
caching strategy be mended to honor last-modified headers (or does it already honor them)?
That's a great question. I don't know the answer off-hand, but I'd be curious to know.
- If not? Can we use something else than
pacote
?
Currently we only use pacote.packument
, so there is a small surface area. We could use a lower level npm fetcher directly. However, pacote
uses the npm config for things like proxies and registry authentication, which seems daunting to recreate.
I should mention that npm-check-updates
actually does batch fetch packages. I don't know where my head was at when I neglected to say that at the beginning 🥴. The default concurrency is 8, and can be changed with the concurrency
cli option.
However, as you know, it still does them in separate HTTP requests. Just thought it would be useful to note here.
So ... this issue made me do a little digging into pacote
/ npm-registry-fetch
/ make-fetch-happen
and it turns out that the npm cli is not leveraging http agents, even a little. make-fetch-happen
creates one but it only uses it once per call, and would only end up reusing a connection on things like socket timeouts and connections refused ... the one time it won't do any good.
I've brought this up internally and we'll probably look into it for the cli. Unfortunately for you even though pacote
accepts an agent
arg, make-fetch-happen
ignores and clobbers it https://github.com/npm/make-fetch-happen/blob/main/lib/remote.js#L44
This is definitely a solution here, allowing folks to pass in their own agent to actually utilize node's http agents properly. I predict that we'll also make the cli or pacote do this by default too, allowing for folks to still pass their own if they want.
I was mistaken! agent
is obeyed by make-fetch-happen
!!! https://github.com/npm/make-fetch-happen/blob/main/lib/agent.js#L44-L46
You should be able to generate your own keepalive agent and pass that in as agent
to pacote opts.
After a little more digging, this time at an actual computer, it looks like make-fetch-happen does reuse agents, storing them in the lru cache
https://github.com/npm/make-fetch-happen/blob/main/lib/agent.js#L57-L59
The agents are keyed by the opts passed in so if they're not changing, you're getting the same agent.
TLDR keepalive
is already being utilized for both http and https calls with pacote.
* Can `pacote` caching strategy be mended to honor last-modified headers (or does it already honor them)?
The caching in the npm toolset was overhauled last year and does honor this header. It's one of the headers that is explicitly stored in the cache: https://github.com/npm/make-fetch-happen/blob/dadba9a2d5fc718c9179d9fe13c436ac4d69303d/lib/cache/entry.js#L42
The caching semantics themselves are handled by https://github.com/kornelski/http-cache-semantics, which does properly use this header.
However, it seems that because your module does something extremely specialized, it can use a route that the npm cli can't for fetching packet versions. The npm cli has to fetch the full packument so that it knows ALL the versions, which is needed when calculating a full node_modules tree to satisfy semver ranges of all the packages in that tree. Because this module only ever cares about the latest version, you can use the npm registry route that does only that. It's much much less data to request across the network.
$ npm dist-tag ls npm-check-updates
latest: 12.4.0
next: 12.2.2-1
$ curl https://registry.npmjs.org/-/package/npm-check-updates/dist-tags
{"latest":"12.4.0","next":"12.2.2-1"}⏎
Caching on that route is less optimized, because it's returning metadata from the packument. However, the npm registry only ever tells caches to store packuments for 5 minutes anyways, because they are the data that changes when new versions are published. For all intents and purposes they are inherently out of date. The 5 minute value is mostly to prevent a firehose situation on the registry. In fact the npm cli informs the cache to fetch it from the registry if at all possible: https://github.com/npm/cli/blob/2db3eff44d7bb86b956207cc63d279806fd14ed0/lib/commands/dist-tag.js#L188-L190
It's worth looking into given the nature of this module.
Here's a gist showing how to do far fewer network requests to find the newest version of a given package is available from the npm registry. I limited the pagination to 10 just to show pagination working. Be careful making that query too large because there is a limit on url length, though I don't know offhand what that limit is
https://gist.github.com/wraithgar/d5c974ee8e4a7e08ded89742c15e96ce
This one does semver comparison to NOT do breaking changes. You could just not pass in the existing manifest semver to just get the max. This does NOT take into consideration the latest
tag of a package. It's purely semver comparison.
ETA: the dependencies endpoint is NOT tag aware, so you would have to pass in *
instead of ${encodeURI(manifest.dependencies[d])}
on line 13 in order to have the registry return all versions, not just ones bounded by the current semver of the package.json
I updated the gist to give examples of "highest version possible without changing the package.json" which is effectively what "npm update --save" will do (dunno if that shipped yet), and "highest semver version possible based on everything in the registry"
npm-check-updates
node >= 10.17
Steps to Reproduce
Run
ncu
with >100 dependencies and a slow internet connection.Current Behavior
It's slow.
Expected Behavior
It's fast.
Suggested solution
Ask
npm
to expose a route where one can query for more than one package at a time. It could even explicitly be an updates available route that only replies with a list of updates given current version, not with a list of last available versions. They will have to put limits on this route like maximum payload (ncu
resolves wildcards locally), but on average I think it would lighten the load on their side and make it faster for us.