raineorshine / npm-check-updates

Find newer versions of package dependencies than what your package.json allows
Other
9.43k stars 328 forks source link

Very high bandwidth usage in monorepo #1351

Open ElPrudi opened 11 months ago

ElPrudi commented 11 months ago

Steps to Reproduce

.ncurc:

{
    "format": "group",
    "root": ".",
    "upgrade": true,
    "workspaces": true
}

Steps:

I've have a monorepo that has a main react package and some utility packages with the sames dependencies: @types/node, eslint, vitest and typescript.

I used AppNetworkCounter to monitor the bandwidth usage.

Current Behavior

My guess is, that this script re-checks every dependency even if it was already checked before. That is, if I have 4 packages in my monorepo each with @types/node, it checks it four times instead of caching it. As I wanted to update my monorepo, the bandwidth shot in the air with like 3MB/s and 1 or 2MB of requests and file reading for each dependency.

Expected Behavior

raineorshine commented 11 months ago

Hi, thanks for reporting.

In a single run of npm-check-updates results are memoized by dependency name. So in a monorepo you should see only one request per dependency, even if it is used in many packages.

Note that you can enable caching across runs with the --cache option. The default behavior is to fetch new dependencies every time. This prioritizes accuracy over performance.

Cache versions to a local cache file. Default --cacheFile is ~/.ncu-cache.json and default --cacheExpiration is 10 minutes.

ElPrudi commented 11 months ago

Ah, thank you very much. My bad for not reading the docs. The thing is: I have packages with the exact same dependencies and it was still checking and requesting for each one.

raineorshine commented 11 months ago

Sorry I misunderstood initially (edited). If you are seeing multiple requests for the same dependency, then we should investigate as that is not the expected behavior. It should memoize the results of every dependency across monorepo packages.

If you have a simple reproduce case that would be helpful. If not, I'll do a test run in workspaces mode and print some logs to see if I can reproduce it.

ElPrudi commented 11 months ago

I can tell you the stats of mine after fetching 90 dependencies:

Received Bytes: 80.039.784 Sent Bytes: 35.658 Receive Speed: Roughly 2.7 MB/s Received Packets: 51.921 Sent Packets: 103

The sent packets does roughly match the amount of dependencies I have, so there must be a problem with the amount of data these requests hold. I guess, a simple vite create command and creating a simple mono repo with 4 packages with these dependencies should be enough to trigger this:

{
    "devDependencies": {
        "@types/node": "^20.9.2",
        "@vitest/coverage-v8": "^0.34.6",
        "@vitest/ui": "^0.34.6",
        "eslint": "^8.54.0",
        "tslib": "^2.6.2",
        "typescript": "^5.2.2",
        "vitest": "^0.34.6"
    }
}

I can't currently create a simple repro for that, sorry. I try to make one as fast as possible later on.

raineorshine commented 11 months ago

Great, thanks. Can you also send the stats for ncu typescript? That will tell us how many bytes are in a single request.

ElPrudi commented 11 months ago

Yup, here are the stats:

Received Bytes: 7.069.026 Send Bytes: 817 Receive Speed: Roughly 2.7 MB/s Received Packets: 2.103 Send Packets: 3

It's basically downloading the entire package from the release tab.

ElPrudi commented 11 months ago

Also, yes, cache does help. Previously I had to cancel the ncu process, but now I tested it on a monorepo with way more packages:

Received Bytes: 160.470.651 Send Bytes: 69.258 Receive Speed: Roughly 6.5 MB/s Received Packets: 43.501 Sent Packets: 201

With cache on, it's much less:

Received Bytes: 33.774.245 Sent Bytes: 36.418 Receive Speed: Roughly 3MB/s Recevied Packets: 11.606 Sent Packets: 107

raineorshine commented 11 months ago

It's basically downloading the entire package from the release tab.

I'm not sure what you mean by "release tab", but npm-check-updates does download the entire package manifest for each dependency. It seems possible to just fetch the dist-tags, which would be way less bandwidth, though we would lose the deprecation check (deprecated versions are currently excluded from ncu results by default).

With cache on, it's much less

Great, thanks. So in that case it could be a bug in the memoization. I will look into it.

ElPrudi commented 11 months ago

I was refering to this tab.

Wouldn't it be possible to just read the first lines of a README or package.json or other manifest files? Is it possible to just request the first few chunks of the whole manifest file until a package version is found? Because 7.4 MB for just Typescript is way too much if you can just read it from smaller files of the repo.

raineorshine commented 11 months ago

Wouldn't it be possible to just read the first lines of a README or package.json or other manifest files?

package.json is the only manifest file. The README does not typically contain the latest version (if it appears on a README, it's usually in the form of an SVG badge that is cached from the npm registry).

Is it possible to just request the first few chunks of the whole manifest file until a package version is found?

That's a great idea to stream it. Once #1329 gets merged, we should be able to use fetch.json.stream.

ElPrudi commented 11 months ago

Still its weird that the received packets are 7MB in total when the entire package.json file of typescript is just 3.44 KB. There must be a lot of unnecessary data fetched as well.

raineorshine commented 11 months ago

Yes, good point. npm-check-updates currently uses pacote.packument which gets all manifests of all versions. That seems... bad 😅

ElPrudi commented 11 months ago

Ah, so it essentially boils down to a faulty dependency. Well, I hope the new dependency reduces the bandwidth usage a lot. At least this issue gave us a little insight about this problem too.

If you finally got this to work, try testing the bandwidth usage with AppNetworkCounter. It's free and doesn't even need an installation as it runs out of the box.

I leave this issue open until the bandwidth is reduced in ncu 17.

raineorshine commented 11 months ago

Basically yes. The use of other --target values requires getting all published versions, but the default --target latest should definitely not need them.

So there are three areas where the performance can be improved:

  1. Fix the bug that is causing duplicate packages to not be memoized correctly in workspaces
  2. Use pacote.manifest instead of pacote.packument.
  3. Use streaming to get the latest version in the minimum amount of bytes.
raineorshine commented 11 months ago

Let's start with the easy one. I've fixed #1 and published to v16.14.7. It will properly memoize the pacote call and only fetch each packument once now in workspace mode.

ElPrudi commented 11 months ago

Ok, this small change made this tool MUCH faster and reduced the bandwidth usage by a lot. Basically cache: true and cache: false are behaving the same now in default settings.

So the only issue is still the fetching of each dependency. ncu typescript still has roughly the same stats like before.

raineorshine commented 11 months ago

Great, I'm glad that's fixed.

I replaced pacote.packument with pacote.manifest when fetching a tag (e.g. --target latest, the default). Published to v16.14.8. Try it out and let me know if you see any reduction in bandwidth.

ElPrudi commented 11 months ago

Only the performance changed. TypeScript still costs 7MB and my repository still costs 33MB. The only thing that got reduced was the received packets with ncu typescript from 2K to 678.

raineorshine commented 11 months ago

Hmmm. I would have expected that to change. I'll try it on a couple more test cases.

raineorshine commented 11 months ago

There was a small error in my code that was preventing the manifest case.

Please try v16.14.9 when you get a chance.

ElPrudi commented 11 months ago

Now, this made it worse:

ncu typescript:

Received Bytes: 14.133.320 Sent Bytes: 1.165 Received Packets: 4.324 Sent Packets: 4

My repo:

Received Bytes: 67.151.019 Sent Bytes: 68.388 Received Packets: 29.317 Sent Packets: 196

You essentially doubled the downloading and also the requests got more. I guess instead of replacing, you just added it on top.

raineorshine commented 11 months ago

Turns out it was a mistake with the arguments passed to pacote.manifest. It caused an empty response, which triggered a fallback fetch that passed the functional tests.

Fixed in v16.14.10. (i.e. back to v16.14.7)

As for the download size, it turns out that pacote fetches the full packument even when calling pacote.manifest.

https://github.com/npm/pacote/blob/9e9e18761dc2091f0030a9a4758a47ee3d8b11a4/lib/registry.js#L118

Disappointing, but it fits my understanding that the npm registry api does not have an additional endpoint for just the manifest. The real solution will be an abortable stream.

ElPrudi commented 11 months ago

Damn, but thanks for trying. Well, you managed to help me a lot with my monorepo, so I'm gonna wait for ncu 17 when you implemented the streams approach.

raineorshine commented 11 months ago

Thanks for taking the time to report the issue. I learned some things!