Suggestion: Provide standards around integrity between source code and published package

shaunwarman commented 5 years ago

Right now, it seems that most maintainers publish their packages from their local environment. There should be a way to verify what is published against the public source code or a specific git sha, to maintain transparency about what is being published. Not only will this mitigate out-of-sync issues or accidents, it will also provide greater confidence that (potentially malicious) additions aren't slipped in at publish time.

Not sure if this is the best place for this, but after reading through other issues and recent resources I thought I'd better put this down somewhere. It also brings up the discussion of maintainers' permissions, not only to the package registry but to SCM as well.

ljharb commented 5 years ago

npm doesn’t require a repo, and any package with a prepublish build process (read: every Babel user) will correctly have the published package not matching any sha in their git repo.

I’m not sure how we’d be able to do any sort of verification in a consistent and automated way.

mcollina commented 5 years ago

The only way to guarantee this is to compute a hash of all the content of a module (in prepublish) and cryptographically sign it. However, there is no easy way to know:

a. which keys are allowed to sign a given package
b. how to associate those keys with npm profiles
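
A minimal sketch of the hash-and-sign idea, assuming the package contents are available as an in-memory map of path to file contents (the `PackageFiles` type and `digestPackage` helper are hypothetical, not part of npm). It uses Node's built-in `crypto` module with an Ed25519 key pair, and it deliberately does not address the two open questions above about which keys are trusted and how they map to npm profiles.

```typescript
import { createHash, generateKeyPairSync, sign, verify } from "node:crypto";

// Hypothetical in-memory package contents: relative path -> file contents.
type PackageFiles = Record<string, string>;

// Hash every file in a stable (sorted) order so the digest does not depend
// on the order in which files happen to be listed.
function digestPackage(files: PackageFiles): Buffer {
  const hash = createHash("sha256");
  const entries = Object.entries(files).sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
  for (const [path, content] of entries) {
    // Hash each file on its own first so path/content boundaries stay unambiguous.
    const fileDigest = createHash("sha256").update(content).digest("hex");
    hash.update(`${path}\0${fileDigest}\n`);
  }
  return hash.digest();
}

// The maintainer's key pair. Tying the public key to an npm profile is the
// unsolved part and is out of scope for this sketch.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");

const files: PackageFiles = {
  "package.json": '{"name":"example","version":"1.0.0"}',
  "index.js": "module.exports = 42;\n",
};

// Sign at publish time (Ed25519 takes no separate digest algorithm, hence null).
const signature = sign(null, digestPackage(files), privateKey);

// A consumer recomputes the digest from what they downloaded and checks it.
const untouchedOk = verify(null, digestPackage(files), publicKey, signature);
const tampered: PackageFiles = { ...files, "index.js": "module.exports = 43;\n" };
const tamperedOk = verify(null, digestPackage(tampered), publicKey, signature);
```

Here `untouchedOk` is true and `tamperedOk` is false: any change to a file's path or contents changes the digest and invalidates the signature.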

People have been asking npm for a similar feature for the last 2-3 years.

shaunwarman commented 5 years ago

Thanks @ljharb @mcollina and sorry for the late reply.

This article by @skonves gave me a nice reminder of this topic. It seems there are tools, his tbv and another called npm-verified, that attempt to do something similar.

There would need to be a blessed, yet optional, process to add some sort of "trusted": true || false or "verified": true || false metadata that npm or other package managers would need to set. Optional because of the common cases of transpiled code (e.g. babel, etc.) and because this is an expensive task with trade-offs.

sompylasar commented 5 years ago

My 1 cent out of 2 (don't have time for more): npm doesn't require a repo, but a package verification service on top of npm might require whatever is needed, and package authors that want to conform and get the "verified" badge will try and make sure their package is good. For example there are similar efforts to standardize open source repositories: https://github.com/todogroup/repolinter — there could be tools that help package authors to manage packages efficiently (I personally miss the kind of tool that would automatically set up CI and releases for me reproducibly and reliably).

ljharb commented 5 years ago

Indeed, that’s something npm can solve - but i don’t think it’s something node, and thus we, can.

niftylettuce commented 5 years ago

Chiming in here, we really need something like this... koa-router was just transferred to a relatively unknown user on GitHub and the package name was apparently sold

https://github.com/ZijianHe/koa-router/commit/bd780c97a831199225bbc67122e83ebccf6ed1c4#diff-04c6e90faac2675aa89e2176d2eec7d8R3

Screenshot (in case commit is force deleted):

[Screenshot taken 2019-02-13 at 2:38:39 pm]

I've version-locked koa-router, which is downloaded 135K+ times per week, and subscribed on https://libraries.io/npm/koa-router to get a notification when a new version of this package is published.

HN: https://news.ycombinator.com/item?id=19156707

ljharb commented 5 years ago

@niftylettuce i'd suggest reporting that to npm; i doubt their TOS permits the sale of a package name.

niftylettuce commented 5 years ago

I did report that to NPM, and here was their response...

[Screenshot taken 2019-02-13 at 2:47:19 pm]

I also reached out to the author of the package koa-router and received a very negative response which I don't wish to share publicly out of respect since I see no malicious version of the package published yet.

justinmchase commented 5 years ago

@niftylettuce You did the right thing here even if nothing negative comes of it. It's a red flag for sure and others deserve to be alerted to this information. Good job on being alert.

niftylettuce commented 5 years ago

To anyone reading this, I'm building a tool to automate this nonsense, at least until Node/NPM do something about it. Email me at niftylettuce@gmail.com if you want to get notified once it's up. I'll notify everyone that posted in this thread and/or left reactions as well. It will be free and open-source.

Enrico204 commented 5 years ago

Maybe it's time to think about an NPM alternative.

iarna commented 5 years ago

What's more, even if npm validated against the git repo at publish time, the user can just force push after publication. Checking this kind of thing does nothing.

If you want to validate at install time, well, I hope you enjoy your multi-hour install times. =p (Seriously: In modern npm or yarn, install time per package is under 10ms—adding a git clone to that mix would massively increase overall run time.)

This kind of action gets you no security whatsoever. It solves no actual problem. It validates nothing.

freewil commented 5 years ago

@iarna what do you think about a new command for npm? I've been thinking about an npm diff that you would be able to use while upgrading a package to see the actual changes. I haven't thought about it super deeply, but: npm update or npm install mypkg@latest followed by an npm diff or similar.

This would effectively allow you to review the code of new/updated packages, similar to how adding node_modules to version control would allow for deeper code reviews.

jaredhirsch commented 5 years ago

@niftylettuce Hey, if you're going to work on a solution, it would be way better to start a repo on github, so people can contribute ideas/feedback via issues

iarna commented 5 years ago

@freewil That seems much more useful, but I am concerned how you scale that out to the thousands of deps in a typical modern deployment. 'cause yeah, the one thing you asked for may have a reasonable diff, but what about the dozen transitive deps that also updated? Still, I think this would be an excellent place to begin experimenting with, to see how it feels (--dry-run --json can get you what an action would have done, in machine readable format).
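
To make the scaling concern concrete, here is a minimal sketch that lists every package whose resolved version differs between two dependency trees, direct or transitive. The flattened name-to-version shape (`ResolvedTree`) and the package names are assumptions for illustration only; this is not the actual format of npm's `--dry-run --json` output.

```typescript
// Hypothetical flattened dependency tree: package name -> resolved version.
type ResolvedTree = Record<string, string>;

interface VersionChange {
  pkg: string;
  from: string | null; // null when the package is newly added
  to: string | null; // null when the package was removed
}

// Collect every package that was added, removed, or changed version.
function changedPackages(before: ResolvedTree, after: ResolvedTree): VersionChange[] {
  const names = new Set([...Object.keys(before), ...Object.keys(after)]);
  const changes: VersionChange[] = [];
  for (const pkg of Array.from(names).sort()) {
    const from = before[pkg] ?? null;
    const to = after[pkg] ?? null;
    if (from !== to) changes.push({ pkg, from, to });
  }
  return changes;
}

// Upgrading one direct dependency also moved one transitive dep and added another.
const treeBefore: ResolvedTree = {
  "direct-dep": "1.0.0",
  "transitive-a": "2.1.0",
  "transitive-b": "0.3.0",
};
const treeAfter: ResolvedTree = {
  "direct-dep": "1.1.0",
  "transitive-a": "2.1.0",
  "transitive-b": "0.4.0",
  "transitive-c": "1.0.0",
};

const changes = changedPackages(treeBefore, treeAfter);
```

Each entry in `changes` is a diff somebody would have to review, which is where the "dozen transitive deps" problem shows up.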

niftylettuce commented 5 years ago

@6a68 yes it will be on GitHub, I will post the link here once I have a proof of concept up

niftylettuce commented 5 years ago

To anyone reading this - please do not harass, email, or contact the original maintainer of the package mentioned in the above discussion. It was never my intention for anyone to harass them. I simply wanted to raise awareness about this issue with NPM and the potential of this becoming a security issue in general. This is not the only package like this.

niftylettuce commented 5 years ago

Another update on the koa-router issue for anyone subscribed to this thread. The new maintainer @ZijianHe has provided us with an update https://github.com/ZijianHe/koa-router/issues/494#issuecomment-463468328. Hopefully this eases concern and it looks like we have a new contributor in open source land.

For the CLI npm diff command, would it just accept two different versions of a package? npm diff <package> <version-a> <version-b> and diff compare tarball?

Enrico204 commented 5 years ago

Maybe the best "way" to handle a change of maintainer is for the new maintainer to open their own repository on NPM. NPM should not allow a "takeover" procedure: instead, the installation may fail with a message like "hi, this library is not maintained anymore by olddev, there is a new version in newdev". At least the developer knows that there is something going on.

mcollina commented 5 years ago

I think the current way npm does this is by forcing a bump in a major release. That's enough to protect users from a malicious "takeover" in the case of a non-responsive maintainer.

panva commented 5 years ago

> I think the current way npm does this is by forcing a bump in a major release. That's enough to protect users from a malicious "takeover" in the case of a non-responsive maintainer.

Is that so? Any way to confirm?

mcollina commented 5 years ago

I've done this several times, and that's how it works. I looked in https://www.npmjs.com/policies/disputes but there is no mention of that.

ljharb commented 5 years ago

They certainly don’t; and can’t, because a legitimate new maintainer should be able to backport fixes as needed anyways.

Any owner can always and forever publish to any previously unused version number, and that’s how it must stay.

sompylasar commented 5 years ago

@iarna

> What's more, even if npm validated against the git repo at publish time, the user can just force push after publication. Checking this kind of thing does nothing.

I assume you assume the evil user force pushes to remove malicious code?

If such force push leads to file changes, git commit ids will change. Npm would record the commit id it verified against, and a standalone checksum of published files. If there's no such commit in the repo, it's a red flag. If running the same checksum at the same commit id files results in a different checksum, it's a red flag.
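
A minimal sketch of those two red-flag checks, assuming the registry stored a commit id and a checksum at publish time. The `PublishRecord` and `RepoChecksums` shapes, and the commit ids and checksums used below, are hypothetical placeholders.

```typescript
// Hypothetical record a registry might store at publish time.
interface PublishRecord {
  commitId: string;
  checksum: string; // checksum of the published files
}

// Hypothetical view of the repository at check time:
// commit id -> checksum of the files produced from that commit.
type RepoChecksums = Map<string, string>;

type RedFlag = "commit-missing" | "checksum-mismatch";

// Returns the red flag raised by the check, or null when nothing looks wrong.
function checkRecord(record: PublishRecord, repo: RepoChecksums): RedFlag | null {
  const current = repo.get(record.commitId);
  if (current === undefined) return "commit-missing"; // e.g. removed by a force push
  if (current !== record.checksum) return "checksum-mismatch";
  return null;
}

const publishRecord: PublishRecord = { commitId: "abc123", checksum: "sha256:1111" };

const intactRepo: RepoChecksums = new Map([["abc123", "sha256:1111"]]);
const forcePushedRepo: RepoChecksums = new Map([["def456", "sha256:2222"]]);
const divergedRepo: RepoChecksums = new Map([["abc123", "sha256:9999"]]);

const intactResult = checkRecord(publishRecord, intactRepo); // null
const forcePushedResult = checkRecord(publishRecord, forcePushedRepo); // "commit-missing"
const divergedResult = checkRecord(publishRecord, divergedRepo); // "checksum-mismatch"
```

How the per-commit checksums get computed (clone, build, pack) is the expensive part and is left out here.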

I believe this verification has to be a background task in npm; if one of the red flags is triggered, the package's downloads are paused to prevent further spread until the flags are resolved. There has to be a cache invalidation mechanism that would notify downstream caches not to use the flagged package.

I'm in no way a security expert though, just brainstorming. What do the npm security people that npm acquihired say?

sompylasar commented 5 years ago

In my npm-verified POC experiment, I needed a standardized package build+publish process (I used npm prepare for that). The packages that conform to the process can be marked as verified if the checks pass. The packages that do not conform never can. Compare Chrome, where sites that do not use HTTPS are marked as "Not Secure" by default, and even more, colored in red if they collect some user input via a form (for npm, this can be translated to using some OS APIs like process, network, and filesystem).

freewil commented 5 years ago

@niftylettuce

> For the CLI npm diff command, would it just accept two different versions of a package? npm diff <package> <version-a> <version-b> and diff compare tarball?

Yeah, I think that would be one version/use-case of the command. The way I'm currently thinking, it'd be nice to also have a no argument version similar to git diff where the current unstaged/uncommitted changes are shown, but that may be more complex than necessary for an initial version as it might require git/version control integration. I'd be hesitant in entangling npm with version control-specific code, but I think there is already a precedent with some commands - one that I'm aware of is npm version, which creates git tags for you.
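
A minimal sketch of the two-version use case at the file level, assuming both tarballs have already been downloaded and unpacked into maps of path to contents. The `Tarball` type, `diffTarballs` function, and file names are hypothetical, and this is only a summary of which files differ, not a line-by-line diff.

```typescript
// Hypothetical unpacked tarball: file path -> file contents.
type Tarball = Map<string, string>;

interface TarballDiff {
  added: string[];
  removed: string[];
  changed: string[];
}

// Summarize which files were added, removed, or changed between two versions.
function diffTarballs(versionA: Tarball, versionB: Tarball): TarballDiff {
  const diff: TarballDiff = { added: [], removed: [], changed: [] };
  for (const path of Array.from(versionB.keys()).sort()) {
    if (!versionA.has(path)) diff.added.push(path);
    else if (versionA.get(path) !== versionB.get(path)) diff.changed.push(path);
  }
  for (const path of Array.from(versionA.keys()).sort()) {
    if (!versionB.has(path)) diff.removed.push(path);
  }
  return diff;
}

const v1: Tarball = new Map([
  ["package.json", '{"version":"1.0.0"}'],
  ["index.js", "module.exports = 1;\n"],
  ["old-helper.js", "// unused\n"],
]);
const v2: Tarball = new Map([
  ["package.json", '{"version":"1.0.1"}'],
  ["index.js", "module.exports = 1;\n"],
  ["postinstall.js", "// new install script\n"],
]);

const tarballDiff = diffTarballs(v1, v2);
```

A reviewer would probably want to look at `added` first, since a new install script appearing in a patch release is the kind of thing worth a closer read.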

freewil commented 5 years ago

@freewil

> but that may be more complex than necessary for an initial version as it might require git/version control integration.

One alternative to prevent the need for tracking changes/state would be just to add a --diff flag to npm install that would output a diff. That should be much easier since npm install already mutates package.json (which means it needs to/could be aware of the version change while running) and prevents needing to track changes/state across two commands, as I originally proposed (npm install followed by npm diff).

iarna commented 5 years ago

@niftylettuce That'd be pretty keen! (We have an RFC for doing similar things with changelogs, and this feels like a comfortably related feature.)

@sompylasar

> If such force push leads to file changes, git commit ids will change. Npm would record the commit id it verified against, and a standalone checksum of published files. If there's no such commit in the repo, it's a red flag. If running the same checksum at the same commit id files results in a different checksum, it's a red flag.

Again, when are you imagining these checks are being run? At publication time there's no point, and at install time that's still high overhead.

But that doesn't even get into my real issue with these proposals. Let's imagine that it's a practical change, that it can be done in a way that verifies, at install time, that the artifact matches some ref in the git repo, and that this can be done without slowing down installers so much that users refuse to use the feature.

This would try to encourage every publish to go to a publicly hosted git repo somewhere, but what do you imagine that would accomplish? That you can be certain that malicious code is on a git repo somewhere? Who does this help and how? What attack vector is closed by this that would otherwise be open? Users can inspect it sure, but they can inspect the tarball too, trivially, and tooling could be built around either of these equally easily, so how does having the source of inspection being some git repo help?

freewil commented 5 years ago

~~^ misattributed quote~~ fixed

iarna commented 5 years ago

@freewil 🙃 Sorry, yes, fixed!

dominykas commented 5 years ago

Verifying packages against the repo is an expensive and inconclusive task - I'm not sure building that natively into standard npm is even feasible, but it would be nice to have. Similar to npm-verified, there is another PoC: https://www.npmjs.com/package/tbv.

That said, I do see value in providing these checks as a signal, perhaps as a paid feature.

Alternatively, they could be provided as a SaaS to those willing to pay.

While all it proves is that the package on npm can be built from the contents on git, that is a reasonable signal that it hasn't been tampered with, e.g. by an infected computer used to publish. It also serves as a signal of certain best practices (e.g. reproducible builds) kept by the maintainer, therefore increasing trust in the package.

While it only adds so much trust, it's a valuable signal, and when used with other signals (eg was this published with 2fa? Does it use any risky APIs? Does it communicate with the outside? etc static analysis) can help people draw the line or spot oddities.

Looking at this proposal and dismissing it with "well, this is expensive and you're still unsafe" is an unfair statement, because security is not a single dimension with One Fix To Rule Them All. It takes a lot of work and multiple components to build up trust in external code.

skonves commented 5 years ago

> Alternatively, they could be provided as a SaaS to those willing to pay.

I actually already have TBV running in the cloud as verifynpm.com eg. https://api.verifynpm.com/packages/tbv or https://api.verifynpm.com/packages/tbv@0.3.6

I'm footing the bill for now (AWS free tier FTW!). All of the code for that service is open source: github.com/verifynpm.

Unfortunately, as the author of TBV, I didn't know about @sompylasar's work before I started. I was mostly trying to see what steps it would take to find added or modified files in a package that aren't the product of the source code or related build script. (Moral of the story: Google first, code later 😬)

Anyway, I think there are a few conclusions to make from the actual effort that some of us have been contributing in addition to what has been verbalized in this issue so far:

The community wants a solution
There is indeed a process that can verify if a package is a deterministic function of source control at a specific commit
That process is potentially expensive (time, processing, cash money, etc)
That process can produce false negatives for legitimate packages
That commit may be revoked from source control at any time
Removing that commit makes a previously verifiable package unverifiable
There are multiple attack vectors besides the SCM => NPM publishing process
The plurality of vectors doesn't invalidate the benefit of any single solution
This conversation needs to continue 😃

As the maintainer of the verifynpm API, I would love to get some code review on what I have built so far. I appreciate the skepticism surrounding this topic, but the presence of multiple tools (that actually exist) indicates that it IS possible. I suggest that we steer the conversation toward poking holes in existing solutions (both Ivan Babak's and mine) so that we can start engineering solutions. Issues and PRs welcome :+1:

Cheers! Steve

skonves commented 5 years ago

RE: npm --diff, a year or so ago I put together "npmspy" to compare the code changes between package dependency trees. It was powered by loading the entirety of npm into a graph, which was fronted by an API. It's all kinda gross and broken. It was a bear to manage back when npm was only 500k packages. That's double now and I don't have the bandwidth (or funding) to work on the project anymore.

Feel free to read/use/steal/finish the code: https://github.com/skonves/npmspy https://github.com/skonves/npmspy-data

freewil commented 5 years ago

@skonves I checked out https://github.com/verifynpm/tbv but before I dig into the code, it'd be great to understand what the project actually does and what problems it's meant to solve. All I get from the readme is that it does package verification, without explaining what that actually means.

skonves commented 5 years ago

@iarna

> This would try to encourage every publish to go to a publicly hosted git repo somewhere, but what do you imagine that would accomplish? That you can be certain that malicious code is on a git repo somewhere? Who does this help and how? What attack vector is closed by this that would otherwise be open? Users can inspect it sure, but they can inspect the tarball too, trivially, and tooling could be built around either of these equally easily, so how does having the source of inspection being some git repo help?

The first attack vector being closed is the possibility of adding malicious code to the package without pushing it to source control. The community can review the source all day long, but will never find it. This is one of the exploits in David Gilbertson's hypothetical attack.

The second attack vector being closed is the possibility of compromised npm credentials/token being enough to compromise a package. If the contents of the package must be reproducible from source control to publish, then an attacker would have to gain access to both the npm AND Github account to push malicious code. Such a process would have actually prevented the eslint-scope attack from last year. In that case, the attacker gained access to one set of tokens and then used them to publish a version of eslint-scope that stole the tokens for other packages. Github accounts were never compromised.

It is currently possible to inspect the tarball, but from my personal perspective, there are a few reasons why reviewing source control is a more optimal solution.

Reviewing a tarball requires that you review the entire corpus of code. Reviewing source control allows the reviewer to focus on a certain range of commits/changes.
Packages often contain transpiled and/or minified code. This code is generally more challenging to read and thus review.
Github et al provide tooling for code review which lends more context to a change than manually diffing the contents of tarballs.

skonves commented 5 years ago

@freewil

TL;DR "verification" is the process of verifying whether the contents of a package can be reliably reproduced from source control.

I have written a little bit about the problem so far: What if we could verify npm packages? NPM Package Verification — Ep. 2

I make some clarifications in this thread: https://twitter.com/chrisdlangton/status/1092927031392624640

I have at least one other article in the works that will (hopefully) better describe the problem that it solves and why the solution is needed.

ljharb commented 5 years ago

Can't you compare two tarballs, and get the same ability to look at diffs?

RHavar commented 5 years ago

Thanks @skonves, the work you're doing in the space is great. While iarna is right in theory, it's still extremely valuable just knowing "X was built by commit Y". This puts a malicious package maintainer in an awkward spot, deciding between putting the malicious code in source control (where it's more likely to be seen) or publishing from the "wrong" commit (where it's trivial for someone to notice).

By no means will it fix the security nightmare that is npm, but it goes a long way to hardening it.

iarna commented 5 years ago

> The first attack vector being closed is the possibility of adding malicious code to the package without pushing it to source control.

How is this an attack vector? Differences between the two are not a vulnerability. I've seen the article you linked, I just don't find it very convincing, or, frankly, interesting. It's juuust a step above saying "if you type sudo rm -rf / you'll destroy your system, so Unix is insecure".

> The second attack vector being closed is the possibility of compromised npm credentials/token being enough to compromise a package.

2fa is a vastly more effective mechanism for this. Making 2fa status visible of published modules (and allowing one to say "I will only accept 2fa verified publishes of package updates") would more neatly solve this, without the entirely impractical performance hit of doing install-time git clones of every module you install. (Because verifying that a particular committish exists on any public git repo, remember, means cloning the repo. No one is mandated to use github... there was that whole exodus to gitlab back when MS purchased them and, believe it or not, people and projects run their own git servers as well. No solution that just leaves modules from those users out in the cold is going to be acceptable. You don't actually even have to use git.)

RHavar commented 5 years ago

> How is this an attack vector? Differences between the two are not a vulnerability.

It's not a vulnerability per se, but it's definitely an attack vector. It's yet another possible way to hide malicious code. This isn't even purely theoretical: it has been used in the past in sophisticated attacks (see: event-stream) and it will be used in the future.

For almost all npm libraries, there is a lot more scrutiny that happens to the source repo than what is actually deployed to npm. A lot of libraries I have investigated are deploying non-reproducible minified code to npm, there's simply no way it's getting audited -- yet every commit is scrutinized closely.

If npm did the sane thing and showed how/where it was built, it would generally get noticed pretty fast if a release is derived from a weird repo state. If they don't care about getting noticed too fast, they could latch onto the other insane behavior of npm to run scripts at install time (with no warning / opt-in).

@ljharb If you're going to edit my grammar, please do so correctly. It would have made a lot more sense to replace "he" with "iarna" or "she" than "they" when I'm clearly referring to a single person. (And my apologies to iarna about using the wrong pronoun, I only saw the small profile picture and didn't realize I was speaking to a "Rebecca").

Another simple feature for npmjs.com that would help (although obviously not be a silver bullet) is adding a "source viewer" for people to browse the code. It's kind of ridiculous that you have to go to a special url and unzip a tarball just to review what "npm install pkg" is going to do.

ljharb commented 5 years ago

@rhavar “they” is and has always been grammatically correct to refer to a single person; you’re welcome to correct my correction.

ljharb commented 5 years ago

Are you aware of https://unpkg.com where you can view any version of any npm package?

skonves commented 5 years ago

my bias is showing 😬

Please interpret "Github" to mean "your favorite online git repository." I have yet to actually try Gitlab, but it sounds nice.

RHavar commented 5 years ago

@ljharb Thanks for that. I wasn't aware of that site, it's very cool. Something like that just hosted by npmjs.com would help make it more accessible. I fixed your edit, but for future reference: it is generally a lot more productive to talk to the person in private (i.e. my email is in my profile) instead of showboating and adversarial edits of someone else's comment.

ljharb commented 5 years ago

@rhavar there was no showboating achieved or intended, but i apologize for not contacting you first.

sompylasar commented 5 years ago

RE: unpkg, making a source viewer into npmjs.com itself is impractical, but adding links pointing to unpkg.com from a package page might be easy and useful for manual reviews.

Anyway, at modern node_modules scale, having every consumer of a package manually review every dependency change would be highly impractical. There has to be a centralized and potentially automated way to make it hard and impractical to add malicious code.

dominykas commented 5 years ago

@iarna

> entirely impractical performance hit of doing install-time git clones of every module you install.

This is a strawman.

npm (or whoever implements it) does not have to do it at install time (it can do it on a schedule and publish the result, re-verify occasionally, log the last verification time, have a manual trigger for paid users, etc.)

The npm client can then use that information at install time to either warn or, based on configuration, cancel the build if the package can no longer be considered "verified". There are multiple layers that people use for security, and there are multiple different levels of security requirements.

There is a large spectrum between "I can use this because it has a reproducible build from 3 months ago" and "I can only use this when my compliance team allows it" and "I can use this, because I don't have compliance requirements and my threat model does not need to care".
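
A minimal sketch of that install-time decision, assuming a background job has already published a verification result. `VerificationStatus`, `InstallPolicy`, and their fields are hypothetical names for illustration; no such npm configuration exists as far as this thread establishes.

```typescript
// Hypothetical verification result published by a scheduled background job.
interface VerificationStatus {
  verified: boolean;
  lastCheckedAt: Date;
}

// Hypothetical client-side configuration.
interface InstallPolicy {
  onUnverified: "ignore" | "warn" | "fail";
  maxAgeDays: number; // how old a result may be before it is treated as stale
}

type Decision = "install" | "warn" | "fail";

function decide(
  result: VerificationStatus | undefined,
  policy: InstallPolicy,
  now: Date,
): Decision {
  const msPerDay = 24 * 60 * 60 * 1000;
  const fresh =
    result !== undefined &&
    now.getTime() - result.lastCheckedAt.getTime() <= policy.maxAgeDays * msPerDay;
  // A recent, successful verification always passes.
  if (result !== undefined && result.verified && fresh) return "install";
  // Otherwise the configured threat model decides.
  if (policy.onUnverified === "ignore") return "install";
  return policy.onUnverified === "warn" ? "warn" : "fail";
}

const checkTime = new Date("2019-03-01T00:00:00Z");
const lenient: InstallPolicy = { onUnverified: "warn", maxAgeDays: 30 };
const strict: InstallPolicy = { onUnverified: "fail", maxAgeDays: 30 };

const recentResult = decide(
  { verified: true, lastCheckedAt: new Date("2019-02-20T00:00:00Z") },
  lenient,
  checkTime,
); // "install"
const staleResult = decide(
  { verified: true, lastCheckedAt: new Date("2018-12-01T00:00:00Z") },
  lenient,
  checkTime,
); // "warn"
const neverChecked = decide(undefined, strict, checkTime); // "fail"
```

The three policy values correspond loosely to the spectrum described above: teams with no compliance requirements can ignore the signal, while stricter teams can fail the build on anything stale or unverified.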

> Making 2fa status visible of published modules (and allowing one to say "I will only accept 2fa verified publishes of package updates")

YES, PLEASE 😍 Been dreaming about this for ages :)

> would more neatly solve this

Somewhat, yes. For one, you can't 2fa in CI (there's going to be a solution soon, btw; in userland...) But also I'd postulate that 2fa has less usage than GitHub+Travis (ie reproducible builds) in the community. I'd love to see some data on this, though! (hint hint, can npm's chief data officer help?)

Reproducible, verifiable builds, though, serve as an independent verification factor. Anything that helps - helps.

Mind you that while reproducible builds are by far not a security silver bullet, they also are an indicator of package quality, which helps when trying to pick one.

> You don't actually even have to use git.

This is a strawman.

Just because you can't verify all the packages, you can still verify some (most?), and that subset of some may be enough for some people. If you can verify 80% of packages you use this way, then the remaining 20% need other mechanisms - which can also be built and enforced (or you can fork and do whatever else).

I really truly deeply love npm, but I find it frustrating that these ideas are getting thrown out without even bothering to attempt to do anything about it, while not offering anything better.

mcollina commented 5 years ago

I have several packages whose tar.gz contents differ from what is in the repository. If you need to ship a bundled version of an app, or a .min.js file, you might not want to commit those to your repository. I do not think verifying the match between the content of an npm package and a repo improves security.

sompylasar commented 5 years ago

@mcollina The verification method discussed above would compare the package build+pack output from source at certain commit marked with a version tag with the published package contents at the same version. Not ~~with~~ the source itself.

mcollina commented 5 years ago

> @mcollina The verification method discussed above would compare the package build+pack output from source at certain commit marked with a version tag with the published package contents at the same version. Not with the source itself.

That does not guarantee anything if the package is not built with a lockfile in place. The output will certainly differ.
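
A minimal sketch of why a rebuild can differ without a lockfile: the same caret range can resolve to different versions of a build dependency depending on what the registry contains at the time. `resolveCaret` is a deliberately simplified, hypothetical stand-in for real semver resolution (it only handles plain `major.minor.patch` versions with major >= 1), and the version lists are made up.

```typescript
type Version = [number, number, number];

function parse(v: string): Version {
  const parts = v.split(".").map(Number);
  return [parts[0] ?? 0, parts[1] ?? 0, parts[2] ?? 0];
}

function compare(a: Version, b: Version): number {
  return a[0] - b[0] || a[1] - b[1] || a[2] - b[2];
}

// Simplified "^base" resolution: highest available version with the same
// major version that is not lower than the base.
function resolveCaret(base: string, available: string[]): string | null {
  const min = parse(base);
  const candidates = available
    .filter((v) => parse(v)[0] === min[0] && compare(parse(v), min) >= 0)
    .sort((x, y) => compare(parse(x), parse(y)));
  return candidates[candidates.length - 1] ?? null;
}

const declaredBase = "1.2.0"; // stands for a "^1.2.0" range in package.json
const registryAtPublish = ["1.2.0", "1.2.1"];
const registryAtVerify = ["1.2.0", "1.2.1", "1.3.0"]; // a new minor appeared since

const atPublish = resolveCaret(declaredBase, registryAtPublish); // "1.2.1"
const atVerify = resolveCaret(declaredBase, registryAtVerify); // "1.3.0"

// With a lockfile, both builds would use the pinned version instead.
const lockedVersion = "1.2.1";
const sameWithoutLockfile = atPublish === atVerify; // false
const sameWithLockfile = lockedVersion === lockedVersion; // trivially true: nothing is re-resolved
```

If the build tool that drifted is a transpiler or minifier, the rebuilt output can differ byte for byte even though the source at the tagged commit is unchanged.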

nodejs / package-maintenance

Suggestion: Provide standards around integrity between source code and published package #77