Document the security design of yarn, the use of SHA1, and security roadmap

grempe commented 8 years ago

Do you want to request a feature or report a bug? feature (docs)

What is the current behavior?

Yarn website and blog posts make extensive claims about the security of yarn, going so far as to call it 'Mega Secure' on the homepage of yarn.

https://yarnpkg.com

Mega Secure.
Yarn uses checksums to verify the integrity of every installed package before its code is executed.

The Facebook blog post announcing yarn similarly makes security claims, but provides zero information on the security approach.

https://code.facebook.com/posts/1840075619545360

However, nowhere can I find documented what the security design of yarn is intended to accomplish other than these very generic statements.

Yarn appears to use the SHA1 cryptographic one-way hash function to generate and compare digests of packages. This is not really a checksum.

https://en.wikipedia.org/wiki/Checksum

Please prominently document:

what is the security roadmap for yarn?
how does yarn use cryptographic hashes to ensure that packages are unmodified?
what security threats does this prevent? and which does it not?
when are hashes generated and when are they compared?
is there a plan to publish hashes of packages in (signed) server metadata to allow the client to compare against a known source of truth?
why was the SHA1 hash chosen, when more modern and secure primitives such as SHA256 are recommended by security experts? (for example, SHA1 hashes are no longer allowed in TLS certificates due to security concerns). https://en.wikipedia.org/wiki/SHA-1#Cryptanalysis_and_validation
does the existing yarn.lock file support the use of other cryptographic hash functions other than SHA1 in the future?
what plans does the yarn project have for supporting cryptographic signatures for packages in the future?

It seems that now, in the early stages of yarn, is the time to incorporate strong security and signed packages. The claims of 'mega security' without any documentation to support those claims may be considered misleading by those concerned with security.

If the current behavior is a bug, please provide the steps to reproduce.

n/a

What is the expected behavior?

n/a

Please mention your node.js, yarn and operating system version.

n/a

markstos commented 8 years ago

Related to security, it would be relevant to highlight that by default a new reverse proxy has been added to network downloads-- registry.yarnpkg.org. This becomes an additional point of possible compromise for network downloads in addition to registry.npmjs.org. Some may wish to trust yarn as a npm client, but would prefer not to trust an additional layer of network infrastructure that the Yarn project is providing. All the offline support that Yarn provides are a great mitigation of all network-related security issues, but users should be aware of the additional network infrastructure they are trusting when using Yarn.

Daniel15 commented 8 years ago

why was the SHA1 hash chosen, when more modern and secure primitives such as SHA256 are recommended by security experts?

Unfortunately not under our control for npm packages at the moment... npm use SHA1, so we need to use SHA1 for npm packages 😢 . For example, see shasum in the JSON returned from https://registry.npmjs.org/babel-standalone

what plans does the yarn project have for supporting cryptographic signatures for packages in the future?

This is something I'd love to see. Debian packages have the right idea: It uses SHA256 for hashes, and the manifest (including the SHA256 hashes) is GPG signed.

grempe commented 8 years ago

Thanks @Daniel15 for the sha1 info. Looks like the npm project is choosing to do nothing about moving to SHA256 at the moment, despite all contributors to this thread agreeing its a good idea. They still closed the issue this summer after two years of discussion saying:

I can’t make any guarantees that this is something that we’ll get to in the next year or so.

https://github.com/npm/npm/issues/4938

I'm not trying to push a specific 'fix'. I think its hard though to hold a conversation about whether or not yarn is 'mega secure', or move forward towards a more secure future, without docs though. It can't be expected that security marketing is just taken on faith and anyone who wants to know more must read the source to decide if its true or not.

I think it is now incumbent on yarn to move this issue forward. It seems to me that npm doesn't see the need, or they think its too hard. I'm not saying other package managers have this all figured out, they don't. Rubygems has had signed packages for years (and intense discussion of how to make them more workable). You know how many use them though? I would hazard a guess that its less than %1 of gems. Why? Too hard to use for both packagers and users. And no good story for the distribution of public keys. @bascule has had lots of discussions in the Ruby community on the topic.

I look forward to learning more.

Daniel15 commented 8 years ago

We could always have our own server that does the hashing, instead of relying on npm to do it. Basically proxy npm's API but add some extra fields for hashes using stronger formats.

You know how many use them though? I would hazard a guess that its less than %1 of gems. Why? Too hard to use for both packagers and users.

Unfortunately I feel like it only really works well if people are forced to do it, otherwise they'll just do whatever's easiest. Very very few people will spend time properly signing their packages if they can get away with not signing them. On Debian and Ubuntu, pretty much all packages are signed, as you see a warning every time you apt-get install or apt-get update or apt-get upgrade if any of the package sources you're using aren't signed (or the user doesn't have the public key installed). This means that if you want to host some Debian packages, you'd better ensure you sign the package source 😄

no good story for the distribution of public keys

dpkg (apt-get) has a command to load a public key from a keyserver, for example this is what we do for the Yarn packages:

apt-key adv --keyserver pgp.mit.edu --recv D101F7899D41F3C3

I don't see why other package managers couldn't just do the same thing

grempe commented 8 years ago

Well, not to get too far off topic (documenting the current state and roadmap for security features of yarn), I think a GPG solution is probably a non-starter. Using a decentralized web-of-trust model would require every yarn user that wants to verify packages to import the gpg key of the package signer, or someone who vouches for the keys of package signers hierarchically. And GPG, and its multiple (not always compatible) versions, is painful for most mortals to use and difficult to support.

Tony (@bascule) wrote up some really good thoughts on the topic for the Ruby gems community to think about. Sadly, I don't think this ever gained any real traction.

https://tonyarcieri.com/lets-figure-out-a-way-to-start-signing-rubygems

The current solution for signing Ruby gems requires packagers to:

generate their own key pairs
publicize that a public key for verification is available on the web somewhere
instruct people how to retrieve and install a key (with trust of keys left up to each person to decide)
use a different gem installation command line flag to inform rubygems you want to verify a signature.
there is no good story for installing a signed gem that has dependencies on a mix of signed and unsigned gems.
few people in the ruby community know anything at all about this process, or even of its existance.

http://guides.rubygems.org/security/

You don't want to copy that model.

In any case, I think a good place to start is for the yarn team to justify current security claims in marketing docs, document where they are now vis-a-vis security, and where they would like to go. If no one is thinking about these issues then I think its an opportunity wasted if yarn gains significant mind-share and momentum makes it too hard to change later.

markstos commented 8 years ago

As long as we are wondering some on the topic of security:

As long as yarn is running a reverse proxy, it could add value by periodically checking the SHAs of cached packages against the SHAs of the exact same package versions upstream. If SHAs differ but the version hasn't changed, that could indicate a recent compromise of the package on one end or the other.
In related news Mozilla is highlighting retiring SHA-1 today. As part of being "mega secure" it would be great if yarn upgraded from SHA1 as another advantage over npm.

bernhardreiter commented 7 years ago

My suggestion is to split this issue in two or three different ones:

document the security approach (and getting the process started)
move off SHA1 to something better (recommended is SHA3 keccak for new applications like this one)
adding packet signing to the used repository. (Note that OpenPGP as implemented in GnuPG is a good idea and can be used with different trust models).

Daniel15 commented 7 years ago

move off SHA1 to something better

We're limited by npm... Their servers provide the hashes, and thus we can't fix the algorithm until npm moves to something better.

bernhardreiter commented 7 years ago

The hash has to be calculated on client side to be saved in yarn.lock and those files are shared to become fully reproducable, they could be extended to additionally include Keccak (=SHA3) hashes or use it alone. There are probably other potential solutions. My suggestion is to not discuss the possibilities here, but in a separate issue.

Daniel15 commented 7 years ago

Sent from my phone.

On Feb 22, 2017 11:27 PM, "Bernhard E. Reiter" notifications@github.com wrote:

The hash has to be calculated on client side to be saved in yarn.lock and those files are shared to become fully reproducable, they could be extended to additionally include Keccak (=SHA3) hashes or use it alone. There are probably other potential solutions. My suggestion is to not discuss the possibilities here, but in a separate issue.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/yarnpkg/yarn/issues/1169#issuecomment-281916012, or mute the thread https://github.com/notifications/unsubscribe-auth/AAFnHXS3bRAFZWtvq4ONI0wlapxpQ3Qkks5rfTTOgaJpZM4KZCeG .

rally25rs commented 7 years ago

I'm giving a presentation on Yarn at a conference and was looking for actual details to back up this "mega secure" claim. From my own testing, I actually noticed that NPM will check the hash of the downloaded .tgz file before extracting it.

npm ERR! shasum check failed for /var/folders/yf/gcmnw1y96k31lh9ttjhfm8v9ssxkkq/T/npm-7989-60191d85/registry.npmjs.org/getpass/-/getpass-0.1.7.tgz
npm ERR! Expected: 5eff8e3e684d569ae4cb2b1282604e8ba62149fa
npm ERR! Actual:   4d3a2db66b2aee079b690573f2246a6f906a5be4
npm ERR! From:     https://registry.npmjs.org/getpass/-/getpass-0.1.7.tgz

Yarn on the other hand, does not, and passes it off to the extraction library.

yarn install v0.21.3
[1/4] 🔍  Resolving packages...
[2/4] 🚚  Fetching packages...
error An unexpected error occurred: "https://registry.yarnpkg.com/getpass/-/getpass-0.1.7.tgz: invalid distance too far back".

If the underlying extraction library had a vulnerability then it could be exploited under Yarn by a man-in-the-middle returning a malicious .tgz instead of the real one, making Yarn less secure than NPM. What is Yarn actually checking? The hash on each file post-extraction?

The "... before its code is executed" phrase is also confusing. To many people that probably means "oh when I require('foo') then Yarn will check it before it loads."

gsklee commented 7 years ago

@kittens @samccone Is it possible for us to have a blog post dedicated to this topic sometime?

bennycode commented 6 years ago

Reading this recent incident on disappearing npm packages, it might be a good moment to push this discussion here. I would also like to know how Yarn uses checksums to verify the integrity of every installed package before its code is executed.

hach-que commented 6 years ago

Of note, out of that recent incident on disappearing npm packages (especially the fact that people could upload over the old package names), I wrote a tool called pkgsign, which can be used to add signatures to packages and verify them:

It would definitely be a step up to have signature verification called from inside a package manager itself (especially since at the moment users have no way of wiring up pkgsign to verify packages before they are unpacked into a dependency tree).

Daniel15 commented 6 years ago

Hey @hach-que, good to see you here 😄 That project sounds pretty interesting. I wonder if we could integrate it into Yarn itself, or at least provide hooks for it to easily hook in to Yarn. Perhaps it's worth writing an RFC for Yarn (https://github.com/yarnpkg/rfcs) and seeing what the core team thinks. Someone on the core Yarn team (@bestander or @arcanis, I think) was thinking about how to introduce package signing into Yarn a while back.

bestander commented 6 years ago

I support that motion

hach-que commented 6 years ago

Good to see you too!

Yeah I did take a look at the Yarn contributing process before writing pkgsign, and decided that the whole RFC approval process would take too long to get something functional in place (by the time it went through approval and design to implementation).

I've got quite a few ideas for improving pkgsign over the next few months, so I think the best way forward would be to have hooks rather than a direct integration; that would allow pkgsign to continue to evolve rapidly without the whole Yarn RFC process.

There's two kinds of places where hooks would be useful:

After a tarball has been downloaded but before it's been extracted / installed.
After the package tree has been extracted, but before any lifecycle hooks are run for packages themselves. That is, the files exist on disk but Yarn hasn't executed or trusted any of the files of an installed package. Question is what to do about package.json when Yarn goes to find dependencies of dependencies to install? My guess is that Yarn shouldn't trust package.json unless the tarball has been verified, but this would introduce blocking steps in package resolution.

tarball verification with pkgsign just tells you whether that individual tarball is compromised/unsigned/untrusted/trusted. package tree verification can check all the packages being used.

The issue with tarball only verification is that it doesn't really help with the "sign on behalf of a dependencies' contents", since while you can sign the content of a dependency, you can't really verify the contents of that dependency are valid for the signature until the package tree has been extracted and the dependency's files are available.

I also want to include the expected identities for dependencies when signing a package. So let's say React depends on SomeDependency, which is signed by Person A, and React signs their package, the signature.json for React would include "SomeDependency should be signed by Person A". That would prevent a scenario where someone (Person B) takes control of SomeDependency and someone installs React seeing SomeDependency for the first time, and trusting Person B. With expected identities, pkgsign can say "well when React was signed, Person A was in control of SomeDependency, so it's flagged as compromised (instead of untrusted) for Person B to be in control of it even if this device has never seen SomeDependency before". Again expected identities is impractical to verify if just verifying tarballs.

That said, I think the two hooks above give me somewhere to go on the pkgsign side of things with regard to making it work globally with a package manager for all packages, rather than pkgsign being something that people run after packages have been installed (and consequently, after the package manager has already run preinstall/postinstall lifecycle scripts).

arcanis commented 6 years ago

After the package tree has been extracted, but before any lifecycle hooks are run for packages themselves. That is, the files exist on disk but Yarn hasn't executed or trusted any of the files of an installed package. Question is what to do about package.json when Yarn goes to find dependencies of dependencies to install? My guess is that Yarn shouldn't trust package.json unless the tarball has been verified, but this would introduce blocking steps in package resolution.

I'm not sure I follow - shouldn't the package hashes be validated on the tarball itself? Or are you validating each file individually?

hach-que commented 6 years ago

In each package (both in the tarball and on disk) there is signature.json. It contains a list of files expected in the package and their SHA512 hashes. From that list, we build a deterministic text string and sign it with the private PGP key, and store the signature inside signature.json.

When validating a tarball, we check that every file in the tarball is listed in signature.json (i.e. no extra or missing files), that the SHA512 hashes all match, and that the signature is valid for the list of files + hashes.

When validating a package directory, we use npm-packlist to get a list of files from the directory (so that build artifacts from post-install scripts, etc. are correctly ignored), but there still must be no extra or missing files (i.e. if npm-packlist returns an extra, non-ignored file and that isn't in signature.json then verification fails).

We don't sign the tarball itself, we sign the expected file list and hashes instead, since whatever signing system we built had to work with existing registry code (which has no support for signature metadata).

hach-que commented 6 years ago

You can see how the verification process works here: https://github.com/RedpointGames/pkgsign/blob/master/src/lib/moduleVerifier.ts#L37

BYK commented 6 years ago

@hach-que there's this already existing tool for signed tar files, without relying a file inside the archive as far as I understand: https://www.gnu.org/software/swbis/sourcesign-1.2/gendocs/manual/sourcesign.html

Have you thought about using this? What are the shortcomings of that approach?

hach-que commented 6 years ago

@BYK I'm not sure how portable that is - some quick googling doesn't show much support for reading PGP signatures of tar files in e.g. tar-stream. That also makes me question how hard it would be to get that running on Windows reliably.

In any case, signing using some tarball metadata doesn't help support features like "signing on behalf of" or expected identities for dependencies, nor does it allow for verification of modules that are already unpacked into node_modules.

BYK commented 6 years ago

@BYK I'm not sure how portable that is - some quick googling doesn't show much support for reading PGP signatures of tar files in e.g. tar-stream.

Well we can probably add this functionality easily into the tar module we use, right?

That also makes me question how hard it would be to get that running on Windows reliably.

Why do you think so? As long as it is implemented via JS and Node, it should be mostly portable logic.

In any case, signing using some tarball metadata doesn't help support features like "signing on behalf of" or expected identities for dependencies

Can you elaborate more on this? From my understanding, the metadata field is not very limited so we may be able to store a serialized JSON string there for any extra information. Am I missing something or just hoping too much? 😁

, nor does it allow for verification of modules that are already unpacked into node_modules.

Why is this use case important? If we are going to start opening and reading all the files in node_modules, we may just replace the whole thing with a known "safe" extraction as well I guess. This would not help you with a corrupted file system but then I'd argue you already have more important problems than validating your node_modules folder?

hach-que commented 6 years ago

Well we can probably add this functionality easily into the tar module we use, right?

What's the benefit to making it swbis-compatible though? Like, yes someone could write all this extra functionality into tar-stream, but why do this when there's a solution that is already working and doesn't require writing all this extra code?

The only technical benefit I can see is that people could verify the archives with a swbis binary, if they also have previously manually downloaded the PGP keys for the identity of the package. They wouldn't get the benefit of e.g. pkgsign automatically managing PGP public keys from keybase.io or over HTTPS based on the identity in the package.

It just seems like a very edge use-case to make it swbis compatible for a lot more technical complexity.

Can you elaborate more on this? From my understanding, the metadata field is not very limited so we may be able to store a serialized JSON string there for any extra information. Am I missing something or just hoping too much? 😁

Inevititably there's still going to be packages that people depend on that aren't signed. Let's say the React project is signing their package, and they depend on the (dev) dependency "async". For whatever reason, maybe the author just hasn't got around to it yet, the async package isn't yet signed.

However, the React developers want to offer consumers of React a fully trusted tree - that is, if someone trusts the React developer's identity to provide the React package, all the dependencies of React at the exact versions they depend on should also be considered trusted.

In this case, when the React package is signed, we can include hashes of all of the files in all of the dependencies that React depends on. So if you pull in React as a dependency you can know that not only is React intact, but all the dependencies of React exactly match the code that the React developers trust. This allows consumers of signed packages to ensure that dependencies of those packages don't get tampered with or compromised, even if the authors of those dependencies aren't yet using package signing.

The trade-off here compared with the authors of those dependencies doing package signing is that the React project would have to use exact versions in their project.json, since if any of the files in the dependencies differ from when the React package was signed then the signature of those dependencies would no longer be valid.

In my opinion, this is an important scenario in order to enable fully signed, trusted dependency trees faster than they would otherwise occur if every author of every package had to use package signing.

I also want to support "expected identities", where when the React team signs their package, it includes information about which identity they expect to be signing their dependencies. This ensures that if someone is installing React for the first time, a compromised dependency signed by a different identity won't work because the identity won't match what the React package expects (remembering that someone installing a package or dependency for the first time haven't trusted any identity with that package name yet).

nor does it allow for verification of modules that are already unpacked into node_modules.

Some use-cases:

Dependency signing "on behalf of" and "expected identities" (see above). This is much harder if you don't have a full tree available, and is impractical if just verifying a tarball in isolation.
Verifying dependencies pulled from Git repositories. You can verify the pkgsign repository today with this technique.
Being able to view a signature summary of all the packages you depend on. pkgsign verify . --full today will give you this information.

les2 commented 6 years ago

i wonder how rpm / deb handle signature verification for packages and dependencies ...

wilk commented 6 years ago

Hi guys,

I join the discussion with my 2 cents.

I wrote an article called Secure NPM (https://hackernoon.com/secure-npm-ca2fe0e9aeff), showing a way to improve the security for NPM packages. There's also a working Proof of Concept here: https://github.com/wilk/snpm

I also suggested the idea on the NPM community forum: https://npm.community/t/feature-request-consistency-between-what-is-published-on-npm-and-the-source-code-published-on-public-repos/509/11 and opened a RFC: https://github.com/npm/rfcs

Hope this can help 💪

wilk commented 6 years ago

It seems Cargo does something like SNPM: https://doc.rust-lang.org/cargo/reference/publishing.html#github-permissions

yarnpkg / yarn

Document the security design of yarn, the use of SHA1, and security roadmap #1169