rust-lang / cargo

The Rust package manager
https://doc.rust-lang.org/cargo
Apache License 2.0

Uncommitted Cargo.lock files prevent reproducing .crate files from source #9242

Open Gaelan opened 3 years ago

Gaelan commented 3 years ago

(Apologies if I've filed this in the wrong place; happy to move it wherever it makes sense.)

There's been some interest lately in ensuring that code uploaded to crates.io is the same as the code in the repository on GitHub.

Aside on why this is worth doing: Most people who are interested in a crate's code will look at its GitHub (or similar) repository, not the code uploaded to crates.io (citation needed, but I know I do this and I assume most others do too). This means the code that actually runs gets comparatively little scrutiny, and authors of malicious crates can make their vulnerabilities less likely to be discovered. This has happened in practice with [the `event-stream` NPM package](https://cnorthwood.medium.com/todays-javascript-trash-fire-and-pile-on-f3efcf8ac8c7). Comparing the published crate to the GitHub source ensures that any malicious code is visible to anyone who goes looking for it.

This doesn't need to be done by Cargo, of course, but Cargo's current method of generating .crate files makes it difficult for any tool to do this.

Ideally, .crate files would be bit-for-bit reproducible. If they were, checking a crate would be as simple as downloading the .crate file, cloning the source, running `cargo package`, and comparing hashes. #8864 got most of the way there, but the check fails in practice (at least for the crates I tested, the latest versions of hyper and rand) because the Cargo.lock file in the uploaded crate differs from the newly generated one. These crates follow the official guidance to omit the file from version control (because they're libraries), so my Cargo generates a new one on the fly, picking up any dependency versions released since the crate was uploaded. Hence the mismatch.
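As a rough sketch of the final comparison step, assuming you already have the published .crate and a locally rebuilt one (`cargo package` writes it under `target/package/`): the helper name `same_contents` and the file names are hypothetical. It compares the files byte-for-byte instead of hashing them, which answers the same "identical or not" question without pulling in a hashing crate.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: returns Ok(true) if both files have identical bytes.
/// For a yes/no reproducibility check this is equivalent to comparing hashes.
fn same_contents(a: &Path, b: &Path) -> io::Result<bool> {
    let left = fs::read(a)?;
    let right = fs::read(b)?;
    Ok(left == right)
}

fn main() -> io::Result<()> {
    // Self-contained demo using the temp dir. In real use, `published` would be
    // the .crate downloaded from crates.io and `rebuilt` the one produced by
    // `cargo package` from a clean checkout of the repository.
    let dir = std::env::temp_dir();
    let published = dir.join("example-published.crate");
    let rebuilt = dir.join("example-rebuilt.crate");
    fs::write(&published, b"same bytes")?;
    fs::write(&rebuilt, b"same bytes")?;
    println!("reproducible: {}", same_contents(&published, &rebuilt)?);
    Ok(())
}
```

A differing Cargo.lock inside the archive, as described above, is enough to make this comparison fail even when every source file matches.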

I see a few solutions here:

ehuss commented 3 years ago

> Omit the Cargo.lock from .crate files for crates that only contain libraries.

This is already how Cargo works. The lock is only included if the package has a binary or example. I suspect the projects you looked at might have had some examples?
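The rule described here can be modeled in a few lines. This is a simplified illustration, not Cargo's actual code: `TargetKind` and `includes_lockfile` are hypothetical names standing in for a package's build targets and the packaging decision.

```rust
/// Hypothetical, simplified stand-in for the kinds of targets a package can have.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TargetKind {
    Lib,
    Bin,
    Example,
    Test,
}

/// Models the rule as stated in this comment: Cargo.lock is included in the
/// .crate only if at least one target is a binary or an example.
fn includes_lockfile(targets: &[TargetKind]) -> bool {
    targets
        .iter()
        .any(|t| matches!(t, TargetKind::Bin | TargetKind::Example))
}

fn main() {
    // A library with tests but no examples: no lock file is packaged.
    println!("{}", includes_lockfile(&[TargetKind::Lib, TargetKind::Test]));
    // A library that also ships an example: the lock file is packaged.
    println!("{}", includes_lockfile(&[TargetKind::Lib, TargetKind::Example]));
    // A binary crate: the lock file is packaged.
    println!("{}", includes_lockfile(&[TargetKind::Bin]));
}
```

Under this model, adding a single example to a library is what pulls an (often uncommitted) Cargo.lock into the published archive.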

Gaelan commented 3 years ago

Aha, you're right: rand_core.crate, which has no examples, is reproducible. (Interestingly, libc.crate also has no examples, but it isn't reproducible because vxworks/mod.rs is marked as executable in the git repository but not in the official crate file. Weird.)
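Since the libc mismatch comes down to a single permission bit, a reproducibility checker would need to compare file modes as well as contents. A minimal Unix-only sketch, assuming the git checkout and the unpacked .crate are both on disk; `is_executable` and `exec_bit_differs` are hypothetical helpers built on `std::os::unix::fs::PermissionsExt`.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::PermissionsExt;
use std::path::Path;

/// True if any of the owner, group, or other execute bits is set.
fn is_executable(mode: u32) -> bool {
    (mode & 0o111) != 0
}

/// Hypothetical check: do two files disagree on the executable bit?
/// For example, a file in a git checkout versus the same file unpacked from a .crate.
fn exec_bit_differs(a: &Path, b: &Path) -> io::Result<bool> {
    let mode_a = fs::metadata(a)?.permissions().mode();
    let mode_b = fs::metadata(b)?.permissions().mode();
    Ok(is_executable(mode_a) != is_executable(mode_b))
}

fn main() -> io::Result<()> {
    // Self-contained demo: identical contents, different permissions.
    let dir = std::env::temp_dir();
    let from_git = dir.join("example-from-git.rs");
    let from_crate = dir.join("example-from-crate.rs");
    fs::write(&from_git, "// same contents\n")?;
    fs::write(&from_crate, "// same contents\n")?;
    fs::set_permissions(&from_git, fs::Permissions::from_mode(0o755))?;
    fs::set_permissions(&from_crate, fs::Permissions::from_mode(0o644))?;
    println!("executable bit differs: {}", exec_bit_differs(&from_git, &from_crate)?);
    Ok(())
}
```

Checking only the execute bits, rather than the full mode, mirrors what git itself records (100755 versus 100644).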

That still leaves the question of how to handle libraries that do have examples. The current practice of shipping Cargo.lock files that aren't under version control isn't great: you might get lucky and get something meaningful if the developer packaged from the same checkout they'd been working in, but that goes out the window if packaging is done on another machine (or in CI). I think the options now are: