rust-lang / cargo

The Rust package manager
https://doc.rust-lang.org/cargo
Apache License 2.0

Allow registry configuration for disabling checksum validation #13858

Open stefanvanburen opened 6 months ago

stefanvanburen commented 6 months ago

Problem

I'm working on a crate registry where I'm able to advertise a set of crate versions that may be generated, but the crates themselves are only lazily generated when they're requested — so I don't have a checksum to publish initially. After the generation occurs, I can publish the checksum, and it is not expected to change.

Currently, cargo will not function if the checksum is not populated — it'll attempt to compare the empty string to the checksum of the downloaded crate, which will fail and stop the installation of the crate. I'd like to not need to pre-generate all possible versions just to populate the version checksums in the index.

Proposed Solution

I'd like the ability to disable checksum validation altogether at the registry level. I'd imagine this looking something like:

# in .cargo/config.toml
[registries]
my-registry = { verify-checksums = "no" }

The default value for verify-checksums would be "yes".
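To make the proposed key concrete, here is a minimal sketch of how a client might read it, assuming the value is a plain string and an absent key means "yes". `VerifyChecksums` and `from_config` are hypothetical names for illustration only; no such option exists in Cargo today.

```rust
/// Hypothetical per-registry setting mirroring the proposed
/// `verify-checksums = "yes" | "no"` key. Not an existing Cargo option.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
enum VerifyChecksums {
    /// Today's behaviour: always compare the index checksum.
    #[default]
    Yes,
    /// Skip the comparison entirely for this registry.
    No,
}

impl VerifyChecksums {
    /// Parses the (optional) config value; a missing key falls back to `Yes`.
    fn from_config(value: Option<&str>) -> Result<Self, String> {
        match value {
            None => Ok(Self::default()),
            Some("yes") => Ok(Self::Yes),
            Some("no") => Ok(Self::No),
            Some(other) => Err(format!("invalid verify-checksums value: {other:?}")),
        }
    }
}
```

Rejecting unknown strings instead of silently treating them as "no" keeps a typo from quietly switching verification off.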

Alternative solution

An alternative would be the ability to validate a downloaded crate's checksum only when a value is set in the registry (somewhat similar to how the Go Module Proxy works, where the initial module download populates the checksum value). I'd imagine this looking something like:

# in .cargo/config.toml
[registries]
my-registry = { verify-checksums = "if-non-empty" }
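The difference between the three proposed values can be sketched as a single accept/reject decision. This assumes the index `cksum` arrives as a possibly empty hex string and `actual` is the hex SHA-256 digest computed over the downloaded `.crate` file; `VerifyChecksums` and `accept_download` are hypothetical names, not Cargo code.

```rust
/// Hypothetical verification policies from the proposal; none exist in Cargo today.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum VerifyChecksums {
    Yes,
    No,
    IfNonEmpty,
}

/// Decides whether a downloaded crate is accepted.
/// `index_cksum`: the `cksum` field from the index entry (may be empty).
/// `actual`: hex digest computed over the downloaded `.crate` bytes.
fn accept_download(policy: VerifyChecksums, index_cksum: &str, actual: &str) -> Result<(), String> {
    let must_compare = match policy {
        VerifyChecksums::Yes => true,
        VerifyChecksums::No => false,
        // Only compare once the registry has actually published a checksum.
        VerifyChecksums::IfNonEmpty => !index_cksum.is_empty(),
    };
    if must_compare && index_cksum != actual {
        return Err(format!(
            "checksum mismatch: index has {index_cksum:?}, download hashed to {actual:?}"
        ));
    }
    Ok(())
}
```

With `Yes` and an empty index value the download is rejected, which matches the failure described in the Problem section; `IfNonEmpty` differs from `No` only once a checksum has been published.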

Notes

I'm aware that this can introduce security risks (you should only enable this if you trust the registry that you're using), but I've been able to implement a similar lazy generation scheme across a variety of package ecosystems — cargo is the only one (so far) that has strictly required package version checksums in this way.

The alternative proposal is more secure in the sense that the checksum can at least be compared to a stable value once it's known, but still requires trust in the registry.

I wasn't able to find a prior request quite like this, but somewhat related is https://github.com/rust-lang/cargo/issues/10071. It also may allow for solutions to issues like https://github.com/rust-lang/cargo/issues/10939, where a local registry is required to serve a non-existent checksum.

weihanglo commented 6 months ago

I'm working on a crate registry where I'm able to advertise a set of crate versions that may be generated, but the crates themselves are only lazily generated when they're requested.

Not really related to the issue itself, but I'd like to know more about the use cases and rationale behind this lazily generated package registry, if you'd like to share :)

weihanglo commented 6 months ago

I've been able to implement a similar lazy generation scheme across a variety of package ecosystems — cargo is the only one (so far) that has strictly required package version checksums in this way.

Could you give some references on how other ecosystems handle this?

stefanvanburen commented 6 months ago

Hey @weihanglo, happy to give more context. I'm working on Generated SDKs at Buf, where we allow users to easily depend on the output of their protobuf files when generated with various language plugins. (You can see that we already support Go, NPM, Maven, Swift and Python.)

The way this works is that we construct a synthetic version that combines information about the version of the files with the version of the plugin, so we have the cross product of (module versions) * (plugin versions) available for download. As you can imagine, as the number of versions of the protobuf files and the number of plugin versions increases, the number of available versions becomes quite large 😅.
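A small sketch of why the advertised version set grows multiplicatively. The `{plugin}-{commit}` version format below is a hypothetical stand-in chosen for illustration; the thread does not say how Buf actually encodes these synthetic versions.

```rust
/// Builds every synthetic version for a hypothetical scheme that joins a
/// plugin version with a module commit as a semver pre-release suffix.
fn synthetic_versions(module_commits: &[&str], plugin_versions: &[&str]) -> Vec<String> {
    // One entry per (plugin version, module commit) pair: the cross product.
    let mut out = Vec::with_capacity(module_commits.len() * plugin_versions.len());
    for plugin in plugin_versions {
        for commit in module_commits {
            out.push(format!("{plugin}-{commit}"));
        }
    }
    out
}
```

With, say, 200 module commits and 20 plugin versions, this already advertises 200 × 20 = 4,000 versions, each of which would need a checksum if the index had to be complete up front.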

The way that we typically deal with this is to only lazily generate the package when it's requested, and then store it for future downloads. This means that we don't have the checksum on hand before the package itself is requested the first time.
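The lazy scheme can be modelled in a few lines: versions are advertised up front, but the crate bytes and their checksum only exist after the first download, and are then reused. Everything here is hypothetical: `LazyRegistry`, `generate_crate`, and `stand_in_digest` are illustrative names, and the digest uses std's `DefaultHasher` purely to stay dependency-free; a real registry would publish a SHA-256 hex digest.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical lazily generating registry.
struct LazyRegistry {
    advertised: Vec<String>,
    /// version -> (crate bytes, checksum), filled on first download.
    generated: HashMap<String, (Vec<u8>, String)>,
}

impl LazyRegistry {
    fn new(advertised: Vec<String>) -> Self {
        Self { advertised, generated: HashMap::new() }
    }

    /// The `cksum` the index would serve for `version`: empty until generated.
    fn index_cksum(&self, version: &str) -> String {
        self.generated
            .get(version)
            .map(|(_, cksum)| cksum.clone())
            .unwrap_or_default()
    }

    /// Generates the crate on first request, then serves the stored copy.
    fn download(&mut self, version: &str) -> Option<&[u8]> {
        if !self.advertised.iter().any(|v| v == version) {
            return None;
        }
        let entry = self.generated.entry(version.to_string()).or_insert_with(|| {
            let bytes = generate_crate(version);
            let cksum = stand_in_digest(&bytes);
            (bytes, cksum)
        });
        Some(entry.0.as_slice())
    }
}

/// Placeholder for the expensive code-generation step.
fn generate_crate(version: &str) -> Vec<u8> {
    format!("generated crate {version}").into_bytes()
}

/// Stand-in digest (NOT SHA-256), used only to keep the sketch std-only.
fn stand_in_digest(bytes: &[u8]) -> String {
    let mut h = DefaultHasher::new();
    bytes.hash(&mut h);
    format!("{:016x}", h.finish())
}
```

Before the first download, `index_cksum` returns an empty string, which is exactly the state Cargo currently refuses to accept.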

Here's a survey of how this works across the ecosystems we support:

Overall, we're looking to find a way to relax the requirement on checksums being populated in the index; we'd prefer to not need to generate crates that no package registry client has actually requested, just so we can populate their checksum in our index.

Let me know if I can provide any more detail on the above, and happy to provide implementation support for this (although my Rust skills are ... rusty 😄).

stefanvanburen commented 6 months ago

Hey @weihanglo, is there any other detail I can provide, or anything I can help with? I wanted to make sure this didn't slip off your radar 😃. I appreciate you taking a look.

Eh2406 commented 6 months ago

The actual PR diff is likely to be small: simply changing a == requirement to is_none() || ==. Most of the change will be the additional tests. As with most security-related requests, the hard part is going to be determining and documenting all of the implications. (Not having consulted the rest of the team) this probably requires an RFC, mostly to ensure that all the details and their implications are documented and reviewed by the correct people.
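A minimal model of the one-line change described above, assuming the index's checksum field were made optional (`None` when the registry has not yet published one). `strict_ok` and `relaxed_ok` are hypothetical helpers for illustration, not Cargo's actual code.

```rust
/// Roughly today's rule: the index checksum must equal the computed digest.
fn strict_ok(expected: &str, actual: &str) -> bool {
    expected == actual
}

/// The relaxed rule: accept when no checksum is published,
/// otherwise still require an exact match.
fn relaxed_ok(expected: Option<&str>, actual: &str) -> bool {
    expected.is_none() || expected == Some(actual)
}
```

A published-but-wrong checksum is still rejected under the relaxed rule; only the "not published yet" case changes, which is why the open questions are about who may produce `None` and when.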

Some important questions off the top of my head:

What are the implications of each of those decisions for the security of Rust users? Keep in mind both that a trusted registry might have been compromised or be subject to a man-in-the-middle (MITM) attack, and, on the other hand, that a registry is fundamentally an RCE-as-a-service (remote code execution). I'm not seeing anything that would be a dealbreaker, just a lot of things that need to be figured out and documented so that we are not surprised by them later on.

stefanvanburen commented 5 months ago

Thanks for getting back to me, and sorry for the delay!

probably requires an RFC

I've not been through the RFC process before, so I'm assuming this would be via the https://github.com/rust-lang/rfcs repository?

A few quick responses to your immediate questions:

What happens if a registry adds or removes the configuration field?

I don't think there would be any special handling: if it's enabled, checksums aren't validated; if it's disabled, checksums are validated. This may force a re-fetch of the index to pick up checksums that were previously missing.

What happens if source replacement is involved?

I'm not terribly familiar with source replacement, but I would think that if a [source] had a replace-with to a registry, it would inherit the configuration from that registry.
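One way to read "inherit the configuration from that registry" is to follow the `replace-with` links to the end of the chain and use the final registry's setting. This is a sketch under that assumption only: `effective_verify` is a hypothetical helper, the maps stand in for parsed `[source]` and `[registries]` config, and a missing setting falls back to today's behaviour (verify).

```rust
use std::collections::HashMap;

/// Follows `replace-with` links starting at `source` and returns the
/// hypothetical verify-checksums setting of the registry the chain ends at.
fn effective_verify(
    source: &str,
    replace_with: &HashMap<&str, &str>,
    verify_setting: &HashMap<&str, bool>,
) -> Result<bool, String> {
    let mut current = source;
    let mut seen = vec![current];
    while let Some(&next) = replace_with.get(current) {
        // Guard against misconfigured cycles such as a -> b -> a.
        if seen.contains(&next) {
            return Err(format!("cycle in replace-with at {next:?}"));
        }
        seen.push(next);
        current = next;
    }
    // Registries without the key keep today's behaviour: verify.
    Ok(verify_setting.get(current).copied().unwrap_or(true))
}
```

Under this reading, replacing crates-io with a registry that disables verification would also disable it for crates-io dependencies, which is one of the implications an RFC would need to spell out.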

How does this interact with registry signing?

I view this as orthogonal to registry signing — I would think that this option would only apply to checksums, and not to the verification of a signed artifact.

What happens if registry removes the hash for an existing crate?

Assuming the configuration is enabled, nothing: the crate just isn't verified locally anymore. This may be related to your next question:

What happens if a registry changes the contents of a crate without having a hash?

Assuming the user has a populated Cargo.lock with the hash of the downloaded crate, cargo could warn in this case, letting the user know that the checksum changed. (I'd have to confirm, but I believe this is how other registry clients behave in this instance, when a lockfile disagrees with the hash of a download.)
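The lockfile fallback can be sketched as a three-way outcome, assuming the index offers no checksum and the only reference is the `checksum` recorded for the package in Cargo.lock. `LockCheck` and `check_against_lock` are hypothetical names; whether `Changed` should be a warning or a hard error is exactly the policy question left open above.

```rust
/// Hypothetical outcome of comparing a download against Cargo.lock
/// when the registry index has no checksum for the crate.
#[derive(Debug, PartialEq, Eq)]
enum LockCheck {
    /// The lockfile and the download agree.
    Match,
    /// Nothing locked yet: record this digest in Cargo.lock.
    Recorded(String),
    /// The registry served different contents than were previously locked.
    Changed { locked: String, actual: String },
}

fn check_against_lock(locked: Option<&str>, actual: &str) -> LockCheck {
    match locked {
        None => LockCheck::Recorded(actual.to_string()),
        Some(l) if l == actual => LockCheck::Match,
        Some(l) => LockCheck::Changed {
            locked: l.to_string(),
            actual: actual.to_string(),
        },
    }
}
```

This gives a trust-on-first-use guarantee similar to the Go behaviour mentioned earlier: the first download is unverified, but later changes to the crate's contents are detected.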

I'm not a security expert, so take these with a grain of salt, just wanted to give my initial thoughts.