Allow registry configuration for disabling checksum validation

stefanvanburen commented 6 months ago

Problem

I'm working on a crate registry where I'm able to advertise a set of crate versions that may be generated, but the crates themselves are only lazily generated when they're requested — so I don't have a checksum to publish initially. After the generation occurs, I can publish the checksum, and it is not expected to change.

Currently, cargo will not function if the checksum is not populated — it'll attempt to compare the empty string to the checksum of the downloaded crate, which will fail and stop the installation of the crate. I'd like to not need to pre-generate all possible versions just to populate the version checksums in the index.

Proposed Solution

I'd like the ability to disable checksum validation altogether at the registry level. I'd imagine this looking something like:

# in .cargo/config.toml
[registries]
my-registry = { verify-checksums = "no" }

Where the default value for verify-checksums is "yes".

Alternative solution

An alternative would be to validate a downloaded crate's checksum only when a value is set in the registry (somewhat similar to how the Go Module Proxy works, where the initial module download populates the checksum value). I'd imagine this looking something like:

# in .cargo/config.toml
[registries]
my-registry = { verify-checksums = "if-non-empty" }
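To make the difference between the three values concrete, here is a minimal Rust sketch of how they might be interpreted. The enum, its method names, and the use of plain strings for checksums are assumptions for illustration only; none of this exists in cargo today.

```rust
/// Hypothetical per-registry setting mirroring the proposed config values.
/// This type is an illustration and is not part of cargo.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum VerifyChecksums {
    Yes,
    No,
    IfNonEmpty,
}

impl VerifyChecksums {
    /// Parse the proposed config string; unknown values are rejected.
    fn parse(s: &str) -> Option<Self> {
        match s {
            "yes" => Some(Self::Yes),
            "no" => Some(Self::No),
            "if-non-empty" => Some(Self::IfNonEmpty),
            _ => None,
        }
    }

    /// Returns true when the download should be accepted.
    /// `index_cksum` is the value advertised in the registry index (possibly
    /// empty); `actual_cksum` is the hash computed from the downloaded file.
    fn accepts(self, index_cksum: &str, actual_cksum: &str) -> bool {
        match self {
            // Strict mode: an empty index checksum never matches.
            Self::Yes => index_cksum == actual_cksum,
            // Validation disabled entirely.
            Self::No => true,
            // Only compare once the registry has published a value.
            Self::IfNonEmpty => index_cksum.is_empty() || index_cksum == actual_cksum,
        }
    }
}
```

Under this sketch, "if-non-empty" behaves like "no" until the registry publishes a checksum, and like "yes" afterwards.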

Notes

I'm aware that this can introduce security risks (you should only enable this if you trust the registry that you're using), but I've been able to implement a similar lazy generation scheme across a variety of package ecosystems — cargo is the only one (so far) that has strictly required package version checksums in this way.

The alternative proposal is more secure in the sense that the checksum can at least be compared to a stable value once it's known, but still requires trust in the registry.

I wasn't able to find a prior request quite like this, but somewhat related is https://github.com/rust-lang/cargo/issues/10071. It also may allow for solutions to issues like https://github.com/rust-lang/cargo/issues/10939, where a local registry is required to serve a non-existent checksum.

weihanglo commented 6 months ago

I'm working on a crate registry where I'm able to advertise a set of crate versions that may be generated, but the crates themselves are only lazily generated when they're requested.

Not really related to the issue itself, but I'd like to know more about use cases and rationale behind this lazy generated package registry, if you'd like to share :)

weihanglo commented 6 months ago

I've been able to implement a similar lazy generation scheme across a variety of package ecosystems — cargo is the only one (so far) that has strictly required package version checksums in this way.

Could you give some references about how other ecosystems handle this?

stefanvanburen commented 6 months ago

Hey @weihanglo, happy to give more context. I'm working on Generated SDKs at Buf, where we allow users to easily depend on the output of their protobuf files when generated with various language plugins. (You can see that we already support Go, NPM, Maven, Swift and Python.)

The way this works is that we construct a synthetic version that combines information about the version of the files with the version of the plugin, so we have the cross product of (module versions) * (plugin versions) available for download. As you can imagine, as the number of versions of the protobuf files and the number of plugin versions increases, the number of available versions becomes quite large 😅.

The way that we typically deal with this is to only lazily generate the package when it's requested, and then store it for future downloads. This means that we don't have the checksum on hand before the package itself is requested the first time.
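As a rough illustration of that flow (not our actual implementation), here is a small Rust sketch of a store that generates a package on first download and only then has a checksum to advertise. The `LazyRegistry` type, its methods, and the pluggable `generate` and `hash` functions are all hypothetical names chosen for this example.

```rust
use std::collections::HashMap;

/// Hypothetical lazily populated package store.
/// `generate` builds the package bytes for a version on demand;
/// `hash` stands in for whatever checksum function the registry uses.
struct LazyRegistry<G, H> {
    generate: G,
    hash: H,
    // version -> (package bytes, checksum), filled in on first download
    store: HashMap<String, (Vec<u8>, String)>,
}

impl<G, H> LazyRegistry<G, H>
where
    G: Fn(&str) -> Vec<u8>,
    H: Fn(&[u8]) -> String,
{
    fn new(generate: G, hash: H) -> Self {
        Self { generate, hash, store: HashMap::new() }
    }

    /// The checksum the index could advertise: `None` until the version
    /// has been generated at least once.
    fn checksum(&self, version: &str) -> Option<&str> {
        self.store.get(version).map(|(_, cksum)| cksum.as_str())
    }

    /// Return the package bytes, generating and storing them on first request.
    fn download(&mut self, version: &str) -> &[u8] {
        if !self.store.contains_key(version) {
            let bytes = (self.generate)(version);
            let cksum = (self.hash)(&bytes);
            self.store.insert(version.to_string(), (bytes, cksum));
        }
        &self.store[version].0
    }
}
```

The point of the sketch is the ordering: `checksum` has nothing to return until `download` has run once for that version, which is exactly the window in which the index entry would be empty.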

Here's a survey of how this works across the ecosystems we support:

Go has the module proxy, which has the centralized sum.golang.org checksum database that will store the checksum of the package on first request (no need to include it in an index)
NPM allows for package versions to specify a shasum field, but it is not required to be set. (It's on our roadmap to include the hashes for the files that we have generated.) Notably, we ran into an issue with a particular set of versions of yarn that required the shasum field to be set, but yarn reversed course on that decision in future versions.
Maven repositories will store the hashes of the artifacts "next to" the artifacts themselves (see https://repo1.maven.org/maven2/build/buf/connect-kotlin/0.1.9/); downloading those files will retrieve the checksum (but checksums aren't displayed in an index)
Swift ... I have the least experience with, but importantly does not require that the index of package versions include a checksum
Python specifies that hrefs to files should include a hash, but does not require their presence. (It's on our roadmap to include the hashes for the files that we have generated.)

Overall, we're looking to find a way to relax the requirement on checksums being populated in the index; we'd prefer to not need to generate crates that no package registry client has actually requested, just so we can populate their checksum in our index.

Let me know if I can provide any more detail on the above, and happy to provide implementation support for this (although my Rust skills are ... rusty 😄).

stefanvanburen commented 6 months ago

Hey @weihanglo, any other details I can provide or anything I can help with? Wanted to make sure this didn't slip off your radar 😃, appreciate you taking a look.

Eh2406 commented 6 months ago

The actual PR diff is likely to be small: simply changing an == requirement to an is_none() || ==. Most of the change will be the additional tests. As with most security-related requests, the hard part is going to be determining and documenting all of the implications. (Not having consulted the rest of the team) this probably requires an RFC, mostly to ensure that all the details and their implications are documented and reviewed by the correct people.
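A minimal sketch of the shape of that change, assuming the index checksum is modeled as an `Option<&str>`. This is not cargo's actual code, and both function names are hypothetical:

```rust
/// Strict check (sketch of today's behavior): the index checksum must be
/// present and equal to the checksum of the downloaded crate.
fn strict_check(index_cksum: Option<&str>, actual: &str) -> bool {
    index_cksum == Some(actual)
}

/// Relaxed check: a missing index checksum is accepted, but a present one
/// must still match the downloaded crate.
fn relaxed_check(index_cksum: Option<&str>, actual: &str) -> bool {
    index_cksum.is_none() || index_cksum == Some(actual)
}
```

The two functions differ only when the index has no checksum; a mismatch against a published checksum is rejected by both.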

Some important questions off the top of my head:

What happens if a registry adds or removes the configuration field?
What happens if source replacement is involved? (Possibly at the same time as we observe the field changing.)
How does this interact with registry signing? (A long desired, but not yet designed, feature.)
What happens if a registry removes the hash for an existing crate?
What happens if a registry changes the contents of a crate without having a hash?

What are the implications of each of those decisions on the security of Rust users? Keep in mind both that a trusted registry might have been compromised or be subject to a MITM attack, and, on the other hand, that a registry is fundamentally an RCE-as-a-service. I'm not seeing anything that would be a dealbreaker, just a lot of things that need to be figured out and documented so that we are not surprised by them later on.

stefanvanburen commented 5 months ago

Thanks for getting back to me, and sorry for the delay!

probably requires an RFC

I've not been through the RFC process before, so I'm assuming this would be via the https://github.com/rust-lang/rfcs repository?

A few quick responses to your immediate questions:

What happens if a registry adds or removes the configuration field?

I don't think there would be any special handling: if it's enabled, checksums aren't validated. If it's disabled, checksums are validated. This may force a re-fetch of the index to grab nonexistent checksums.

What happens if source replacement is involved?

I'm not terribly familiar with source replacement, but I would think that if a [source] had a replace-with to a registry, it would inherit the configuration from that registry.

How does this interact with registry signing?

I view this as orthogonal to registry signing — I would think that this option would only apply to checksums, and not to the verification of a signed artifact.

What happens if a registry removes the hash for an existing crate?

Assuming the configuration is enabled, nothing — it's just no longer verified locally. Maybe related to your next question?:

What happens if a registry changes the contents of a crate without having a hash?

Assuming the user has a populated Cargo.lock with the hash of the downloaded crate, cargo could warn in this case, letting the user know that the checksum changed. (I'd have to confirm, but I believe this is how other registry clients behave in this instance, when a lockfile disagrees with the hash of a download.)
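To sketch what I mean, here is a small Rust example of comparing the checksum recorded in a lockfile against the checksum of a fresh download. The `LockCheck` enum and `check_against_lock` function are hypothetical names for illustration, not existing cargo APIs, and whether the `Changed` case should be a warning or a hard error is left open.

```rust
/// Hypothetical outcome of comparing a download against the lockfile.
#[derive(Debug, PartialEq, Eq)]
enum LockCheck {
    /// The lockfile has no checksum recorded for this package.
    NotLocked,
    /// The download matches the checksum recorded in the lockfile.
    Matches,
    /// The download differs from the lockfile: the caller could warn or fail.
    Changed { locked: String, actual: String },
}

/// Compare the lockfile checksum (if any) with the checksum computed from
/// the downloaded crate, independent of what the registry index says.
fn check_against_lock(locked: Option<&str>, actual: &str) -> LockCheck {
    match locked {
        None => LockCheck::NotLocked,
        Some(l) if l == actual => LockCheck::Matches,
        Some(l) => LockCheck::Changed {
            locked: l.to_string(),
            actual: actual.to_string(),
        },
    }
}
```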

I'm not a security expert, so take these with a grain of salt, just wanted to give my initial thoughts.
