Better behavior for not-yet-published audit-as-crates-io crates

bholley commented 1 year ago

In the last release, we added better heuristics for detecting potential matches between first-party and published crates. Such potential matches require audit-as-crates-io entries. In wasmtime, there were ~50 new entries.

These entries are new because wasmtime uses a branch-for-release-then-bump model, and so the local crates are always at least one version ahead of what's published on crates.io. This is a situation that we've struggled to handle well in cargo-vet. For git dependencies, we can require a precise audit chain all the way to the exact commit being imported. But for path dependencies, we currently just require audit-as-crates-io to be set to false if the local version is ahead of the published version.

The wasmtime case seems like motivation to do better here. In principle, I think the right way to support audit-as-crates-io in this situation is to require an audit for the most-recently-published version (that's also below the current version).

We could simply institute this behavior automatically (setting audit-as-crates-io=true causes the cargo-vet algorithm to just replace the explicit version of such a first-party crate with the most-recently-published version), though this would have the unfortunate effect of allowing a previously-passing cargo vet to start failing when a new version is published upstream.
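To make the proposed substitution and its downside concrete, here is a minimal sketch. `Version` and `audit_target` are hypothetical names, not cargo-vet's API. The sketch also approximates "most-recently-published" as "highest published version below the local one", which can differ when maintenance releases are published out of order.

```rust
/// Hypothetical minimal semver-like version (pre-release/build metadata ignored).
/// The derived `Ord` compares major, then minor, then patch.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Version {
    pub major: u64,
    pub minor: u64,
    pub patch: u64,
}

impl Version {
    pub const fn new(major: u64, minor: u64, patch: u64) -> Self {
        Version { major, minor, patch }
    }
}

/// Hypothetical helper: which published version must be audited for a
/// first-party crate at `local` marked `audit-as-crates-io = true`.
/// - If `local` itself is on crates.io, audit it directly.
/// - Otherwise, use the highest published version below `local`.
/// Returns `None` if nothing at or below `local` has been published.
pub fn audit_target(local: Version, published: &[Version]) -> Option<Version> {
    if published.contains(&local) {
        return Some(local);
    }
    published.iter().copied().filter(|v| *v < local).max()
}
```

With a local 1.0.1 and crates.io at 1.0.0, the target is 1.0.0. Once upstream publishes 1.0.1, the target silently becomes 1.0.1, so an existing audit of 1.0.0 no longer suffices. That is the previously-passing-now-failing case described above.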

I think we probably want to warn rather than error in that case, which means we need breadcrumbs recording the previously-passing state. I'm not sure whether it's better to put those in the policy entry, in imports.lock, or whether there's some more elegant solution.

@mystor WDYT?

mystor commented 1 year ago

In general, imports.lock is the file where we store information like this that is cached from previous runs, such as which imported audits and publisher information were required.

We could theoretically introduce a new table into that file, like unpublished, which documents that a particular version was unpublished, and that a different version was audited instead, like this:

[[unpublished.cranelift]]
version = "1.0.1"
audited-as = "1.0.0"
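One way to model the proposed table in memory is sketched below, with hand-rendered output matching the snippet above. `UnpublishedEntry`, `UnpublishedTable` and `render_unpublished` are hypothetical names. A real implementation would presumably go through a TOML serializer rather than `writeln!`.

```rust
use std::collections::BTreeMap;
use std::fmt::Write;

/// Hypothetical in-memory form of one `[[unpublished.<crate>]]` entry.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct UnpublishedEntry {
    /// Local, not-yet-published version of the first-party crate.
    pub version: String,
    /// Published version that was audited in its place.
    pub audited_as: String,
}

/// Crate name -> entries. A BTreeMap keeps the rendered output deterministic.
pub type UnpublishedTable = BTreeMap<String, Vec<UnpublishedEntry>>;

/// Renders the table as TOML arrays of tables, one blank line between entries.
pub fn render_unpublished(table: &UnpublishedTable) -> String {
    let mut out = String::new();
    for (krate, entries) in table {
        for entry in entries {
            if !out.is_empty() {
                out.push('\n');
            }
            // Writing to a String cannot fail, so unwrap is safe here.
            writeln!(out, "[[unpublished.{krate}]]").unwrap();
            writeln!(out, "version = \"{}\"", entry.version).unwrap();
            writeln!(out, "audited-as = \"{}\"", entry.audited_as).unwrap();
        }
    }
    out
}
```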

These would need to be tracked by our imports backend in the same way as other imported audits. As far as the resolver is concerned, each entry would effectively act as an implicit wildcard delta audit from 1.0.0 to 1.0.1 for the given crate. When running unlocked, we would create such an entry (if one doesn't already exist) for any crate that is unpublished but marked as audit-as-crates-io, arbitrarily selecting the nearest published version below the local version (or above it, if no lower version is found?).
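The creation rule and the "implicit delta audit" semantics can be sketched as follows. Everything here is hypothetical: `Unpublished`, `create_entry` and `is_vetted` are illustrative names. The `vetted` set is a simplification standing in for "versions the resolver already accepts". The fall-back-to-a-higher-version branch implements the open question from the proposal, not a settled decision.

```rust
use std::collections::BTreeSet;

/// Hypothetical (major, minor, patch) version; derived Ord is lexicographic.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Version(pub u64, pub u64, pub u64);

/// Hypothetical in-memory form of an `unpublished` entry.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Unpublished {
    pub version: Version,
    pub audited_as: Version,
}

/// Unlocked run: create an entry for an unpublished audit-as-crates-io crate.
pub fn create_entry(
    local: Version,
    published: &BTreeSet<Version>,
    existing: Option<Unpublished>,
) -> Option<Unpublished> {
    // Never replace an existing entry; it records previously-passing state.
    if existing.is_some() {
        return existing;
    }
    // Crates whose local version is already on crates.io need no entry.
    if published.contains(&local) {
        return None;
    }
    // Nearest published version below `local`, else the nearest one above.
    let audited_as = published
        .range(..local)
        .next_back()
        .or_else(|| published.range(local..).next())
        .copied()?;
    Some(Unpublished { version: local, audited_as })
}

/// Resolver view: an entry acts like a wildcard delta `audited_as -> version`,
/// so the local version passes whenever the substituted version does.
pub fn is_vetted(local: Version, entry: Option<&Unpublished>, vetted: &BTreeSet<Version>) -> bool {
    if vetted.contains(&local) {
        return true;
    }
    match entry {
        Some(e) if e.version == local => vetted.contains(&e.audited_as),
        _ => false,
    }
}
```

Using a `BTreeSet` lets `range(..local).next_back()` find the nearest lower version directly instead of scanning and taking a max.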

After the crate is published, cargo vet --locked commands would continue to succeed because of the existing unpublished entry, and that entry would be pruned once a real audit covering the new version is introduced. We could then make commands like cargo vet log a warning after running if a crate was previously audited as a different version but has since been published, or similar.
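The per-entry decision described above (keep passing, prune once really audited, warn once published) might look roughly like this. `EntryAction`, `review_entry` and the warning wording are hypothetical, chosen only for illustration, and versions are plain strings for brevity.

```rust
/// Hypothetical in-memory `unpublished` entry (versions as plain strings).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Unpublished {
    pub version: String,
    pub audited_as: String,
}

/// What to do with an existing entry on a given run.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum EntryAction {
    /// A real audit now covers `version`; the entry is no longer needed.
    Prune,
    /// Keep the entry (so the run still passes), optionally with a warning.
    Keep { warning: Option<String> },
}

/// `audited`: versions covered by real audits; `published`: versions on crates.io.
pub fn review_entry(krate: &str, entry: &Unpublished, audited: &[&str], published: &[&str]) -> EntryAction {
    let v = entry.version.as_str();
    if audited.contains(&v) {
        return EntryAction::Prune;
    }
    // Still passing via the entry, but nudge the user once the version is published.
    let warning = published.contains(&v).then(|| {
        format!(
            "{krate}:{v} was audited as {} but has since been published; consider auditing it directly",
            entry.audited_as
        )
    });
    EntryAction::Keep { warning }
}
```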

I don't love the extra complexity we're layering on top of the existing complexity of the import codepath, but I think it's probably the easiest way to make a feature like this work, and we've already put in most of the work for other types of "imports".

Does that sound like it would work as an approach and satisfy the wasmtime use-case?

bholley commented 1 year ago

@mystor Thanks, this design sounds right to me. Please make it happen.

mozilla/cargo-vet #495