**Open** · gburgessiv opened this issue 1 year ago
> What's left unsaid so far is that `semver@1.0.13` only has an existing audit for `safe-to-run`, so I will have to audit the entirety of `semver` for `does-not-implement-crypto`.
I'm not sure I understand what you mean by this. A delta audit is only ever auditing the specific delta, and not the entire history of the crate. If `semver@1.0.13` has no exemption or audit for `crypto-safe`, then a delta audit wouldn't cause the crate to start certifying for `crypto-safe`; you'd need to also create a full audit for `semver@1.0.13` in addition, which would be separate.
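To illustrate with a sketch of `audits.toml` entries (the `who` value is a placeholder, and `crypto-safe` is the custom criterion from this thread):

```toml
# Full audit: needed for semver@1.0.13 itself to certify for crypto-safe.
[[audits.semver]]
who = "Jane Auditor <jane@example.com>"
criteria = "crypto-safe"
version = "1.0.13"

# Delta audit: only ever vouches for the 1.0.13 -> 1.0.16 diff,
# and only helps 1.0.16 certify if 1.0.13 already satisfies the criteria.
[[audits.semver]]
who = "Jane Auditor <jane@example.com>"
criteria = "crypto-safe"
delta = "1.0.13 -> 1.0.16"
```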
I'm pretty sure we also won't suggest adding a delta audit for `crypto-safe` unless `semver@1.0.13` at least supported that criteria (e.g. through an exemption).

I'm guessing what you mean by "only has an existing audit for `safe-to-run`" is that the crate has an existing exemption for 1.0.13 to be `crypto-safe`. This suggestion is continuing to rely on the exemption, meaning that `semver@1.0.13` would be partially audited by the combination of a delta audit and an exemption.
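For concreteness, the exemption being relied on here would be an entry in `config.toml` roughly like this (a sketch, again using this thread's custom criterion):

```toml
# config.toml: keep trusting semver@1.0.13 for crypto-safe
# without a full audit having been performed.
[[exemptions.semver]]
version = "1.0.13"
criteria = "crypto-safe"
```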
> Further - and this is probably more specific to `does-not-implement-crypto` - even if `semver` had been properly audited for `does-not-implement-crypto` before (or `crypto-safe`, which `does-not-implement-crypto` implies), I need to know which of those had been audited for so I can choose the correct one to "carry forward" in this diff audit.
This is quite `does-not-implement-crypto`-specific, yes. We don't suggest that the audit always be for the full set of all possible criteria, as someone only using a `safe-to-run` requirement on a crate wouldn't want to be asked to certify for `safe-to-deploy` just because a previous version was audited as `safe-to-deploy` by someone (or, worse, asked to certify for e.g. `google::does-not-implement-crypto` when they don't use that criteria).

We could potentially look into doing a full search of the criteria certified for the "from" node, in addition to determining which criteria we want to support, so that we can include more context about what criteria the base revision was certified for. Perhaps this could be behind another flag to avoid adding noise in situations where it's not desirable. Perhaps it'd look something like this (not sure how best to represent it?):
```
Satisfied criteria for 1.0.13:
  fully audited: ["safe-to-run"]
  partially audited: []
  exempted: ["does-not-implement-crypto"]
```
> This suggestion is continuing to rely on the exemption, meaning that `semver@1.0.13` would be partially audited by the combination of a delta audit and an exemption
Ah, yes, this is exactly correct. I guess I've been thinking about exemptions as a hammer for TODOs rather than a more general ... well, exemption. My mistake :)
> We don't suggest that the audit always be for the full set of all possible criteria as someone only using [...]
Totally agree - I wasn't trying to suggest that audits are run for every possible criteria; more that knowing the existing state of audits for prior versions is helpful when trying to audit a diff.
> Perhaps it'd look something like this (not sure how best to represent it?):
I think this looks good! To your point earlier, maybe there's a case to be made for trimming information that `cargo-vet` doesn't deem relevant to the audit that's likely to be performed? In other words, going back to my original example, `cargo-vet` assumes that I'm going to audit for `safe-to-run` and `crypto-safe`. It could be helpful to ignore criteria which are unrelated to `safe-to-run` and `crypto-safe` when printing out the `Satisfied criteria` block.
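For example, with `safe-to-run` and `crypto-safe` as the assumed audit criteria, a trimmed block might look like this (a purely hypothetical rendering):

```
Satisfied criteria for 1.0.13 (trimmed to safe-to-run, crypto-safe):
  fully audited: ["safe-to-run"]
  exempted: ["does-not-implement-crypto"]
```

Note that `does-not-implement-crypto` would survive the trim because it implies `crypto-safe`.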
I think there's a pretty important UX issue here -- a tool that feels like it has good UX, but actually requires manually reading many of the same files from which it pulls the data shown in the UI, easily leads to assumptions and human mistakes. As an example, I was doing a delta review and looked into audit.toml to find the previous baseline for a crate. The deltas weren't in order and I looked at an older one, leading me to suggest `ub-risk-1` instead of `ub-risk-0`. If the tool had been clear about the baseline, that would not have happened. This happened to be a "safe" error.
Right now, `cargo vet check` first displays a nice menu, then a text file with lots of useful comments. In https://chromium-review.googlesource.com/c/chromium/src/+/5553921/comment/d70a587a_0555c1c1/ I suggested an alternative formulation of the menu, specifically for a delta:
```
choose criteria to certify for somecrate:0.1.0 -> 0.2.0
  1.   safe-to-run
  2. * safe-to-deploy
  3.   crypto-safe
  4. * does-not-implement-crypto
  5. * ub-risk-0
  6.   ub-risk-1
  7.   ub-risk-2
  8.   ub-risk-3
  9.   ub-risk-4

Delta Baseline: [safe-to-deploy, does-not-implement-crypto, ub-risk-0]
Required:
  ✓ safe-to-deploy
  ✓ crypto-safe (implied by does-not-implement-crypto)
  ✓ ub-risk-2 (implied by ub-risk-0)
```
Key behaviors here:

- defaults to the baseline
- rejects audits that do not meet requirements
- rejects delta audits that increase specificity (if that's the right term -- for example selecting `ub-risk-0` with a baseline of `ub-risk-2`)

I like the suggestion in https://github.com/mozilla/cargo-vet/issues/417#issuecomment-2130053997 and I tried to explore how to implement just the "defaults to the baseline" part for `cargo vet certify crate_foo <vOld> <vNew>`. (I agree with the previous comment that defaulting to the policy requirements is a bug, because https://mozilla.github.io/cargo-vet/audit-entries.html#delta says that a delta audit "certifies that the delta [...] preserves the relevant criteria" and notes that "it's not always possible to conclude that a diff preserves certain properties without also inspecting some portion of the base version".)
AFAICT `do_cmd_certify` calls `guess_audit_criteria`, which calls `compute_suggested_criteria`. I think that we could consider changing how `Conclusion::FailForVet` is handled (updating the test expectations as needed - this seems to already have nice test coverage via `mock_simple_suggested_criteria` in `src/tests/certify.rs`).
The main challenge here (at least for me) would be figuring out what exactly we mean by "baseline". This is complicated by the `implies` relationship and potentially multiple delta audits. For example, consider:

- `strongly-reviewed` implies `reviewed`, and `reviewed` implies `weak-reviewed` (this is from the mock tests)
- `third-party3` that has the following audits:
  - `v2`: `strong-reviewed`
  - `v2` => `v3`: `strong-reviewed`
  - `v3` => `v4`: `reviewed`
- for the missing `v4` => `v5` audit, both `reviewed` and `weak-reviewed` are plausible - we need to decide which of these we should recommend as the audit; the check itself fails for `weak-reviewed` (note that the next reversed-implies edge goes to `reviewed`)
So, when `cargo vet certify third-party3 v4 v5` realizes that the vet check fails for `v5`, then what should it consider as the baseline? It seems to me that `reviewed` is the right answer, right? But how would we find that algorithmically? I guess one answer would be that we find `reviewed` by doing a breadth-first search from the failed criteria (`weak-reviewed`), walking the reversed-implies graph edges (e.g. `weak-reviewed` => `reviewed`), and stopping when we find a criterion present in the baseline (`reviewed` in this case). OTOH, this may get complicated if there are multiple distance-N criteria that are in the baseline... :-/
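To make that search concrete, here is a small self-contained Rust sketch of the breadth-first search just described - the function, the data shapes, and the choice of using the explicitly certified criteria as the `baseline` set are illustrative assumptions, not cargo-vet's actual internals:

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Hypothetical sketch: walk the reversed-implies graph outward from the
/// failed criterion, returning the first criteria found that are present
/// in the baseline. `implies` maps a criterion to the criteria it directly
/// implies (e.g. "reviewed" -> ["weak-reviewed"]); `baseline` is the set of
/// criteria explicitly certified for the "from" version of the delta.
fn baseline_criteria<'a>(
    failed: &'a str,
    implies: &HashMap<&'a str, Vec<&'a str>>,
    baseline: &HashSet<&'a str>,
) -> Vec<&'a str> {
    // Build the reversed-implies edges, e.g. "weak-reviewed" => "reviewed".
    let mut reversed: HashMap<&'a str, Vec<&'a str>> = HashMap::new();
    for (&stronger, weaker) in implies {
        for &w in weaker {
            reversed.entry(w).or_default().push(stronger);
        }
    }

    let mut queue = VecDeque::from([failed]);
    let mut seen = HashSet::from([failed]);
    let mut found = Vec::new();
    while let Some(criterion) = queue.pop_front() {
        if baseline.contains(criterion) {
            // Stop this path at the first criterion present in the baseline.
            found.push(criterion);
            continue;
        }
        for &stronger in reversed.get(criterion).into_iter().flatten() {
            if seen.insert(stronger) {
                queue.push_back(stronger);
            }
        }
    }
    found
}

fn main() {
    // From the mock tests: strong-reviewed implies reviewed implies weak-reviewed.
    let implies = HashMap::from([
        ("strong-reviewed", vec!["reviewed"]),
        ("reviewed", vec!["weak-reviewed"]),
    ]);
    // The v3 => v4 audit certified only `reviewed`.
    let baseline = HashSet::from(["reviewed"]);
    // The vet check for v5 fails for `weak-reviewed`; the search walks the
    // reversed edge weak-reviewed => reviewed and stops there.
    println!("{:?}", baseline_criteria("weak-reviewed", &implies, &baseline));
}
```

Running this prints `["reviewed"]`. The `found` vector also makes the distance-N caveat visible: if several criteria at the same depth are all in the baseline, they all come back, and we would still need a policy for picking among them.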
To my understanding, the suggestions here are not to change anything about full audits, but rather about delta audits.

The suggestion appears to be that, when doing a delta audit, instead of suggesting the minimal set of criteria which will "heal" the audit graph based on the policy, we suggest the maximal set of criteria which would lead to the "to" version of the delta audit being certified for more criteria. So for example, if there were 4 criteria (A-D), and the base audit was certified for (A, B), we'd suggest a delta audit for `(A, B)`, even if the certification requirements did not contain `B`.
To me the concept of "baseline" in this scenario seems fairly clear. As there is an explicit base revision being checked, we can look at what criteria are satisfied for that crate, and suggest those criteria. This is roughly what I did in #614.
Is this correct? I want to be confident I'm understanding the requests from folks actually using custom criteria here.
> So, when `cargo vet certify third-party3 v4 v5` realizes that the vet check fails for `v5`, then what should it consider as the baseline? It seems to me that `reviewed` is the right answer, right?
Yes, this is what would be found if we checked which criteria are satisfied for `v4`, as I mention above.
A small note from https://github.com/mozilla/cargo-vet/issues/417#issuecomment-2130053997 (emphasis mine) - not visible in the example:

> - rejects audits that do not meet requirements
> - rejects delta audits that increase specificity (if that's the right term -- for example selecting ub-risk-0 with a baseline of ub-risk-2)
It is fully OK to "over-specify" criteria for a delta audit (i.e. provide a delta audit which specifies more criteria than can actually be certified given the other audits which are present), though the final crate will still only certify for the base criteria. We can't be blocking/rejecting those, but cargo-vet will never suggest them.

For example, if you have criteria `A`, `B`, and there is an audit (`crate@1.0.0 : "A"`), and you're certifying `crate@1.0.0 -> 2.0.0 : "A" + "B"`, that is OK, but `crate@2.0.0` will not satisfy `"B"` until someone makes a new audit certifying `crate@1.0.0 : "B"`.
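As a sketch of what that over-specified pair of entries might look like in `audits.toml` (crate and reviewer names are placeholders):

```toml
# Full audit: 1.0.0 is certified only for criterion "A".
[[audits.example-crate]]
who = "Jane Auditor <jane@example.com>"
criteria = "A"
version = "1.0.0"

# Over-specified delta audit: certifies the diff for both "A" and "B".
# This is accepted, but example-crate@2.0.0 still only satisfies "A"
# until a full audit certifies 1.0.0 for "B" as well.
[[audits.example-crate]]
who = "Jane Auditor <jane@example.com>"
criteria = ["A", "B"]
delta = "1.0.0 -> 2.0.0"
```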
Taking the three sections of the previous comment in turn:

Yes, in my experience the most common delta reviews have no changes to the criteria. If `crate@1.1.0` was reviewed last week with criteria (A, B), then most likely `crate@1.2.0` this week is also (A, B). The current UX might instead default to (A, C), where B implies C. If I miss this as a reviewer, then `crate@1.2.0` is only certified for (A, C) when in fact the stronger (A, B) applies.
I am likely missing something, but I think the baseline for a delta audit should simply be the criteria of the "old" version, regardless of any implications. In @anforowicz's example, that would be `reviewed`, as expected. Before that, `cargo vet certify third-party3 v3 v4` would have had the baseline `strong-reviewed`, and it seems the reviewer was not able to meet that criteria and weakened it to `reviewed`.
This is news to me! Restating to check understanding: in @anforowicz's example, if I certified `v4` => `v5` as `strongly-reviewed`, that is allowed, but it has the same effect as certifying for `reviewed`, because that is the weakest criterion in the delta chain. What did the certifier mean when they typed `strongly-reviewed`?? If they spent the time to strongly review the entire crate, then that effort is not reflected in the result. If they strongly reviewed only the diff, is that a conditional claim that "if the old version was certified as strongly-reviewed, then the new version is as well"?

I suppose rejecting such reviews is problematic because, among other reasons, it could mean that merging `audit.toml`s leads to a rejection.
I think it would be useful for `cargo vet diff` to surface information about previous audits of a given crate.

For example, after running `cargo update` earlier, `cargo vet` said that I needed to review an update of `semver` from `1.0.13` to `1.0.16`. Running `cargo vet diff semver 1.0.13 1.0.16` gave me the following output:

What's left unsaid so far is that `semver@1.0.13` only has an existing audit for `safe-to-run`, so I will have to audit the entirety of `semver` for `does-not-implement-crypto`.

Further - and this is probably more specific to `does-not-implement-crypto` - even if `semver` had been properly audited for `does-not-implement-crypto` before (or `crypto-safe`, which `does-not-implement-crypto` implies), I need to know which of those had been audited for so I can choose the correct one to "carry forward" in this diff audit.

Including something like the following in `cargo vet diff` output would be helpful :)