**Open** · gburgessiv opened this issue 1 year ago
> What's left unsaid so far is that `semver@1.0.13` only has an existing audit for `safe-to-run`, so I will have to audit the entirety of `semver` for `does-not-implement-crypto`.
I'm not sure I understand what you mean by this. A delta audit is only ever auditing the specific delta, and not the entire history of the crate. If `semver@1.0.13` has no exemption or audit for `crypto-safe`, then a delta audit wouldn't cause the crate to start certifying for `crypto-safe`; you'd need to also create a full audit for `semver@1.0.13` in addition, which would be separate.
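To illustrate with a sketch of `audits.toml` entries (the `who` value is a placeholder, and `crypto-safe` is the custom criterion from this thread):

```toml
# Full audit: needed for semver@1.0.13 itself to certify for crypto-safe.
[[audits.semver]]
who = "Jane Auditor <jane@example.com>"
criteria = "crypto-safe"
version = "1.0.13"

# Delta audit: only ever vouches for the 1.0.13 -> 1.0.16 diff,
# and only helps 1.0.16 certify if 1.0.13 already satisfies the criteria.
[[audits.semver]]
who = "Jane Auditor <jane@example.com>"
criteria = "crypto-safe"
delta = "1.0.13 -> 1.0.16"
```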
I'm pretty sure we also won't suggest adding a delta audit for `crypto-safe` unless `semver@1.0.13` at least supported that criteria (e.g. through an exemption).

I'm guessing what you mean by "only has an existing audit for `safe-to-run`" is that the crate has an existing exemption for 1.0.13 to be `crypto-safe`. This suggestion is continuing to rely on the exemption, meaning that `semver@1.0.13` would be partially audited by the combination of a delta audit and an exemption.
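For concreteness, the exemption being relied on here would be an entry in `config.toml` roughly like this (a sketch, again using this thread's custom criterion):

```toml
# config.toml: keep trusting semver@1.0.13 for crypto-safe
# without a full audit having been performed.
[[exemptions.semver]]
version = "1.0.13"
criteria = "crypto-safe"
```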
> Further - and this is probably more specific to `does-not-implement-crypto` - even if `semver` had been properly audited for `does-not-implement-crypto` before (or `crypto-safe`, which `does-not-implement-crypto` implies), I need to know which of those had been audited for so I can choose the correct one to "carry forward" in this diff audit.
This is quite `does-not-implement-crypto`-specific, yes. We don't suggest that the audit always be for the full set of all possible criteria, as someone only using a `safe-to-run` requirement on a crate wouldn't want to be asked to certify for `safe-to-deploy` just because a previous version was audited as `safe-to-deploy` by someone (or, worse, asked to certify for e.g. `google::does-not-implement-crypto` when they don't use that criteria).

We could potentially look into doing a full search of the criteria certified for the "from" node, in addition to determining which criteria we want to support, so that we can include more context about what criteria the base revision was certified for. Perhaps this could be behind another flag to avoid adding noise in situations where it's not desirable. Perhaps it'd look something like this (not sure how best to represent it?):
```
Satisfied criteria for 1.0.13:
  fully audited: ["safe-to-run"]
  partially audited: []
  exempted: ["does-not-implement-crypto"]
```
> This suggestion is continuing to rely on the exemption, meaning that `semver@1.0.13` would be partially audited by the combination of a delta audit and an exemption
Ah, yes, this is exactly correct. I guess I've been thinking about exemptions as a hammer for TODOs rather than a more general ... well, exemption. My mistake :)
> We don't suggest that the audit always be for the full set of all possible criteria as someone only using [...]
Totally agree - I wasn't trying to suggest that audits are run for every possible criteria; more that knowing the existing state of audits for prior versions is helpful when trying to audit a diff.
> Perhaps it'd look something like this (not sure how best to represent it?):
I think this looks good! To your point earlier, maybe there's a case to be made for trimming information that `cargo-vet` doesn't deem relevant to the audit that's likely to be performed? In other words, going back to my original example, `cargo-vet` assumes that I'm going to audit for `safe-to-run` and `crypto-safe`. It could be helpful to ignore criteria which are unrelated to `safe-to-run` and `crypto-safe` when printing out the `Satisfied criteria` block.
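For example, with `safe-to-run` and `crypto-safe` as the assumed audit criteria, a trimmed block might look like this (a purely hypothetical rendering):

```
Satisfied criteria for 1.0.13 (trimmed to safe-to-run, crypto-safe):
  fully audited: ["safe-to-run"]
  exempted: ["does-not-implement-crypto"]
```

Note that `does-not-implement-crypto` would survive the trim because it implies `crypto-safe`.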
I think there's a pretty important UX issue here -- a tool that feels like it has good UX, but actually requires manually reading many of the same files from which it pulls the data shown in the UI, easily leads to assumptions and human mistakes. As an example, I was doing a delta review and looked into audit.toml to find the previous baseline for a crate. The deltas weren't in order and I looked at an older one, leading me to suggest `ub-risk-1` instead of `ub-risk-0`. If the tool had been clear about the baseline, that would not have happened. This happened to be a "safe" error.
Right now, `cargo vet check` first displays a nice menu, then a text file with lots of useful comments. In https://chromium-review.googlesource.com/c/chromium/src/+/5553921/comment/d70a587a_0555c1c1/ I suggested an alternative formulation of the menu, specifically for a delta:
```
choose criteria to certify for somecrate:0.1.0 -> 0.2.0
  1.   safe-to-run
  2. * safe-to-deploy
  3.   crypto-safe
  4. * does-not-implement-crypto
  5. * ub-risk-0
  6.   ub-risk-1
  7.   ub-risk-2
  8.   ub-risk-3
  9.   ub-risk-4

Delta Baseline: [safe-to-deploy, does-not-implement-crypto, ub-risk-0]
Required:
  ✓ safe-to-deploy
  ✓ crypto-safe (implied by does-not-implement-crypto)
  ✓ ub-risk-2 (implied by ub-risk-0)
```
Key behaviors here:

- defaults to the baseline
- rejects audits that do not meet requirements
- rejects delta audits that increase specificity (if that's the right term -- for example selecting `ub-risk-0` with a baseline of `ub-risk-2`)

I like the suggestion in https://github.com/mozilla/cargo-vet/issues/417#issuecomment-2130053997 and I tried to explore how to implement just the "defaults to the baseline" part for `cargo vet certify crate_foo <vOld> <vNew>`. (I agree with the previous comment that defaulting to the policy requirements is a bug, because https://mozilla.github.io/cargo-vet/audit-entries.html#delta says that a delta audit "certifies that the delta [...] preserves the relevant criteria" and notes that "it's not always possible to conclude that a diff preserves certain properties without also inspecting some portion of the base version".)
AFAICT `do_cmd_certify` calls `guess_audit_criteria`, which calls `compute_suggested_criteria`. I think that we could consider changing how `Conclusion::FailForVet` is handled (updating the test expectations as needed - this seems to already have nice test coverage via `mock_simple_suggested_criteria` in `src/tests/certify.rs`).
The main challenge here (at least for me) would be figuring out what exactly we mean by "baseline". This is complicated by the `implies` relationship and potentially multiple delta audits. For example, consider:

- `strongly-reviewed` implies `reviewed`, and `reviewed` implies `weak-reviewed` (this is from the mock tests)
- `third-party3` that has the following audits:
  - `v2`: `strong-reviewed`
  - `v2` => `v3`: `strong-reviewed`
  - `v3` => `v4`: `reviewed`
- for the missing `v4` => `v5` audit, both `reviewed` and `weak-reviewed` are plausible - we need to decide which of these we should recommend as the audit; the check itself fails for `weak-reviewed` (note that the next reversed-implies edge goes to `reviewed`)
So, when `cargo vet certify third-party3 v4 v5` realizes that the vet check fails for `v5`, then what should it consider as the baseline? It seems to me that `reviewed` is the right answer, right? But how would we find that algorithmically? I guess one answer would be that we find `reviewed` by doing a breadth-first search from the failed criteria (`weak-reviewed`), walking the reversed-implies graph edges (e.g. `weak-reviewed` => `reviewed`), and stopping when we find a criterion present in the baseline (`reviewed` in this case). OTOH, this may get complicated if there are multiple distance-N criteria that are in the baseline... :-/
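To make that search concrete, here is a small self-contained Rust sketch of the breadth-first search just described - the function, the data shapes, and the choice of using the explicitly certified criteria as the `baseline` set are illustrative assumptions, not cargo-vet's actual internals:

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Hypothetical sketch: walk the reversed-implies graph outward from the
/// failed criterion, returning the first criteria found that are present
/// in the baseline. `implies` maps a criterion to the criteria it directly
/// implies (e.g. "reviewed" -> ["weak-reviewed"]); `baseline` is the set of
/// criteria explicitly certified for the "from" version of the delta.
fn baseline_criteria<'a>(
    failed: &'a str,
    implies: &HashMap<&'a str, Vec<&'a str>>,
    baseline: &HashSet<&'a str>,
) -> Vec<&'a str> {
    // Build the reversed-implies edges, e.g. "weak-reviewed" => "reviewed".
    let mut reversed: HashMap<&'a str, Vec<&'a str>> = HashMap::new();
    for (&stronger, weaker) in implies {
        for &w in weaker {
            reversed.entry(w).or_default().push(stronger);
        }
    }

    let mut queue = VecDeque::from([failed]);
    let mut seen = HashSet::from([failed]);
    let mut found = Vec::new();
    while let Some(criterion) = queue.pop_front() {
        if baseline.contains(criterion) {
            // Stop this path at the first criterion present in the baseline.
            found.push(criterion);
            continue;
        }
        for &stronger in reversed.get(criterion).into_iter().flatten() {
            if seen.insert(stronger) {
                queue.push_back(stronger);
            }
        }
    }
    found
}

fn main() {
    // From the mock tests: strong-reviewed implies reviewed implies weak-reviewed.
    let implies = HashMap::from([
        ("strong-reviewed", vec!["reviewed"]),
        ("reviewed", vec!["weak-reviewed"]),
    ]);
    // The v3 => v4 audit certified only `reviewed`.
    let baseline = HashSet::from(["reviewed"]);
    // The vet check for v5 fails for `weak-reviewed`; the search walks the
    // reversed edge weak-reviewed => reviewed and stops there.
    println!("{:?}", baseline_criteria("weak-reviewed", &implies, &baseline));
}
```

Running this prints `["reviewed"]`. The `found` vector also makes the distance-N caveat visible: if several criteria at the same depth are all in the baseline, they all come back, and we would still need a policy for picking among them.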
To my understanding, the suggestions here are not to change anything about full audits, but rather about delta audits.

The suggestion appears to be that, when doing a delta audit, instead of suggesting the minimal set of criteria which will "heal" the audit graph based on the policy, we suggest the maximal set of criteria which would lead to the "to" version of the delta audit being certified for more criteria. So for example, if there were 4 criteria (A-D), and the base audit was certified for (A, B), we'd suggest a delta audit for `(A, B)`, even if the certification requirements did not contain `B`.
To me the concept of "baseline" in this scenario seems fairly clear. As there is an explicit base revision being checked, we can look at what criteria are satisfied for that crate, and suggest those criteria. This is roughly what I did in #614.
Is this correct? I want to be confident I'm understanding the requests from folks actually using custom criteria here.
> So, when `cargo vet certify third-party3 v4 v5` realizes that the vet check fails for `v5`, then what should it consider as the baseline? It seems to me that `reviewed` is the right answer, right?
Yes, this is what would be found if we checked which criteria are satisfied for `v4`, as I mention above.
A small note from https://github.com/mozilla/cargo-vet/issues/417#issuecomment-2130053997 (emphasis mine) - not visible in the example:

> - rejects audits that do not meet requirements
> - rejects delta audits that increase specificity (if that's the right term -- for example selecting ub-risk-0 with a baseline of ub-risk-2)
It is fully OK to "over-specify" criteria for a delta audit (i.e. provide a delta audit which specifies more criteria than can actually be certified given the other audits which are present), though the final crate will still only certify for the base criteria. We can't be blocking/rejecting those, but cargo-vet will never suggest them.

For example, if you have criteria `A`, `B`, and there is an audit (`crate@1.0.0 : "A"`), and you're certifying `crate@1.0.0 -> 2.0.0 : "A" + "B"`, that is OK, but `crate@2.0.0` will not satisfy `"B"` until someone makes a new audit certifying `crate@1.0.0 : "B"`.
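As a sketch of what that over-specified pair of entries might look like in `audits.toml` (crate and reviewer names are placeholders):

```toml
# Full audit: 1.0.0 is certified only for criterion "A".
[[audits.example-crate]]
who = "Jane Auditor <jane@example.com>"
criteria = "A"
version = "1.0.0"

# Over-specified delta audit: certifies the diff for both "A" and "B".
# This is accepted, but example-crate@2.0.0 still only satisfies "A"
# until a full audit certifies 1.0.0 for "B" as well.
[[audits.example-crate]]
who = "Jane Auditor <jane@example.com>"
criteria = ["A", "B"]
delta = "1.0.0 -> 2.0.0"
```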
Taking the three sections of the previous comment in turn:

Yes, in my experience the most common delta reviews have no changes to the criteria. If `crate@1.1.0` was reviewed last week with criteria (A, B), then most likely `crate@1.2.0` this week is also (A, B). The current UX might instead default to (A, C), where B implies C. If I miss this as a reviewer, then `crate@1.2.0` is only certified for (A, C) when in fact the stronger (A, B) applies.
I am likely missing something, but I think the baseline for a delta audit should simply be the criteria of the "old" version, regardless of any implications. In @anforowicz's example, that would be `reviewed`, as expected. Before that, `cargo vet certify third-party3 v3 v4` would have had the baseline `strong-reviewed`, and it seems the reviewer was not able to meet that criteria and weakened it to `reviewed`.
This is news to me! Restating to check understanding: in @anforowicz's example, if I certified `v4` => `v5` as `strongly-reviewed`, that is allowed, but it has the same effect as certifying for `reviewed`, because that is the weakest criterion in the delta chain. What did the certifier mean when they typed `strongly-reviewed`?? If they spent the time to strongly review the entire crate, then that effort is not reflected in the result. If they strongly reviewed only the diff, is that a conditional claim that "if the old version was certified as strongly-reviewed, then the new version is as well"?

I suppose rejecting such reviews is problematic because, among other reasons, it could mean that merging `audit.toml`s leads to a rejection.
I think it would be useful for `cargo vet diff` to surface information about previous audits of a given crate.

For example, after running `cargo update` earlier, `cargo vet` said that I needed to review an update of `semver` from `1.0.13` to `1.0.16`. Running `cargo vet diff semver 1.0.13 1.0.16` gave me the following output:

What's left unsaid so far is that `semver@1.0.13` only has an existing audit for `safe-to-run`, so I will have to audit the entirety of `semver` for `does-not-implement-crypto`.

Further - and this is probably more specific to `does-not-implement-crypto` - even if `semver` had been properly audited for `does-not-implement-crypto` before (or `crypto-safe`, which `does-not-implement-crypto` implies), I need to know which of those had been audited for so I can choose the correct one to "carry forward" in this diff audit.

Including something like the following in `cargo vet diff` output would be helpful :)