Should the source track focus on "proposed changes" or "sequences of revisions"?

zachariahcox commented 4 months ago

What do we store?

Mechanically, git tracks revisions of a whole file system, not sequences of changes (patches) to that system.

Should all the SLSA source track wording focus on the diff between two revisions or on "moving the branch to a proposed revision"?

What was reviewed?

The diff between the tip of the topic ref and the result of something like git merge-base target topic is shown to reviewers to help them focus on the net differences between the two revisions.

If they approve the changes they were shown, a new revision will be created that is morally equivalent to what they approved (maybe we would say it is "mergesame" or "diffsame"), but they were not necessarily able to review the final proposed revision.

This is an important aspect of merge queues and, I guess, loosely related to merge associativity problems.

Pull requests show reviewers a diff that can be described by three refs: the tips of target and topic and the current best-merge-base between them at the time the reviewer loads the page.
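To make the gap between "what was reviewed" and "what was created" concrete, here is a minimal sketch in Python. It is a toy model, not how git or any forge implements this: revisions are plain dicts mapping path to content, and the diff and apply helpers are hypothetical and assume a conflict-free merge.

```python
# Toy model: a revision is a snapshot mapping path -> content.

def diff(before, after):
    """Return {path: (old, new)} for every path whose content differs."""
    paths = before.keys() | after.keys()
    return {
        p: (before.get(p), after.get(p))
        for p in paths
        if before.get(p) != after.get(p)
    }

def apply(tree, changes):
    """Apply a diff to a snapshot (hypothetical helper, assumes no conflicts)."""
    result = dict(tree)
    for path, (_old, new) in changes.items():
        if new is None:
            result.pop(path, None)  # path was deleted in the diff
        else:
            result[path] = new
    return result

# The three refs that describe what a reviewer sees.
merge_base = {"a.txt": "1", "b.txt": "1"}
topic_tip = {"a.txt": "2", "b.txt": "1"}                    # topic changed a.txt
target_tip = {"a.txt": "1", "b.txt": "1", "c.txt": "new"}   # target moved on

reviewed = diff(merge_base, topic_tip)   # the diff shown to the reviewer
merged = apply(target_tip, reviewed)     # the revision that actually lands
```

In this model the merged revision is "diffsame" with what was approved (diff(target_tip, merged) equals reviewed), but it is not the topic tip: it also contains c.txt, which the reviewer never saw alongside the change.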

Observations

Review decisions are made based on non-git artifacts (the results of a specific diff operation). Those artifacts will become lost to time or garbage collection unless we keep the pull request metadata around as evidence, and even if we do, the commits referenced may also be garbage collected if they become unreachable from any ref.

TomHennen commented 3 weeks ago

I think we may have come to a resolution on this. Right now we talk about "revisions", which seems the most natural for the current 3 levels. The only place it might be an issue is if we call out how to handle code review/changes in provenance, but we currently leave that up to implementations. So I propose we close this?

zachariahcox commented 3 weeks ago

@darylcantrell and I were thinking through this a bit recently.

Some kinds of checks (like unit tests) make claims about the whole revision (the after-oid). You mostly only need one set of those.

Others (like code review) make a claim about the diffs shown. Reviewing a PR does not mean the whole codebase was reviewed, so you need to sew a bunch of them together.

A release may contain a long series of relevant PR claims (i.e., the set since the last release), but only the most recent "whole revision" claims matter.

...not really sure what to do with this observation, just that the kind of attestations you'll have to dig through will depend on the rule you're trying to aggregate. And that we probably have to keep the diff concept in there.
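A rough sketch of that distinction, in Python. Everything here is an assumption for illustration: the Claim shape, the "revision" / "diff" scope labels, and the relevant_claims helper are hypothetical and not part of any SLSA predicate.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Claim:
    revision: str  # the after-oid the claim is attached to
    kind: str      # e.g. "unit-tests" or "code-review"
    scope: str     # "revision": covers the whole tree; "diff": covers only the changes shown

def relevant_claims(revisions, claims):
    """Select the claims a release needs.

    revisions: oids since the last release, oldest first.
    Whole-revision claims are kept only for the newest oid;
    diff-scoped claims are kept for every oid in the range.
    """
    if not revisions:
        return []
    in_range = set(revisions)
    tip = revisions[-1]
    selected = []
    for claim in claims:
        if claim.scope == "revision" and claim.revision == tip:
            selected.append(claim)
        elif claim.scope == "diff" and claim.revision in in_range:
            selected.append(claim)
    return selected

revisions = ["r1", "r2", "r3"]
claims = [
    Claim(oid, kind, scope)
    for oid in revisions
    for kind, scope in (("unit-tests", "revision"), ("code-review", "diff"))
]
needed = relevant_claims(revisions, claims)
```

Here needed keeps a single unit-tests claim (for r3) but all three code-review claims, which is the "sew a bunch together" case.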

TomHennen commented 2 weeks ago

I think what we can probably do with this observation is close this issue and leave it up to attestation implementors to figure out what they need? It sounds like for the Source Track levels "revision" is what we want, but we don't want to preclude people from using other things in source provenance if needed. So better to leave it as is?

adityasaky commented 2 weeks ago

I think it might be okay to close as we've settled that levels pertain to consumable revisions. I think #1161's resolution will address this on a per-predicate basis.

zachariahcox commented 6 days ago

I think we're settling on "slsa data pertains to consumable revisions."

If you need to sew together the claims attached to a sequence of revisions (such as when building a contributors list), a VSA must be used to aggregate data from the per-revision attestations.

This result seems consistent with the approach we've taken elsewhere in the spec.
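As a sketch of what that aggregation step might look like for the contributors-list case, in Python: the field names ("authors", "reviewers", "missing") and the summarize_contributors helper are hypothetical, and the returned dict is a stand-in summary, not the actual VSA predicate schema.

```python
def summarize_contributors(revisions, attestations):
    """Aggregate per-revision attestations over a sequence of revisions.

    revisions: oids in the sequence, oldest first.
    attestations: oid -> {"authors": [...], "reviewers": [...]} (hypothetical shape).
    Revisions with no attestation are reported rather than silently skipped.
    """
    authors, reviewers, missing = set(), set(), []
    for oid in revisions:
        attestation = attestations.get(oid)
        if attestation is None:
            missing.append(oid)
            continue
        authors.update(attestation.get("authors", []))
        reviewers.update(attestation.get("reviewers", []))
    return {
        "revisions": list(revisions),
        "authors": sorted(authors),
        "reviewers": sorted(reviewers),
        "missing": missing,
    }

# Placeholder data: r2 deliberately has no attestation.
attestations = {
    "r1": {"authors": ["alice"], "reviewers": ["bob"]},
    "r3": {"authors": ["bob"], "reviewers": ["alice", "carol"]},
}
summary = summarize_contributors(["r1", "r2", "r3"], attestations)
```

Each input still pertains to a single consumable revision; only the summary spans the sequence, and it surfaces r2 as missing so a verifier can decide whether that gap is acceptable.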

slsa-framework / slsa