ossf / scorecard

OpenSSF Scorecard - Security health metrics for Open Source
https://scorecard.dev
Apache License 2.0
4.38k stars 481 forks source link

BUG: invalid number of commits reported #2016

Open laurentsimon opened 2 years ago

laurentsimon commented 2 years ago

Running scorecard on https://github.com/danielaparker/jsoncons/commits/master reports

"Info: all commits (3) are checked with a SAST tool

Thera are a lot more commits in the repo, though. I would have expected Scorecard to look at 30 commits (at least that's how it used to work)

laurentsimon commented 2 years ago

I'm running at HEAD. This seems to be a problem on most repos I'm testing. Here's another one github.com/Dart-Code/dart-code, reporting a single commit.

spencerschrock commented 2 years ago

Scorecard still looks at 30 commits (or whatever commitsToAnalyze is set to). The number being reported refers to the number of commits in the 30 pulled which have a PR with non-zero merge time. This check changed in eeb563be, which may have unintentionally modified the behavior.

if pr.MergedAt.IsZero() {
    continue
}
totalMerged++

The exact numbers depend on the repo when run. When I run against the repo you mentioned today, 30 commits are analyzed and 14 aren't, so I get the following:

all commits (16) are checked with a SAST tool

Looking back in time to when you made the issue, there were a lot of commits which didn't have associated PR's (or at least nothing shows up after master like it does for other commits: one two etc ...

I think clarifying expected behavior is good here. Should we expect Scorecard to look at 30 commits overall, or 30 commits which get included in the analyses?

laurentsimon commented 2 years ago

30 commits overall is fine, and that's what we've been doing. I think I got thrown off by the message. It should say X out of Y commits are checked with a SAST tool

I recall there were some problems with graphQL where we may be looking at fewer (than 30 ) commits if there are not enough commits in the lest 3 (?) months or so.

@azeemsgoogle do you remember the exact problem and whether it has been resolved?

azeemshaikh38 commented 2 years ago

IIUC, the problem here is that some repos do not squash when they merge a PR. So when a PR with 5 commits gets merged, each of those commits are returned in ListCommits. But the number of PRs associated with these commits is just 1. So checks like SAST, CI-Tests which care about PRs rather than commits seem off.

This is actually a bug IMO (like if a PR had 30 commits, we'd only be analyzing one PR) and we need to fix it. One soln I can think of is to increase commitsToAnalyze to 100 and ignore any commit which does not match its associatedPR's merge commit. Its not foolproof, but probably better than what we have today. Thoughts?

laurentsimon commented 2 years ago

Tough one. Can the "squashed" status be queried?

azeemshaikh38 commented 2 years ago

Can the "squashed" status be queried?

Not AFAIK

laurentsimon commented 2 years ago

:/ If we find that we're not analyzing enough commits, can we re-query? That starts to become a bit complicated, though

azeemshaikh38 commented 1 year ago

This issue seems keeps creeping up. So here's my proposal to fix this:

  1. Update RepoClient.ListCommits() API to return an CommitIter object instead of []Commits.
  2. Every Scorecard check chooses how many commits it cares about analyzing. So in case of Code-Review or CI-Tests checks it'll keep retrieving commits until it reaches 30 (or the required) changesets.

So basically, instead of RepoClient choosing how many commits/PRs to analyze, it'll be the checks themselves.