ossf / scorecard

OpenSSF Scorecard - Security health metrics for Open Source
https://scorecard.dev
Apache License 2.0

Feature scorecard exits with non-zero status when internal errors occur #2124

Closed: shissam closed this issue 2 years ago

shissam commented 2 years ago

Is your feature request related to a problem? Please describe.

Scorecard exits with an exit status of zero (0) under many circumstances: for instance, when there are internal errors.

Scripts and pipelines that use scorecard cannot rely on the exit status when such errors occur.

This can be reproduced when scorecard encounters (primary) API rate limits or secondary rate limits, which also affect the aggregate score (itself an issue). But this is not the only situation--it also occurs when the API returns other status codes (such as 400).

These are JSON excerpts from runs in which these errors occurred, yet scorecard exited with zero (0).

"reason": "internal error: internal error: Client.Search.Code: Search.Code: GET https://api.github.com/search/code?q=github.com+lib+pq+repo%3Agoogle%2Foss-fuzz+in%3Afile+filename%3Aproject.yaml: 403 You have exceeded a secondary rate limit. Please wait a few minutes before you try again. []",
"score": -1
"reason": "internal error: Client.Repositories.ListStatuses: internal error: ListStatuses: GET https://api.github.com/repos/klauspost/cpuid/commits/1a4ea4da476f4e8e458f01a6f629a2cc89bc3027/statuses: 400  []",
"score": -1
"reason": "internal error: Client.Checks.ListCheckRunsForRef: internal error: ListCheckRunsForRef: GET https://api.github.com/repos/klauspost/cpuid/commits/7185e1a0afa86733f29f5a476fd969fc81766853/check-runs: 400  []",
"score": -1

It should be noted that some of the 400 responses are transient: when scorecard is re-run after seeing a 400, the error may not occur again.
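
If scorecard did signal failure, a consumer could retry these transient errors. A minimal sketch of such a retry loop, with runOnce as a hypothetical wrapper around a single scorecard invocation:

package main

import (
    "errors"
    "fmt"
    "time"
)

// runOnce is a hypothetical wrapper around one scorecard invocation; today
// it could not be written reliably, because the exit status is always 0.
func runOnce() error {
    return errors.New("400 from the GitHub API") // transient in practice
}

func main() {
    for attempt := 1; attempt <= 3; attempt++ {
        err := runOnce()
        if err == nil {
            fmt.Println("results are usable")
            return
        }
        fmt.Printf("attempt %d failed: %v\n", attempt, err)
        time.Sleep(time.Duration(attempt) * time.Minute) // simple linear backoff
    }
    fmt.Println("giving up; discarding results")
}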

Describe the solution you'd like

scorecard should exit with a non-zero status when internal errors and other anomalies are encountered. An exit status of zero should be reserved for successful runs.
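
A minimal sketch of the requested contract (runScorecard here is a hypothetical stand-in, not scorecard's actual entry point):

package main

import (
    "errors"
    "fmt"
    "os"
)

// runScorecard stands in for a full scorecard run that surfaces internal errors.
func runScorecard() (string, error) {
    return "", errors.New("internal error: secondary rate limit")
}

func main() {
    results, err := runScorecard()
    if err != nil {
        fmt.Fprintf(os.Stderr, "scorecard: %v\n", err)
        os.Exit(1) // non-zero: the run failed; any partial results are suspect
    }
    fmt.Println(results)
    // falling off the end of main exits 0: reserved for successful runs
}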

Describe alternatives you've considered

Searching the results from scorecard for "known" anomalies and ignoring the exit status, e.g. searching for values like "internal error" in the reason field.
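
For illustration, a small Go sketch of that workaround; the struct below mirrors only the reason/score fields visible in the excerpts above, not scorecard's full output schema:

package main

import (
    "encoding/json"
    "fmt"
    "os"
    "strings"
)

// result models just enough of scorecard's JSON output to scan for anomalies.
type result struct {
    Checks []struct {
        Reason string  `json:"reason"`
        Score  float64 `json:"score"`
    } `json:"checks"`
}

func main() {
    var r result
    if err := json.NewDecoder(os.Stdin).Decode(&r); err != nil {
        fmt.Fprintln(os.Stderr, "parse:", err)
        os.Exit(2)
    }
    for _, c := range r.Checks {
        if strings.Contains(c.Reason, "internal error") {
            fmt.Fprintf(os.Stderr, "anomaly: %s\n", c.Reason)
            os.Exit(1) // treat the run as unusable despite scorecard's exit 0
        }
    }
}

Fed with something like scorecard --repo=github.com/lib/pq --format json | go run scan.go, it fails the pipeline where scorecard itself would not.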

Additional context

On the day scorecard was run against a specific repo, the secondary rate limit kicked in and resulted in a different aggregate score for github.com/lib/pq:

Also, please notice below that, because of the "internal error", the aggregate score has been tainted.

$ jq -r '(.repo.name)' ../scw/pq.*json
github.com/lib/pq
github.com/lib/pq

$ jq -r '(.score)' ../scw/pq.*json
5.6
5.3

$ diff ../scw/pq.*json
90,91c90,91
<             "reason": "internal error: internal error: Client.Search.Code: Search.Code: GET https://api.github.com/search/code?q=github.com+lib+pq+repo%3Agoogle%2Foss-fuzz+in%3Afile+filename%3Aproject.yaml: 403 You have exceeded a secondary rate limit. Please wait a few minutes before you try again. []",
<             "score": -1
---
>             "reason": "project is not fuzzed",
>             "score": 0
200c200
<     "score": 5.6,
---
>     "score": 5.3,
203c203
<         "version": "v4.4.0-60-g69eb1cc-dirty"
---
>         "version": "v4.4.0-60-g69eb1cc"

"version": "v4.4.0-60-g69eb1cc-dirty" is the native go binary for scorecard clone/built from GitHub; while "version": "v4.4.0-60-g69eb1cc" is the gcr.io/openssf/scorecard:latest from 4 days ago

$ sudo docker run gcr.io/openssf/scorecard:latest version
./scorecard: Security Scorecards

GitVersion: v4.4.0-60-g69eb1cc
GitCommit: 69eb1ccf1d0cf8c5b291044479f18672bf250325
GitTreeState: clean
BuildDate: 1659268796
GoVersion: go1.16.7
Compiler: gc
Platform: linux/amd64

$ ./scorecard version
./scorecard: Security Scorecards

GitVersion: v4.4.0-60-g69eb1cc-dirty
GitCommit: 69eb1ccf1d0cf8c5b291044479f18672bf250325
GitTreeState: dirty
BuildDate: 1659268796
GoVersion: go1.18.4
Compiler: gc
Platform: linux/amd64

$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=20.04
DISTRIB_CODENAME=focal
DISTRIB_DESCRIPTION="Ubuntu 20.04.4 LTS"

laurentsimon commented 2 years ago

@spencerschrock this seems to fall under the general theme of "making scorecard a better dependency / library". Do you want to take a stab at it?

spencerschrock commented 2 years ago

Sure, I can take a look. The exit status should be an easy fix.

The impact of internal errors on scores is something I will defer to you, @laurentsimon. Is this the intended behavior when an API limit is hit (or for any internal error)? Should the whole run be considered an invalid result? Is it up to the consumer? WDYT?

I think an error is different from not having enough info. Perhaps we should come up with a different score for the two scenarios? We already have two functions for it:

// CreateInconclusiveResult is used when
// the check runs without runtime errors, but we don't
// have enough evidence to set a score.
func CreateInconclusiveResult(name, reason string) CheckResult {

// CreateRuntimeErrorResult is used when the check fails to run because of a runtime error.
func CreateRuntimeErrorResult(name string, e error) CheckResult {
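
(For illustration, a sketch--not the real Fuzzing check--of how a check could route the different outcomes through these constructors; searchForFuzzing is a hypothetical stand-in for the code search above, and CreateMinScoreResult/CreateMaxScoreResult are assumed sibling constructors in the same package:)

// inside scorecard's checker package, roughly:
func runFuzzingCheck(name string) CheckResult {
    found, err := searchForFuzzing() // hypothetical OSS-Fuzz code search
    if err != nil {
        // Runtime error, e.g. the 403 secondary rate limit above: score -1.
        return CreateRuntimeErrorResult(name, err)
    }
    if !found {
        // The check ran cleanly and the answer is "no": a real, minimal score.
        return CreateMinScoreResult(name, "project is not fuzzed")
    }
    return CreateMaxScoreResult(name, "project is fuzzed")
}
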
spencerschrock commented 2 years ago

#2133 does not address the scoring inconsistencies, but at least returns a non-zero exit code.

shissam commented 2 years ago

does not address the scoring inconsistencies

would y'all suggest (then) that I create another issue regarding "scoring inconsistencies"?

Thinking more about this: if in my pipeline I get a non-zero exit, the tendency would be to ignore the (aggregate) score and (possibly) re-run scorecard. The corollary being: a zero status means (for me) that I can use the (aggregate) score in downstream analysis/risk activities.

I would be interested in hearing other perspectives--perhaps re-aligning my thinking!

laurentsimon commented 2 years ago

I think a non-zero value means the run failed, which means the results can be discarded. I agree with your thinking. @spencerschrock, did you have something in mind?

shissam commented 2 years ago

just one more "piece" for the calculus here.

Interestingly, in the case above where the internal error occurred, resulting in a "local" score of -1 (negative one) for fuzzing, the aggregate score improved from 5.3 (where the fuzzing check deduced that no fuzzing was in use) to an inflated 5.6 (with the internal error).

Now that scorecard indicates an error (non-zero exit), this point should be moot--but it seems that any check returning -1 could result in unexpected inflation of the aggregate score, presumably because -1 results are dropped from the average rather than counted (perhaps this is an orthogonal concern).
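
To make the inflation concrete, a toy Go illustration with three equally weighted checks, assuming (as the 5.3 vs. 5.6 diff above suggests) that -1 results are simply excluded from the average; the real aggregation uses per-check weights:

package main

import "fmt"

// aggregate averages only the checks that produced a real score,
// skipping -1 (runtime error / inconclusive) results.
func aggregate(scores []int) float64 {
    sum, n := 0, 0
    for _, s := range scores {
        if s >= 0 {
            sum += s
            n++
        }
    }
    return float64(sum) / float64(n)
}

func main() {
    fmt.Println(aggregate([]int{8, 3, 0}))  // fuzzing ran and scored 0: ~3.67
    fmt.Println(aggregate([]int{8, 3, -1})) // fuzzing errored out: 5.5, inflated
}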

I look forward to re-running my regression

spencerschrock commented 2 years ago

The corollary being: a zero status means (for me) that I can use the (aggregate) score in downstream analysis/risk activities.

👍

non-zero value means the run failed, which means the results can be discarded

I think this makes sense. But will it be confusing to users who don't check exit codes? Should we return anything at all when a runtime error occurs? For example, in Go it's idiomatic to zero out the other return values if there is an error.
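
A self-contained sketch of that idiom (fetchScore and its failure mode are made up):

package main

import (
    "errors"
    "fmt"
)

// fetchScore zeroes its score when returning an error, so a caller that
// ignores the error cannot mistake a partial value for real data.
func fetchScore(rateLimited bool) (float64, error) {
    if rateLimited {
        return 0, errors.New("403: secondary rate limit")
    }
    return 5.3, nil
}

func main() {
    score, err := fetchScore(true)
    if err != nil {
        fmt.Println("error:", err, "score:", score) // score is 0, not stale
        return
    }
    fmt.Println("score:", score)
}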

tendency would be to ignore the (aggregate) score

I kept the existing behavior (outputting any results even with runtime errors) mainly for backwards compatibility, but this is one argument for why returning partial results makes sense: individual scores could still be used.

shissam commented 2 years ago

in Go it's idiomatic to zero out other return values if there is an error.

If you follow the Go tutorial at https://www.digitalocean.com/community/tutorials/handling-errors-in-go, the same is explained, stating (in part): "When returning an error from a function with multiple return values, idiomatic Go code also will set each non-error value to a zero value," and going on to state "0 for integers" (I think I read that correctly).

So in the case above where there was an error (an internal error) for the fuzzing check:

< "reason": "internal error: internal error: Client.Search.Code: Search.Code: GET https://api.github.com/search/code?q=github.com+lib+pq+repo%3Agoogle%2Foss-fuzz+in%3Afile+filename%3Aproject.yaml: 403 You have exceeded a secondary rate limit. Please wait a few minutes before you try again. []", < "score": -1

Such a specification would seem to say that, in this case, the fuzzing check should return 0 rather than -1 (along with an error). And in that case, I suspect, with the error returned and the score zeroed, the aggregate score would not have been inflated.

I kept the existing behavior (outputting any results even with runtime errors) mainly for backwards compatibility, but this is one argument for why returning partial results makes sense: individual scores could still be used.

But as you rightly indicated, this is a change in behavior. I suspect that is because, for scoring, 0 is a valid (and determinate) score, being the lowest score in the valid range. So this is a bit of a poser (puzzling question): go (pun unintended) against idiomatic Go and preserve the existing behavior for those tests that do not check for error conditions (on exit), maintaining backwards compatibility; OR (one alternative) indicate that checks that error out (resulting in a non-zero exit status) are skipped or otherwise NOT included in the aggregate score--but this would most likely break backwards compatibility.

It does seem strange, though, that an error like the one seen in the fuzzing check would actually improve a project's overall aggregate score. At the least, the erroneous score(s) could be excluded from the aggregate score, the non-zero exit condition would be signaled (the new feature), and the "score": -1 could stand with the stated "reason".

(just a thought)

spencerschrock commented 2 years ago

Since the aggregate score would be invalid due to the runtime error, the "zero-value" I had imagined was at the level of the entire scorecard run, meaning no JSON (or other format) output at all. Something that heavy-handed would probably need to be discussed in the biweekly OSSF Scorecard meeting.

scoring 0 is a valid (and determinate) score as being the lowest score in a valid range

Yes, that was the reason for InconclusiveResultScore = -1, which in my opinion makes it a more appropriate zero-value despite the obvious name mismatch.

Go (pun unintended) against idiomatic Go

pun appreciated

shissam commented 2 years ago

I hope to sit in on the next biweekly--when ready, I can re-run my regressions.