ossf / scorecard

OpenSSF Scorecard - Security health metrics for Open Source
https://scorecard.dev
Apache License 2.0
4.63k stars 503 forks source link

BUG: SAST and CI-Tests wrongly succeed when action is enabled #1420

Open laurentsimon opened 2 years ago

laurentsimon commented 2 years ago

SAST and CI-Test use the PR.App.Slug field to determine if a SAST/CI tool is used. When running scorecard's GitHub action, we detect our own action as a SAST/CI tool.

We should exclude it. The URL can be used to determine the name of the tool run, e.g., https://api.github.com/repos/laurentsimon/scorecard-action-test-3/check-runs/4143313445

laurentsimon commented 2 years ago

@azeemsgoogle is this something you'd like to add to the repo interface, i.e. a way to retrieve the tool name from the run/Slug?

rmjspaz commented 2 years ago

@azeemsgoogle is thi something you'd like to add to the repo interface, i.e. a way to retrieve the tool name from the run/Slug?

U

rmjspaz commented 2 years ago

@azeemsgoogle is thi something you'd like to add to the repo interface, i.e. a way to retrieve the tool name from the run/Slug?

U

azeemshaikh38 commented 2 years ago

It seems non-intuitive to me that users can't improve their SAST/CI scores by enabling Scorecard actions. We shouldn't ignore Scorecard action in these checks IMO.

If these checks seem to "wrongly succeed", then I would argue that the way we dole out points in these checks is where the problem lies. Maybe we should re-think the scoring system in these checks instead?

laurentsimon commented 2 years ago

It seems non-intuitive to me that users can't improve their SAST/CI scores by enabling Scorecard actions. We shouldn't ignore Scorecard action in these checks IMO.

good idea. I initially thought scorecard should be running silently in the background and not affect the results. But you make a good point.

If these checks seem to "wrongly succeed", then I would argue that the way we dole out points in these checks is where the problem lies. Maybe we should re-think the scoring system in these checks instead?

I agree the scoring system is not great for those checks. I think it requires more information about SAST tools. Originally I wanted to classify what features or vulnerability classes each tool can find, but this seems too ambitious at this point. A more accessible scoring could be:

  1. presence of linter https://github.com/ossf/scorecard/issues/1380: give 2-3 points
  2. presence of supply-chain tool: give 2-3 points
  3. language-specific tool: give remaining points, possibly using https://github.com/ossf/scorecard/issues/1427

Until we get there, we could maybe start by implementing point 2, ie give 3 points for scorecard as supply-chain tool. wdut?

For CI-Tests check, I'm not sure what we should do. The check is currently coarse . Maybe we can keep it as-is for the time being?

azeemshaikh38 commented 2 years ago

Discussed offline: can update the SAST check to use ListSuccessfulWorkflowRuns API. Basically, find all SAST related workflows and check for successful runs of this workflow.

laurentsimon commented 2 years ago

I did some digging and I don't think listing workflow runs works. The SAST check uses the Check API because it allows listing not just workflow runs, but also GitHub apps. See doc in https://docs.github.com/en/rest/reference/checks You can create apps that perform continuous integration, code linting, or code scanning services and provide detailed feedback on commits

laurentsimon commented 2 years ago

updates: GitHub runs apps. Some apps include lgtm-com, sonarcloud, github-actions, github-code-scanning and can be identified by their slug or their app ID. For GitHub-owned apps like GitHub actions, the workflow and job name is set by users: in this case we can parse the action and look for the action used + successful workflows, as suggested in https://github.com/ossf/scorecard/issues/1420#issuecomment-1005148323.

I'll start a PR that follows this structure: https://github.com/ossf/scorecard/issues/1420#issuecomment-1005080656 I'll wait till https://github.com/ossf/scorecard/issues/1451 is resolved to implement the https://github.com/ossf/scorecard/issues/1420#issuecomment-1005148323

laurentsimon commented 2 years ago

there is an additional complication I've encountered: ListSuccessfulWorkflowRuns returns as per commit runs. If a PR contains N commit, the runs will be returned for these N commits, then for the commits in the next PR, etc. The API allows up to 100 results per page, but it's not straightforward

azeemshaikh38 commented 2 years ago

Hmm, tricky. The API allows us to filter by branch and event too. Wonder if that can help. Exploring the REST and GraphQL APIs bit more, will update here if I find anything.

laurentsimon commented 2 years ago

an alternative (first iteration) solution we may consider is to identify workflows by their names, and map these names to the check run's name. So we would:

  1. list workflow names for each actions of interest
  2. compare the names to check runs' names for each PR

There are corner cases, e.g. if a dev gives the exact same name to 2 different workflow, or if someone has update the workflow name... But this may be a good first step to improve the check until we're able to get workflow runs for a particular PR. wduy?

laurentsimon commented 2 years ago

follow-up on my last comment. The name in the check run is Job Name if -name is present, jobname otherwise:

jobs:
  jobname:
  -name: "Job Name"

I've checked the partnered workflows for code scanning https://github.com/actions/starter-workflows/blob/main/code-scanning and found that

  1. A large number of workflows don't set the name field, so it uses jobname
  2. in the majority of cases, the names is a distinct string, but in rare cases it uses a generic term like build: not ideal

On a related note: this list of scanning tools may be a good list to support

laurentsimon commented 2 years ago

I think I found a way to do this. Check runs have a check_suite_id, and it's possible to list workflow runs by check_suite_id as well: this gives us the workflow ID which we can map to a file - and therefore an action. I'll work on a PR to see if it actually works or not ... and see how many additional API calls it requires. I suspect it needs an additional call per action run, so an additional number_workflow_run_on_prs*number_prs = number_workflow_run_on_prs*30