SAST not detected in CII Best Practices badge workflows #1031

Open david-a-wheeler opened 2 years ago

david-a-wheeler commented 2 years ago

Describe the bug: SAST tools are not detected in the CII Best Practices badge code.

Reproduction steps (steps to reproduce the behavior):

  1. Run scorecard on https://github.com/coreinfrastructure/best-practices-badge

It reports:

    {
      "details": [
        "Warn: 0 commits out of 30 are checked with a SAST tool",
        "Warn: CodeQL tool not detected"
      ],
      "score": 0,
      "reason": "SAST tool is not run on all commits -- score normalized to 0",
      "name": "SAST"
    },

Expected behavior

First, the blanket claim "SAST tool is not run on all commits" is way too strong. It's hard to detect such tools in all cases, so the text here is extremely misleading. It should say something like:

Did not detect a SAST tool that is run on all commits

Second, there's an easy case Scorecard should detect:

In .github/workflows/main.yaml it uses Brakeman via this line:

        uses: devmasx/brakeman-linter-action@0dc80fcccf87915ccb1761669014015214f36287 # pin@v1.0.0

Brakeman within a GitHub workflow is easy to detect & is a security-focused SAST tool. That should certainly be detected.
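
To illustrate how easy this particular case is, here's a rough, self-contained sketch (not Scorecard's actual code; the file layout and the tiny action list are just assumptions) that matches the owner/repo part of each uses: reference against known SAST actions:

    // Sketch: look for well-known SAST actions referenced in GitHub workflow files.
    package main

    import (
        "bufio"
        "fmt"
        "os"
        "path/filepath"
        "strings"
    )

    // knownSASTActions is illustrative; a real list would be longer and curated.
    var knownSASTActions = map[string]string{
        "github/codeql-action/analyze":   "CodeQL",
        "devmasx/brakeman-linter-action": "Brakeman",
    }

    func main() {
        paths, _ := filepath.Glob(".github/workflows/*.y*ml")
        for _, path := range paths {
            f, err := os.Open(path)
            if err != nil {
                continue
            }
            scanner := bufio.NewScanner(f)
            for scanner.Scan() {
                line := strings.TrimPrefix(strings.TrimSpace(scanner.Text()), "- ")
                if !strings.HasPrefix(line, "uses:") {
                    continue
                }
                // "uses: owner/repo@ref # comment" -> "owner/repo"
                ref := strings.TrimSpace(strings.TrimPrefix(line, "uses:"))
                ref = strings.SplitN(ref, "@", 2)[0]
                if tool, ok := knownSASTActions[ref]; ok {
                    fmt.Printf("%s: found SAST action %s (%s)\n", path, ref, tool)
                }
            }
            f.Close()
        }
    }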

FYI, most of the analysis of the badge is via CircleCI, not via GitHub workflows. In the long term you may want to add that analysis. Here's an example, for your amusement. Most of its static analysis tools are invoked via a system common in Ruby programs. In .circleci/config.yml there is this:

    - run:
        name:  Run pronto GitHub
        command: >
          bundle exec pronto run -f github text
          -c=$(git log --pretty=format:%H | tail -1) --exit-code

This runs all the "pronto" static checks. You can find out what they are by looking at gems with the name "pronto-..." in Gemfile, in this case:

  gem 'pronto-eslint', '0.11.0'
  gem 'pronto-rails_best_practices', '0.11.0'
  gem 'pronto-rubocop', '0.11.1'

ESLint and RuboCop are mainly static analysis style checkers, but they do have some security-related rules. rails_best_practices definitely has a number of security-related rules.
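
If Scorecard ever wanted to detect this pattern too, a rough sketch (assuming a local checkout with a Gemfile at the repository root; not a real implementation) could be as small as:

    // Sketch: list pronto-* plugins declared in a Ruby Gemfile.
    package main

    import (
        "fmt"
        "os"
        "regexp"
    )

    // Matches lines such as: gem 'pronto-rubocop', '0.11.1'
    var prontoGem = regexp.MustCompile(`(?m)^\s*gem\s+['"](pronto-[\w-]+)['"]`)

    func main() {
        data, err := os.ReadFile("Gemfile")
        if err != nil {
            fmt.Fprintln(os.Stderr, "no Gemfile:", err)
            return
        }
        for _, m := range prontoGem.FindAllStringSubmatch(string(data), -1) {
            // Each pronto-* gem is a separate static analysis plugin run by pronto.
            fmt.Println("pronto plugin:", m[1])
        }
    }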

I mention all of this because it's a good example of why detecting static analysis tools across all projects is hard... and thus why the warning message should more clearly acknowledge that scorecard didn't find a tool instead of claiming that there is no tool.

laurentsimon commented 2 years ago

Examples of CI/CD to support (taken from another issue):

  - .github/workflows (we support)
  - .circleci
  - .travis.yml
  - .gitlab-ci.yml

evverx commented 2 years ago

.github/workflows (we support)

I wouldn't say this is fully supported. For example, systemd has a custom workflow sending data to Coverity Scan daily. To get scorecard to handle workflows like that, it would have to be changed so that it could analyze the status of GitHub workflows and the scanners used to analyze projects. Other than that, as I mentioned in https://github.com/ossf/scorecard/pull/1232, the "last 30 commits" approach doesn't seem to cover the usual workflow where every PR is analyzed before getting merged. I think the check is misleading for the most part, and it probably shouldn't even be used by default, at least for now.

david-a-wheeler commented 2 years ago

@evverx: To be fair, @laurentsimon didn't say "fully supported" just "support".

It's challenging to handle all cases. Scorecard needs to keep getting refined to handle more cases over time. Originally I had proposed this issue to focus on a specific case. This issue could be broadened, or a separate issue could be created to discuss all the cases & then point to this issue as an example of a specific case. Either way works for me. Does someone have a preference?

evverx commented 2 years ago

@evverx: To be fair, @laurentsimon didn't say "fully supported" just "support".

@david-a-wheeler fair enough.

Scorecard needs to keep getting refined to increasingly handle more cases over time.

Agreed. Though I think until the check can cover most use cases it shouldn't be rolled out by default. I'm not sure maintainers would be happy to receive reports based on the findings of scorecard run by people (or maybe even bots) who didn't even try to figure out what it does. There are already enough "bug reports" produced by bespoke static analysis tools (most of which are false positives of course).

laurentsimon commented 2 years ago

@evverx: To be fair, @laurentsimon didn't say "fully supported" just "support".

@david-a-wheeler fair enough.

Scorecard needs to keep getting refined to increasingly handle more cases over time.

Agreed. Though I think until the check can cover most use cases it shouldn't be rolled out by default. I'm not sure maintainers would be happy to receive reports based on the findings of scorecard run by people (or maybe even bots) who didn't even try to figure out what it does. There are already enough "bug reports" produced by bespoke static analysis tools (most of which are false positives of course).

Thanks for the comments. This is a double-edged sword. On one hand, you're right that users may be upset. On the other, not rolling it out prevents us from gathering improvement suggestions from early adopters before the tool reaches wider adoption.

We can do 2 things:

  1. Instead of returning a score of 0, we can return an inconclusive result: we would just say "we could not figure it out". We already have support for this so it's easy to add.

  2. Are you interested in creating a tracking issue and adding thoughts on how we can improve the situation (I don't mind using this issue, it's up to you)? Something we've thought about before is adding support for parsing the commands used in the run field of the workflow, in order to identify tools that are run without a GitHub action (see the sketch below). I think this would complement the current approach. I suspect parsing the shell script (if run invokes one) will probably be needed as well. There is a list of tools to support over time, possibly from these folks https://kompar.tools/ - but it takes time. Once we have the basics covered, it'd be easier to reach out to tool maintainers and ask them to add the names of their utilities.
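
To make point 2 concrete, here is a rough sketch of the run-field scanning (illustrative only: the command list and the minimal YAML shape are assumptions, not the planned implementation):

    // Sketch: flag known SAST/linter commands appearing in workflow `run:` steps.
    package main

    import (
        "fmt"
        "os"
        "strings"

        "gopkg.in/yaml.v3"
    )

    type workflow struct {
        Jobs map[string]struct {
            Steps []struct {
                Run string `yaml:"run"`
            } `yaml:"steps"`
        } `yaml:"jobs"`
    }

    // knownCommands is illustrative; a real list would be curated and much longer.
    var knownCommands = []string{"brakeman", "bandit", "gosec", "semgrep", "pronto run"}

    func main() {
        if len(os.Args) < 2 {
            fmt.Fprintln(os.Stderr, "usage: scan-run <workflow.yml>")
            os.Exit(1)
        }
        data, err := os.ReadFile(os.Args[1])
        if err != nil {
            fmt.Fprintln(os.Stderr, err)
            os.Exit(1)
        }
        var wf workflow
        if err := yaml.Unmarshal(data, &wf); err != nil {
            fmt.Fprintln(os.Stderr, err)
            os.Exit(1)
        }
        for job, j := range wf.Jobs {
            for _, step := range j.Steps {
                for _, cmd := range knownCommands {
                    if strings.Contains(step.Run, cmd) {
                        fmt.Printf("job %q runs %q in a run step\n", job, cmd)
                    }
                }
            }
        }
    }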

Let us know what you think. Community feedback is key to improving scorecard.

evverx commented 2 years ago

Instead of returning a score of 0, we can return an inconclusive result: we would just say "we could not figure it out". We already have support for this so it's easy to add.

Sounds good to me. FWIW I think the Binary-Artifacts check should be demoted as well.

Are you interested in creating a tracking issue and adding thoughts on how we can improve the situation

Unfortunately I can't say I know how to improve the situation. Parsing the run field and bash scripts could certainly help, but given that some projects hosted on GitHub send data for analysis bypassing GitHub Actions altogether, it doesn't seem to be enough. At the end of the day I think what matters is whether projects are analyzed successfully at least weekly, and to find out whether SASTs work, their project pages like https://lgtm.com/projects/g/systemd/systemd/history/ or https://scan.coverity.com/projects/systemd should somehow be taken into account.
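
Even the most naive version of that (just checking that those pages exist at all) needs per-project knowledge, and it still wouldn't say whether the analysis is recent or successful. A sketch, with the project slugs hard-coded as an assumption:

    // Sketch: naively probe externally hosted scanner dashboards for a project.
    package main

    import (
        "fmt"
        "net/http"
    )

    func main() {
        // The slugs below are assumptions; they would have to be configured or discovered.
        pages := []string{
            "https://lgtm.com/projects/g/systemd/systemd",
            "https://scan.coverity.com/projects/systemd",
        }
        for _, url := range pages {
            resp, err := http.Get(url)
            if err != nil {
                fmt.Println(url, "-> unreachable:", err)
                continue
            }
            resp.Body.Close()
            // A 200 only proves the page exists, not that analysis is recent or passing.
            fmt.Println(url, "->", resp.Status)
        }
    }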

laurentsimon commented 2 years ago

Thank you for the suggestion, that's really helpful!

david-a-wheeler commented 2 years ago

Another option would be to detect the presence of many CI/CD pipeline configuration files, and either report "we don't know" or give a little credit if Scorecard isn't parsing them. Parsing all such files is challenging, and determining what they do is even harder, but their complete lack of presence could be considered concerning.
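
A sketch of what that could look like (assuming the check runs against a local checkout; the path list is only illustrative):

    // Sketch: give partial credit for the mere presence of common CI/CD configuration.
    package main

    import (
        "fmt"
        "os"
    )

    func main() {
        configs := []string{
            ".github/workflows",
            ".circleci/config.yml",
            ".travis.yml",
            ".gitlab-ci.yml",
        }
        found := 0
        for _, p := range configs {
            if _, err := os.Stat(p); err == nil {
                fmt.Println("found CI/CD configuration:", p)
                found++
            }
        }
        if found == 0 {
            fmt.Println("no CI/CD configuration detected -- genuinely concerning")
        } else {
            fmt.Println("CI/CD configuration present but not parsed -- inconclusive, not a 0")
        }
    }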

laurentsimon commented 2 years ago

thanks all. So to summarize:

  1. Look for config files for known CI/CD systems, as in https://github.com/ossf/scorecard/issues/1031#issuecomment-921042624
  2. Look for testing services, as in https://github.com/ossf/scorecard/issues/1031#issuecomment-965867782
  3. Look for specific commands in workflows
  4. Look for known SAST tools from the marketplace in the PR check runs (rough sketch below)
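
For point 4, the commit check-runs endpoint of the GitHub REST API exposes what we'd need. A rough sketch (unauthenticated, no pagination, OWNER/REPO/SHA are placeholders, and the mapping from app slugs / check names to known SAST integrations is left out):

    // Sketch: list check runs on a PR's head commit; a real check would match the
    // app slug / check name against known SAST integrations.
    package main

    import (
        "encoding/json"
        "fmt"
        "net/http"
        "os"
    )

    type checkRunsResponse struct {
        CheckRuns []struct {
            Name       string `json:"name"`
            Conclusion string `json:"conclusion"`
            App        struct {
                Slug string `json:"slug"`
            } `json:"app"`
        } `json:"check_runs"`
    }

    func main() {
        url := "https://api.github.com/repos/OWNER/REPO/commits/SHA/check-runs"
        req, _ := http.NewRequest("GET", url, nil)
        req.Header.Set("Accept", "application/vnd.github+json")
        resp, err := http.DefaultClient.Do(req)
        if err != nil {
            fmt.Fprintln(os.Stderr, err)
            os.Exit(1)
        }
        defer resp.Body.Close()
        var out checkRunsResponse
        if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
            fmt.Fprintln(os.Stderr, err)
            os.Exit(1)
        }
        for _, r := range out.CheckRuns {
            fmt.Printf("check %q by app %q concluded %q\n", r.Name, r.App.Slug, r.Conclusion)
        }
    }
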
evverx commented 2 years ago

I'd add that I think projects analyzed more frequently should have a higher score. Currently the SAST check would rate projects analyzed by CodeQL, say, once a week higher than projects where every PR is analyzed before getting merged, because it's much more likely that frequent workflows fail due to transient issues (the availability of SASTs, for example). I don't think that's fair.

Parsing all such files is challenging, and determining what they do is even harder, but their complete lack of presence could be considered concerning

Looks like it would just turn the SAST check into the CI check (where, as far as I understand, the presence of various CI services is checked for the most part).

evverx commented 2 years ago

I kind of complained about weird results scorecard produces in https://github.com/ossf/scorecard/issues/1257#issuecomment-968355084

Anyway, on an unrelated note, somehow systemd and a project maintained by exactly one person pushing commits directly to the master branch with no PRs, protected branches, reviews, tests and fuzz targets whatsoever keep receiving approximately the same "global" score from scorecard (probably due to inconclusive checks).

Apart from bugs I've reported and fixed, I think those results can at least partly be explained by the fact that, if we're talking about SASTs for example, scorecard can't tell the difference between projects taking this stuff seriously, like systemd, where at this point 3 different SASTs are used on a daily basis with graphs showing that the reported issues really get fixed, and projects where one action is run hopefully weekly. I think that, apart from frequency, the number of SASTs should affect the score as well.

laurentsimon commented 2 years ago

I kind of complained about weird results scorecard produces in #1257 (comment)

Anyway, on an unrelated note, somehow systemd and a project maintained by exactly one person pushing commits directly to the master branch with no PRs, protected branches, reviews, tests and fuzz targets whatsoever keep receiving approximately the same "global" score from scorecard (probably due to inconclusive checks).

Apart from bugs I've reported and fixed, I think those results can at least partly be explained by the fact that, if we're talking about SASTs for example, scorecard can't tell the difference between projects taking this stuff seriously, like systemd, where at this point 3 different SASTs are used on a daily basis with graphs showing that the reported issues really get fixed, and projects where one action is run hopefully weekly. I think that, apart from frequency, the number of SASTs should affect the score as well.

You're right. It's a current limitation. We should check the number of non-fixed problems reported by the SAST. We'd also like to do Point 2 of https://github.com/ossf/scorecard/issues/966#issuecomment-915598041, but this is such a huge project...

laurentsimon commented 2 years ago

I'd add that I think projects analyzed more frequently should have a higher score. Currently the SAST check would rate projects analyzed by CodeQL, say, once a week higher than projects where every PR is analyzed before getting merged, because it's much more likely that frequent workflows fail due to transient issues (the availability of SASTs, for example). I don't think that's fair.

I think scorecard should only look at the run on the last commit of the PR, in which case all your PRs would be "pass" and it should not penalize you. If that's not what we do, we should change it. I agree on giving more weight to SAST-on-PR vs SAST-as-cron or on push. We have not had the time to implement this yet. I'll create a PR for both.
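
To show the shape of the weighting idea (the numbers below are completely made up, not a proposal):

    // Sketch: hypothetical multiplier for how the SAST tool is triggered.
    package main

    import "fmt"

    func triggerWeight(onPullRequest, onPushOrCron bool) float64 {
        switch {
        case onPullRequest:
            return 1.0 // every change analyzed before merge: full credit
        case onPushOrCron:
            return 0.7 // analysis happens, but issues can land before they are found
        default:
            return 0.0
        }
    }

    func main() {
        fmt.Println("PR-triggered:   ", 10*triggerWeight(true, false))
        fmt.Println("cron/push only: ", 10*triggerWeight(false, true))
    }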

Parsing all such files is challenging, and determining what they do is even harder, but their complete lack of presence could be considered concerning

Looks like it would just turn the SAST check into the CI check (where, as far as I understand, the presence of various CI services is checked for the most part).

laurentsimon commented 2 years ago

Here is a linter action we should support: https://github.com/github/super-linter. Linters are not exactly SASTs, though. Maybe give a few points for having it.