Add support for SARIF output

Dentrax commented 1 year ago

It'd be great to have a SARIF output format to upload the results to GitHub. (i.e. with github/codeql-action/upload-sarif action)

Blocked by #217

marcinguy commented 1 year ago

@Dentrax You can get SARIF output and scanning GH action using betterscan https://github.com/marcinguy/betterscan-ce

It has OSV scanner as one the scanners

Dentrax commented 1 year ago

Hey @marcinguy, since this is slightly out-of-context of what I'm proposing here; but I definitely check that project, seems interesting! Thanks for dropping that.

marcinguy commented 1 year ago

@Dentrax No worries. Was trying to help. Yea, different outputs from OSV scanner would be great. +1

oliverchang commented 1 year ago

Thanks for the suggestion! This is indeed something we've been thinking about. This is something that may be blocked on #150.

Can you also speak a bit about how you intend to consume the SARIF output? Is this meant to be fed to e.g. GitHub's code scanning alerts? Are there other use cases for this?

Dentrax commented 1 year ago

Is this meant to be fed to e.g. GitHub's code scanning alerts?

Actually, yes. This is my only use-case.

jaskaransinghdr6j commented 1 year ago

Hey guys, any update on SARIF support?

pwa-tapptic commented 1 year ago

I'm also interested in this feature. It makes integrations with CI/CD tools a way nicer!

oliverchang commented 1 year ago

This is already being worked on by @another-rex in https://github.com/google/osv-scanner/pull/420.

Ma1tobiose commented 1 year ago

I tested this and it outputs to sarif format great, but probably not quite what I expected. I'm going through osv which will check for say 10 issues, and I'm expecting all 10 to be expanded and displayed (in order to be put into results), but at the moment it looks like it's just writing a result and printing a table in the message. This should not be the desired result for this requirement.

oliverchang commented 1 year ago

I tested this and it outputs to sarif format great, but probably not quite what I expected. I'm going through osv which will check for say 10 issues, and I'm expecting all 10 to be expanded and displayed (in order to be put into results), but at the moment it looks like it's just writing a result and printing a table in the message. This should not be the desired result for this requirement.

Thanks for the feedback @Ma1tobiose. Can you please give a sample output that you'd like to see?

Also curious if there's feedback from any others in this issue. e.g. @pwa-tapptic , @jaskaransinghdr6j @Dentrax

pwa-tapptic commented 1 year ago

I tested it on my side and I'd have same remark as @Ma1tobiose . https://github.com/anchore/grype creates SCA results in SARIF format quite nicely (syft . -o json | grype -q -o sarif). As @Ma1tobiose said, OSV compresses everything into single finding in SARIF, but findings should be rather separate objects in results array:

{
  "version": "2.1.0",
  "$schema": "https://json.schemastore.org/sarif-2.1.0-rtm.5.json",
  "runs": [
    {
      "tool": { ...
      },
      "results": [
        {
          "ruleId": "GHSA-72xf-g2v4-qvf3-tough-cookie",
          "message": {
            "text": "The path /yarn.lock reports tough-cookie at version 4.1.2  which would result in a vulnerable (npm) package installed"
          },
          "locations": [
            {
              "physicalLocation": {
                "artifactLocation": {
                  "uri": "/yarn.lock"
                },
                "region": {
                  "startLine": 1,
                  "startColumn": 1,
                  "endLine": 1,
                  "endColumn": 1
                }
              }
            }
          ]
        },
        {
          "ruleId": "GHSA-776f-qx25-q3cc-xml2js",
          "message": {
            "text": "The path /yarn.lock reports xml2js at version 0.4.23  which would result in a vulnerable (npm) package installed"
          },
          "locations": [
            {
              "physicalLocation": {
                "artifactLocation": {
                  "uri": "/yarn.lock"
                },
                "region": {
                  "startLine": 1,
                  "startColumn": 1,
                  "endLine": 1,
                  "endColumn": 1
                }
              }
            }
          ]
        },
...

just to give you an example.

oliverchang commented 1 year ago

Thanks!

Our motivation behind a single aggregated report, is to better connect this to #352 via a single remediation command included in the result (e.g. "to fix these interactively, run osv-scanner fix ..."), where we provide a tool that can help with interactively (or automatically) addressing multiple vulnerabilities at once in the same lockfile/manifest file.

Could we understand a bit better how having separate results helps your use cases? Does this provide better UX for you to look at vulnerability results individually as opposed to a single one?

Ma1tobiose commented 1 year ago

Thanks! @oliverchang I can understand that osv will store the data structure in an aggregated form in order to be able to fix it in one click. But if the output is sarif or json, my side hope is to access the CICD process, osv as one of the tools in the output standardized report, so that it is easy to do the aggregated display with the results of other tools. So I would need a report in a format similar to the one provided by @pwa-tapptic (results in a separate form).

pwa-tapptic commented 1 year ago

@oliverchang one-click fix is an awesome feature and great motivation behind tool as OSV, it must be seen as something valuable from CLI user perspective. But SARIF (i.e. Static Analysis Results Interchange Format) has slightly different motivations behind (details here), as the name of the format itself suggests. CI/CD tools are one of potential beneficiaries of the standardized interchange format.

oliverchang commented 1 year ago

@another-rex let's investigate if there's a good middle ground here.

jaskaransinghdr6j commented 1 year ago

+1 SARIF should ideally represent each "finding" as it's own line item. Being an interchange format, it would be right to use it as a 1 CVE per item template.

another-rex commented 1 year ago

Here is the current format I am thinking of merging, would love any feedback (fyi: @pwa-tapptic @jaskaransinghdr6j @Ma1tobiose)

{
  "version": "2.1.0",
  "$schema": "https://json.schemastore.org/sarif-2.1.0.json",
  "runs": [
    {
      "tool": {
        "driver": {
          "informationUri": "https://github.com/google/osv-scanner",
          "name": "osv-scanner",
          "rules": [
            {
              "id": "RUSTSEC-2022-0041",
              "shortDescription": {
                "text": "OSV.Summary"
              },
              "fullDescription": {
                "text": "OSV.Details",
                "markdown": "OSV.Details"
              },
              "help": {
                "text": "...",
                "markdown": "Markdown table of occurrences in the repo (i.e. If this vuln affects multiple packages/lockfiles)"
              }
            }
          ]
        }
      },
      "artifacts": [
        {
          "location": {
            "uri": "path/to/Cargo.lock"
          },
          "length": -1
        }
      ],
      "results": [
        {
          "ruleId": "RUSTSEC-2023-0045",
          "ruleIndex": 0,
          "level": "warning",
          "message": {
            "text": "Package 'memoffset@0.6.1' is vulnerable to 'RUSTSEC-2023-0045', please upgrade to versions '0.6.2' to fix this vulnerability"
          },
          "locations": [
            {
              "physicalLocation": {
                "artifactLocation": {
                  "uri": "path/to/Cargo.lock"
                }
              }
            }
          ]
        }
        // ... One result per vulnerable package per vulnerability
      ]
    }
  ]
}

+1 SARIF should ideally represent each "finding" as it's own line item. Being an interchange format, it would be right to use it as a 1 CVE per item template.

This is complicated by the fact that not every OSV entry has a corresponding CVE (e.g. RUSTSEC-2023-0045), and we can have multiple advisories from different sources referring to the exact same vulnerability.

This is common for vulnerabilities published by github advisories, where for some languages there will be security advisories published by the language maintainers as well.

One solution to this we are thinking is to fill "ruleId" with all the aliased vulnerabilities joined together (e.g. id: GHSA-wfg4-322g-9vqv,RUSTSEC-2023-0045). SARIF provides a 3.49.4 deprecatedIds property which is an array of unique IDs, in which we can separate those IDs out into separate array entries.

Tell us if this will work for your workflows!

jaskaransinghdr6j commented 1 year ago

This is great! However, I feel as many findings aggregators will be using these findings to autofill fields like "Finding Title", the joined ruleId might be a bit too onerous.

There seems to be no straightforward way to deal with this according to the SARIF spec. What if we default the ruleId to "CVE-XXXX" if available, and fallback to package specific id "RUSTSEC-YYYY" if the former is not available? (or vice-versa)

The other approach could be to list all the aliases in the message text field as : "Also Reported as RUSTSEC-YYYY and GHSA-ZZZZ"

Ma1tobiose commented 1 year ago

The formatting is great, and I agree with @jaskaransinghdr6j's solution (if it's feasible), being able to easily query the cve and refer sources would make it much quicker to find vulnerability details and fixes.

another-rex commented 12 months ago

There seems to be no straightforward way to deal with this according to the SARIF spec. What if we default the ruleId to "CVE-XXXX" if available, and fallback to package specific id "RUSTSEC-YYYY" if the former is not available? (or vice-versa)

I implemented this in the SARIF PR and it's now been merged, so please try it out in latest main branch of osv-scanner (this change is not in 1.4.0) and see if this fits your use cases!

google / osv-scanner

Add support for SARIF output #216