project-copacetic / copacetic

🧵 CLI tool for directly patching container images!
https://project-copacetic.github.io/copacetic/
Apache License 2.0
960 stars 63 forks source link

[REQ] support for sarif as input #32

Open sozercan opened 1 year ago

sozercan commented 1 year ago

What kind of request is this?

New feature

What is your request or suggestion?

Today, copa supports trivy json as the input. We should also support sarif as input for generic results.

Scanners like trivy and grype supports exporting as sarif. https://aquasecurity.github.io/trivy/v0.37/docs/vulnerability/examples/report/#sarif

reetasingh commented 1 year ago

Hi @salaxander @sozercan is this being actively worked on ? or can I work on it?

sozercan commented 1 year ago

@reetasingh sure. let's start with a design doc on how this would work especially if it can work across multiple scanners. feel free to use google docs or whatever you can publicly share and we can review the design before any implementation

reetasingh commented 1 year ago

Hi @sozercan sure, will get started with design doc

reetasingh commented 1 year ago

Hi @sozercan sorry for late update on this. I have been actively investigating this issue. I noticed that the sarif output for trivy does not have information about OS Family, OS Version and OS Architecture https://gist.github.com/reetasingh/85f890a47571dd3a908124c292c5e001

Also verified in the code that information is not being passed https://github.com/aquasecurity/trivy/blob/main/pkg/report/sarif.go

If I understand it correctly, this information would be needed to decide the package manager and hence its a blocker to implement this feature. Let me know if my understanding is correct

reetasingh commented 1 year ago

/assign

sozercan commented 1 year ago

@reetasingh thanks for the investigation! I think we would need that information and package names/versions so this might be a blocker for adapting sarif as input in the short term.

reetasingh commented 1 year ago

Thanks @sozercan , here is the high level design doc https://docs.google.com/document/d/1ZtuJDxDY_DrWTtQNE8ME0-WkkcQUh9x-IM9LSxoBnsw/edit#heading=h.lgtokumb4lep I have mentioned absence of information about OS Family, OS Version and OS Architecture as a blocker

Let me know your opinions on this doc

sozercan commented 1 year ago

thanks @reetasingh for the investigation and the doc! 🙏

Copying the blocker from your doc here:

The biggest blocker is that the Sarif output report generated by Trivy tool does not have information about OS Family, OS Version and OS Architecture https://gist.github.com/reetasingh/85f890a47571dd3a908124c292c5e001
Also verified in the code that information is not being passed https://github.com/aquasecurity/trivy/blob/main/pkg/report/sarif.go

This information is a must to have for later processing in the copacetic tool which needs this information to decide the package manager to use to patch the image. 

We can create an issue on the [Trivy github repo](https://github.com/aquasecurity/trivy) and if required implement PR to add this info. 
This can be a little tricky because the Sarif Report format does not have a dedicated field for storing info about OS Family, OS Version and OS Architecture  so this needs to be shoved in one of the help text fields.
craigbox commented 1 year ago

What about the SARIF output from Grype? Could that be made to work today?

sozercan commented 1 year ago

@craigbox same applies to grype too unfortunately. example: https://snips.sh/f/2C3_JBXLeC?r=1

ritazh commented 1 year ago

@toddysm FYI

michaelcfanning commented 1 year ago

Hello! I'm co-editor of the SARIF standard and someone brought this thread to my attention. I just need a bit more context, if someone can provide it. If I think about OS details, these could be relevant to many things covered in SARIF including: 1) the machine environment for a tool invocation, 2) a distributed agent environment where a scan occcurred, 3) metadata for a literal scan target, e.g., a VM that's getting analyzed, 4), metadata for rules, e.g., a set of OS version that are relevant for a specific check).

My background is static analysis at the binary/code level, so your additional context will be helpful. I'll go do some due diligence around trivy and your project to help get oriented. I wanted to reply here immediate so you know I'm trying to jump in.

SARIF has some existing properties, some of which could be relevant/useful. If there are gaps, SARIF has clear mechanisms for packaging arbitrary supporting data, as property bags attached to other data. If you are an aggregating system, you could help define conventions for carrying/storing this missing data across multiple tools that produce conceptually similarly things. Finally, the SARIF 2.2 design effort just kicked off; moving forward we can request non-breaking changes to the format, if helpful, that address your needs. Here's where those requests would get filed: https://github.com/oasis-tcs/sarif-spec/

craigbox commented 1 year ago

@anubhav06 is working on Kubescape for the LFX Mentorship program, and I've asked him to look at integrating Kubescape as a source of vulnerability data for copa.

Kubescape uses Grype as its engine, but we can mark the data up as required to make this possible. I've recommended that he attempt to use SARIF for the interchange format. We'll open other issues as required to discuss the integration but I wanted to flag this here.

ritazh commented 1 year ago

@camaleon2016 FYI

anubhav06 commented 1 year ago

Hey @sozercan Let's say that we are able to add the missing OS information somewhere in the SARIF output for the different scanners (say, in the help text field). So according to the design document which @reetasingh has proposed, copa currently has two report parsers (trivy and qualys) and we can expand these individual parsers to support multiple input formats (SARIF, in this case).

However, won't this mean that for every new scanner apart from what copa currently supports (like Grype, Kubescape, etc.), we will have to write a new parser file for it? In #59 you say that:

The intention is for others to contribute without each scanner being part of the core copa.

But, here in the case of SARIF support as well, all the other scanners will still be part of the core copa, no?

Just trying to understand this clearly. Please correct me if I'm wrong.

anubhav06 commented 1 year ago

I tried working on modifying grype's SARIF output format to accommodate the missing OS Name, Version and architecture, in the help text field. This is how it looks now: https://gist.github.com/anubhav06/6fc44f31dc026717cfce1836fc022294 What do you all think?

sozercan commented 1 year ago

@anubhav06 in the ideal world, copa would just parse SARIF files for vuln results without the need for logic for each individual scanners.

However, as you and @reetasingh pointed out, SARIF is missing the os information. In addition to this, each scanner's SARIF outputs are different in terms of parsing package information needed for copa (package name, current version, fixed version). This means that even if we align on SARIF, copa would have to parse each scanner’s output seperately even though they are all SARIF.

here are 3 examples of CVE-2022-32221 of nginx:1.22.0 image scan results in SARIF:

Grype: https://snips.sh/f/mUtsOWhdbz

{
          "ruleId": "CVE-2022-32221-libcurl4",
          "message": {
            "text": "The path /usr/share/doc/libcurl4/copyright reports libcurl4 at version 7.74.0-1.3+deb11u3  which is a vulnerable (deb) package installed in the container"
          },
          "locations": [
            {
              "physicalLocation": {
                "artifactLocation": {
                  "uri": "image//usr/share/doc/libcurl4/copyright"
                },
                "region": {
                  "startLine": 1,
                  "startColumn": 1,
                  "endLine": 1,
                  "endColumn": 1
                }
              },
              "logicalLocations": [
                {
                  "name": "/usr/share/doc/libcurl4/copyright",
                  "fullyQualifiedName": "nginx:1.22.0@sha256:b9361c275c5d9b3d5a8c6cff59733193fec1be8cb57374d880e8dd8fd503ed8b:/usr/share/doc/libcurl4/copyright"
                },
                {
                  "name": "/var/lib/dpkg/info/libcurl4:amd64.md5sums",
                  "fullyQualifiedName": "nginx:1.22.0@sha256:b9361c275c5d9b3d5a8c6cff59733193fec1be8cb57374d880e8dd8fd503ed8b:/var/lib/dpkg/info/libcurl4:amd64.md5sums"
                },
                {
                  "name": "/var/lib/dpkg/status",
                  "fullyQualifiedName": "nginx:1.22.0@sha256:b9361c275c5d9b3d5a8c6cff59733193fec1be8cb57374d880e8dd8fd503ed8b:/var/lib/dpkg/status"
                }
              ]
            }
          ]
        },

Snyk: https://snips.sh/f/nvdkbzGTz6

Snyk doesn’t have CVE-2022-32221 in results for some reason, so this is from rules

{
              "id": "SNYK-DEBIAN11-CURL-3065656",
              "shortDescription": {
                "text": "Critical severity - Exposure of Resource to Wrong Sphere vulnerability in curl"
              },
              "fullDescription": {
                "text": "(CVE-2022-32221) curl/libcurl4@7.74.0-1.3+deb11u3"
              },
              "help": {
                "text": "",
                "markdown": "## NVD Description\n**_Note:_** _Versions mentioned in the description apply only to the upstream `curl` package and not the `curl` package as distributed by `Debian:11`._\n_See `How to fix?` for `Debian:11` relevant fixed versions and status._\n\nWhen doing HTTP(S) transfers, libcurl might erroneously use the read callback (`CURLOPT_READFUNCTION`) to ask for data to send, even when the `CURLOPT_POSTFIELDS` option has been set, if the same handle previously was used to issue a `PUT` request which used that callback. This flaw may surprise the application and cause it to misbehave and either send off the wrong data or use memory after free or similar in the subsequent `POST` request. The problem exists in the logic for a reused handle when it is changed from a PUT to a POST.\n## Remediation\nUpgrade `Debian:11` `curl` to version 7.74.0-1.3+deb11u5 or higher.\n## References\n- [ADVISORY](https://security-tracker.debian.org/tracker/CVE-2022-32221)\n- [support@hackerone.com](https://hackerone.com/reports/1704017)\n- [support@hackerone.com](https://security.gentoo.org/glsa/202212-01)\n- [support@hackerone.com](https://security.netapp.com/advisory/ntap-20230110-0006/)\n- [support@hackerone.com](https://support.apple.com/kb/HT213604)\n- [support@hackerone.com](https://support.apple.com/kb/HT213605)\n- [support@hackerone.com](http://seclists.org/fulldisclosure/2023/Jan/20)\n- [support@hackerone.com](http://seclists.org/fulldisclosure/2023/Jan/19)\n- [support@hackerone.com](https://www.debian.org/security/2023/dsa-5330)\n- [support@hackerone.com](https://lists.debian.org/debian-lts-announce/2023/01/msg00028.html)\n- [support@hackerone.com](https://security.netapp.com/advisory/ntap-20230208-0002/)\n- [support@hackerone.com](http://www.openwall.com/lists/oss-security/2023/05/17/4)\n"
              },
              "defaultConfiguration": {
                "level": "error"
              },
              "properties": {
                "tags": [
                  "security",
                  "CWE-668",
                  "deb"
                ]
              }
            },

Trivy: https://snips.sh/f/fNq01xYjhm

{
          "ruleId": "CVE-2022-32221",
          "ruleIndex": 5,
          "level": "error",
          "message": {
            "text": "Package: libcurl4\nInstalled Version: 7.74.0-1.3+deb11u3\nVulnerability CVE-2022-32221\nSeverity: CRITICAL\nFixed Version: 7.74.0-1.3+deb11u5\nLink: [CVE-2022-32221](https://avd.aquasec.com/nvd/cve-2022-32221)"
          },
          "locations": [
            {
              "physicalLocation": {
                "artifactLocation": {
                  "uri": "library/nginx",
                  "uriBaseId": "ROOTPATH"
                },
                "region": {
                  "startLine": 1,
                  "startColumn": 1,
                  "endLine": 1,
                  "endColumn": 1
                }
              },
              "message": {
                "text": "library/nginx: libcurl4@7.74.0-1.3+deb11u3"
              }
            }
          ]
        },

Because of these limitations of SARIF, we need individual scanners to provide the results in an accurate way so copa can parse. However, we would like it so that anyone can integrate their own scanner without them being part of core copa. This is where modular scanners (#59) comes in so each scanner can be out of tree. @anubhav06 let me know if this answers your questions.

sozercan commented 1 year ago

Just for comparison, this is what Trivy's own JSON output provides today: https://snips.sh/f/BSl2OG1udf

...
  "Metadata": {
    "OS": {
      "Family": "debian",
    },
...
  "Results": [
    {
          "PkgName": "curl",
          "InstalledVersion": "7.74.0-1.3+deb11u3",
          "FixedVersion": "7.74.0-1.3+deb11u5",
          ...
anubhav06 commented 1 year ago

Yes, this answered my question. So basically since each scanner produces a different SARIF output, and we don't want each new scanner to be part of core copa , we'll be going with the modular scanner approach. There, the different scanners can be made to be compatible with copa, without actually being part of core copa.

Thanks for the clarification :)