quay / claircore

foundation modules for scanning container packages and reporting vulnerabilities
https://quay.github.io/claircore/
Apache License 2.0

Should vulnerabilities be de-duped from multiple repositories? #1216

Open paulaldridge opened 7 months ago

paulaldridge commented 7 months ago

We've found that for Red Hat images the vuln report shows every vulnerability in duplicate, because packages are matched against multiple repositories. I'm unsure whether this is intended behaviour and whether showing both repositories is useful, but bloating the vuln report this way seems undesirable.

For example, using a test image for ubi8.8 (i.e. FROM registry.access.redhat.com/ubi8:8.8-1067.1698056881), the index report shows multiple repositories:

  "repository": {
    "3": {
      "id": "3",
      "name": "cpe:/o:redhat:rhel:8.3::baseos",
      "key": "rhel-cpe-repository",
      "cpe": "cpe:2.3:o:redhat:rhel:8.3:*:baseos:*:*:*:*:*"
    },
    "4": {
      "id": "4",
      "name": "cpe:/a:redhat:enterprise_linux:8::appstream",
      "key": "rhel-cpe-repository",
      "cpe": "cpe:2.3:a:redhat:enterprise_linux:8:*:appstream:*:*:*:*:*"
    },
    "5": {
      "id": "5",
      "name": "cpe:/o:redhat:enterprise_linux:8::baseos",
      "key": "rhel-cpe-repository",
      "cpe": "cpe:2.3:o:redhat:enterprise_linux:8:*:baseos:*:*:*:*:*"
    },
    "598": {
      "id": "598",
      "name": "cpe:/a:redhat:rhel:8.3::appstream",
      "key": "rhel-cpe-repository",
      "cpe": "cpe:2.3:a:redhat:rhel:8.3:*:appstream:*:*:*:*:*"
    },
    "6829": {
      "id": "6829",
      "name": "Red Hat Container Catalog",
      "uri": "https://catalog.redhat.com/software/containers/explore"
    }
  },

And the vuln report contains duplicate vulnerabilities whose only difference is the repository, e.g.:

[Screenshot 2024-01-17 at 13 43 46]

Full vuln report: ubi8.8VulnReport.json

For reference, we are using:

  github.com/quay/clair/config v1.3.0
  github.com/quay/clair/v4 v4.7.2
  github.com/quay/claircore v1.5.19

hdonnay commented 7 months ago

This is just something that falls out of the data that Red Hat's build system and vulnerability information provide. There's no way to know which repository a package actually came from, so claircore's RHEL indexing logic is forced to use a cross-product (associating packages with every candidate repository) in some situations. See PROJQUAY-5185 and PROJQUAY-5214 for discussion on why.
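To make the cross-product concrete, here is a minimal sketch of why it multiplies report entries. The types `Package`, `Repository` and `Association` and the function `crossProduct` are hypothetical simplifications invented for illustration, not claircore's real types or indexing code; the package name is made up, while the repository names come from the index report above.

```go
package main

import "fmt"

// Package and Repository are hypothetical, simplified stand-ins for
// illustration only; they are not claircore's real types.
type Package struct {
	Name    string
	Version string
}

type Repository struct {
	Name string
}

// Association records that a package *may* have come from a repository.
type Association struct {
	Pkg  Package
	Repo Repository
}

// crossProduct associates every package with every candidate repository.
// It models the fallback used when the true source repository cannot be
// known, producing len(pkgs) * len(repos) associations.
func crossProduct(pkgs []Package, repos []Repository) []Association {
	out := make([]Association, 0, len(pkgs)*len(repos))
	for _, p := range pkgs {
		for _, r := range repos {
			out = append(out, Association{Pkg: p, Repo: r})
		}
	}
	return out
}

func main() {
	// Hypothetical package; repository names are from the index report above.
	pkgs := []Package{{Name: "example-pkg", Version: "1.0-1.el8"}}
	repos := []Repository{
		{Name: "cpe:/o:redhat:rhel:8.3::baseos"},
		{Name: "cpe:/a:redhat:enterprise_linux:8::appstream"},
		{Name: "cpe:/o:redhat:enterprise_linux:8::baseos"},
		{Name: "cpe:/a:redhat:rhel:8.3::appstream"},
	}
	// In this simplified model, a vulnerability matched against the package
	// is reported once per association, i.e. once per candidate repository.
	for _, a := range crossProduct(pkgs, repos) {
		fmt.Println(a.Pkg.Name, "->", a.Repo.Name)
	}
}
```

With one package and four candidate CPE repositories, the sketch prints four associations, which mirrors the duplication seen in the vuln report.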

paulaldridge commented 7 months ago

Ah I see, thanks for explaining. What do you think about de-duping the output when we know the entries are the same, listing the repositories together under one vulnerability? I'm not sure whether checking that the outputs are identical before combining them would be too heavyweight, though (unless we know they always will be identical in these cases).
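A presentation-layer de-dup along these lines could look roughly like the following sketch. `Finding`, `findingKey`, `Merged` and `dedupe` are hypothetical names over a flattened row type, not claircore's `VulnerabilityReport`. The key deliberately omits the repository, and which fields make two rows "the same vulnerability" is an assumption; choosing that key is exactly the identity question raised in the reply below.

```go
package main

import (
	"fmt"
	"sort"
)

// Finding is a hypothetical, flattened vuln-report row; it is not
// claircore's VulnerabilityReport type.
type Finding struct {
	VulnName     string
	PkgName      string
	PkgVersion   string
	FixedVersion string
	Repo         string
}

// findingKey is every field except the repository. Treating rows that
// agree on all of these as one vulnerability is an assumption.
type findingKey struct {
	VulnName, PkgName, PkgVersion, FixedVersion string
}

// Merged is one de-duplicated vulnerability plus all its repositories.
type Merged struct {
	findingKey
	Repos []string
}

// dedupe groups findings by findingKey, collecting each distinct repository
// once. Output keeps the order in which keys first appear; Repos is sorted.
func dedupe(in []Finding) []Merged {
	index := make(map[findingKey]int)            // key -> position in out
	seen := make(map[findingKey]map[string]bool) // key -> repos already recorded
	var out []Merged
	for _, f := range in {
		k := findingKey{f.VulnName, f.PkgName, f.PkgVersion, f.FixedVersion}
		i, ok := index[k]
		if !ok {
			i = len(out)
			index[k] = i
			seen[k] = make(map[string]bool)
			out = append(out, Merged{findingKey: k})
		}
		if !seen[k][f.Repo] {
			seen[k][f.Repo] = true
			out[i].Repos = append(out[i].Repos, f.Repo)
		}
	}
	for i := range out {
		sort.Strings(out[i].Repos)
	}
	return out
}

func main() {
	// Hypothetical rows: the same vulnerability reported under two repositories.
	rows := []Finding{
		{"EXAMPLE-VULN-1", "example-pkg", "1.0-1.el8", "1.0-2.el8", "cpe:/o:redhat:rhel:8.3::baseos"},
		{"EXAMPLE-VULN-1", "example-pkg", "1.0-1.el8", "1.0-2.el8", "cpe:/o:redhat:enterprise_linux:8::baseos"},
	}
	for _, m := range dedupe(rows) {
		fmt.Println(m.VulnName, m.PkgName, m.Repos)
	}
}
```

The grouping itself is a single linear pass, so the cost is modest; the harder part is trusting the key, since two rows that differ in a field left out of `findingKey` would be silently merged.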

hdonnay commented 7 months ago

Yeah, that would be a way to "clean up" the presentation. I don't think it's feasible to do in claircore with the current architecture, which doesn't have a real reference/identity mechanism for vulnerabilities.