kubeflow / pipelines

Machine Learning Pipelines for Kubeflow
https://www.kubeflow.org/docs/components/pipelines/
Apache License 2.0

[KF 1.0 Compliance] Vulnerability Scanning #3857

Open Bobgy opened 4 years ago

Bobgy commented 4 years ago

Part of https://github.com/kubeflow/pipelines/issues/2884

Docker images must be scanned for vulnerabilities and known vulnerabilities published

@jlewi Do you know how other images share vulnerability issues?

I did a quick investigation: gcr.io provides vulnerability scanning, but the results are not visible to external visitors, even if the image is public.

We can export the generated YAML report with a command like:

gcloud beta container images describe --show-package-vulnerability gcr.io/ml-pipeline/api-server:1.0.0-test-5

Documented in https://cloud.google.com/container-registry/docs/get-image-vulnerabilities

Do you think that's good enough?

Bobgy commented 4 years ago

@jbottum Do you have any ideas about this?

jlewi commented 4 years ago

kubeflow/kubeflow#3907 is tracking how we publish a list of vulnerabilities in our images.

A related issue is minimizing vulnerabilities e.g. by using distroless images. There is documentation at https://github.com/krishnadurai/community/blob/b1669588d785455a1e4e4cab456e03c08a05af7c/guidelines/creating_dockerfiles.md

Note that the use of distroless images is recommended, not required.

kubeflow/kubeflow#4590 is a related issue about promoting the use of distroless in Kubeflow to minimize vulnerabilities.

To satisfy the vulnerability scanning requirement I think you just need to turn on vulnerability scanning in whatever GCR registry you are hosting your images in.

You might want to repurpose this issue or file a new one for reducing vulnerabilities if relevant.

Bobgy commented 4 years ago

@jlewi As reported in kubeflow/kubeflow#3907, even if we enable GCR vulnerability scanning, the results are not visible to external viewers. So in addition to that, we'd still need to dump a YAML report for each KFP release. Does that sound reasonable?
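
A minimal sketch of what dumping a report per release could look like. Only the gcloud command and its flags come from this thread; the image list, the output directory layout, and the function names are hypothetical.

```python
import subprocess
from pathlib import Path

# Hypothetical image list; only api-server is named in this thread.
IMAGES = ["gcr.io/ml-pipeline/api-server"]


def describe_command(image: str, tag: str) -> list[str]:
    """Build the gcloud invocation quoted earlier in this thread."""
    return [
        "gcloud", "beta", "container", "images", "describe",
        "--show-package-vulnerability",
        f"{image}:{tag}",
    ]


def report_path(out_dir: Path, image: str, tag: str) -> Path:
    """Hypothetical layout: <out_dir>/<tag>/<image name>.yaml."""
    return out_dir / tag / (image.rsplit("/", 1)[-1] + ".yaml")


def dump_reports(tag: str, out_dir: Path) -> None:
    """Run gcloud for each image and save whatever it prints to a file."""
    for image in IMAGES:
        path = report_path(out_dir, image, tag)
        path.parent.mkdir(parents=True, exist_ok=True)
        result = subprocess.run(
            describe_command(image, tag),
            check=True, capture_output=True, text=True,
        )
        path.write_text(result.stdout)
```

Calling `dump_reports("1.0.0-test-5", Path("reports"))` would need gcloud installed and authenticated; the two helper functions can be checked without it.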

Bobgy commented 4 years ago

Thanks for the relevant link on reducing vulnerabilities. I'll create a separate issue about it.

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Bobgy commented 4 years ago

/lifecycle frozen

Bobgy commented 3 years ago

An example of fixing some vulnerability issues: https://github.com/kubeflow/pipelines/issues/4531

some related readings:

My takeaways:

going forward, we should:

Bobgy commented 3 years ago

AIs:

Bobgy commented 3 years ago

Requests to reduce vulnerabilities come more often than before, so I'm taking some time to continue this.

Bobgy commented 3 years ago

Formalize a vulnerability management process

I think the process should come with two parts:

  1. Set up a process to update dependencies/base images more frequently. This is already being addressed in https://github.com/kubeflow/pipelines/issues/4682

  2. Add an automated vulnerability policy check step to our CI/CD pipelines. In the pipeline, we'll unavoidably need to allowlist many CVEs (maybe even high/critical ones), because a fix may not have been released, the CVE may not be exploitable in KFP's use case, or the risk may be tolerable. We should add comments to this allowlist explaining the reasons, and mark some entries as TODOs.

I'll focus on 2. in this issue.
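
A rough sketch of how such an allowlist with recorded reasons could be represented and checked. The data shape (`cve`/`severity` keys), the class and function names, and the example reasons are all hypothetical; the CVE IDs are borrowed from the Trivy sample output quoted later in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AllowlistEntry:
    cve: str
    reason: str          # why the CVE is tolerated
    todo: bool = False   # True if we still intend to fix it


def check_findings(findings, allowlist):
    """Return (violations, stale).

    findings: list of dicts with hypothetical keys "cve" and "severity".
    violations: findings whose CVE is not allowlisted.
    stale: allowlisted CVEs that no longer appear in the findings.
    """
    allowed = {entry.cve for entry in allowlist}
    found = {f["cve"] for f in findings}
    violations = [f for f in findings if f["cve"] not in allowed]
    stale = sorted(allowed - found)
    return violations, stale


# Example entries; the reasons are invented for illustration only.
ALLOWLIST = [
    AllowlistEntry("CVE-2018-14618", "example: waiting for a base image update", todo=True),
    AllowlistEntry("CVE-2018-17456", "example: not exploitable in our use case"),
]
```

Reporting stale entries alongside violations is a design choice: it keeps the allowlist from silently growing once a CVE has been fixed upstream.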

Bobgy commented 3 years ago

Research of tools suitable for this need:

Using them in combination seems to meet our basic needs.

Bobgy commented 3 years ago

There seem to be similar open source tools, like https://github.com/arminc/clair-scanner, but that one requires running your own vulnerability server. It's more convenient to use the GCP container analysis service directly.

Bobgy commented 3 years ago

A bit more research led me to https://github.com/aquasecurity/trivy. It seems to be the leading open source option. There are some extra nice features:

  1. a local CLI for exploration -- it can group CVEs by library type:

    $ trivy image knqyf263/vuln-image:1.2.3
    2019-05-16T12:59:03.150+0900    INFO    Detecting Alpine vulnerabilities...
    2019-05-16T12:59:04.941+0900    INFO    Detecting bundler vulnerabilities...
    2019-05-16T12:59:05.967+0900    INFO    Detecting cargo vulnerabilities...
    2019-05-16T12:59:07.834+0900    INFO    Detecting composer vulnerabilities...
    2019-05-16T12:59:10.285+0900    INFO    Detecting npm vulnerabilities...
    2019-05-16T12:59:11.487+0900    INFO    Detecting pipenv vulnerabilities...
    
    knqyf263/vuln-image:1.2.3 (alpine 3.7.1)
    ========================================
    Total: 26 (UNKNOWN: 0, LOW: 3, MEDIUM: 16, HIGH: 5, CRITICAL: 2)
    
    +---------+------------------+----------+-------------------+---------------+----------------------------------+
    | LIBRARY | VULNERABILITY ID | SEVERITY | INSTALLED VERSION | FIXED VERSION |              TITLE               |
    +---------+------------------+----------+-------------------+---------------+----------------------------------+
    | curl    | CVE-2018-14618   | CRITICAL | 7.61.0-r0         | 7.61.1-r0     | curl: NTLM password overflow     |
    |         |                  |          |                   |               | via integer overflow             |
    +         +------------------+----------+                   +---------------+----------------------------------+
    |         | CVE-2018-16839   | HIGH     |                   | 7.61.1-r1     | curl: Integer overflow leading   |
    |         |                  |          |                   |               | to heap-based buffer overflow in |
    |         |                  |          |                   |               | Curl_sasl_create_plain_message() |
    +         +------------------+          +                   +---------------+----------------------------------+
    |         | CVE-2019-3822    |          |                   | 7.61.1-r2     | curl: NTLMv2 type-3 header       |
    |         |                  |          |                   |               | stack buffer overflow            |
    +         +------------------+          +                   +---------------+----------------------------------+
    |         | CVE-2018-16840   |          |                   | 7.61.1-r1     | curl: Use-after-free when        |
    |         |                  |          |                   |               | closing "easy" handle in         |
    |         |                  |          |                   |               | Curl_close()                     |
    +         +------------------+----------+                   +               +----------------------------------+
    |         | CVE-2018-16842   | MEDIUM   |                   |               | curl: Heap-based buffer          |
    |         |                  |          |                   |               | over-read in the curl tool       |
    |         |                  |          |                   |               | warning formatting               |
    +         +------------------+          +                   +---------------+----------------------------------+
    |         | CVE-2018-16890   |          |                   | 7.61.1-r2     | curl: NTLM type-2 heap           |
    |         |                  |          |                   |               | out-of-bounds buffer read        |
    +         +------------------+          +                   +               +----------------------------------+
    |         | CVE-2019-3823    |          |                   |               | curl: SMTP end-of-response       |
    |         |                  |          |                   |               | out-of-bounds read               |
    +---------+------------------+----------+-------------------+---------------+----------------------------------+
    | git     | CVE-2018-17456   | HIGH     | 2.15.2-r0         | 2.15.3-r0     | git: arbitrary code execution    |
    |         |                  |          |                   |               | via .gitmodules                  |
    +         +------------------+          +                   +               +----------------------------------+
    |         | CVE-2018-19486   |          |                   |               | git: Improper handling of        |
    |         |                  |          |                   |               | PATH allows for commands to be   |
    |         |                  |          |                   |               | executed from...                 |
    +---------+------------------+----------+-------------------+---------------+----------------------------------+
    ...
  2. there are existing GitHub Actions that use trivy: https://github.com/Azure/container-scan

Bobgy commented 3 years ago

For reference, vulnerability vector description: https://nvd.nist.gov/vuln-metrics/cvss/v3-calculator

Bobgy commented 3 years ago

An experimental feature of trivy is the ability to use user-defined Open Policy Agent (OPA) policies as a checker for vulnerabilities. It can be used to filter based on vulnerability vector, examples include:

So it can reduce the number of vulnerabilities we need to check, based on our specific environment requirements.
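
To illustrate what filtering on a vulnerability vector means, here is a small sketch in plain Python (not an OPA/Rego policy, and not trivy's own mechanism). It parses a CVSS v3 vector string and applies a hypothetical rule; the function names and the rule itself are assumptions made for illustration.

```python
def parse_cvss_vector(vector: str) -> dict[str, str]:
    """Split a CVSS v3 vector string into its metric/value pairs."""
    parts = vector.split("/")
    if not parts[0].startswith("CVSS:3"):
        raise ValueError(f"not a CVSS v3 vector: {vector!r}")
    return dict(part.split(":", 1) for part in parts[1:])


def is_relevant(vector: str) -> bool:
    """Hypothetical rule: keep only issues that are reachable over the
    network (AV:N) and need no user interaction (UI:N)."""
    metrics = parse_cvss_vector(vector)
    return metrics.get("AV") == "N" and metrics.get("UI") == "N"
```

For example, `is_relevant("CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H")` is false under this rule, because the attack vector is local.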

References:

Bobgy commented 3 years ago

EDIT: what's described below doesn't work well, because the output of gcloud beta container images describe --show-package-vulnerability gcr.io/ml-pipeline/api-server:1.0.0-test-5 --format=json does not include the vulnerability vector.

Open Policy Agent is in fact a generic tool:

inputs: "JSON" and "Policy"
output: "pass?"

So we could just use it with GCR vulnerability scanning to get the best of both: flexibility, while still using a GCP managed service.

==

Alternatively, we can just write a script that checks the vulnerability JSON against our own policy.
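
A minimal sketch of such a script, following the "JSON + policy in, pass/fail out" shape described above. It checks severity only, since per the EDIT note the gcloud JSON lacks the vulnerability vector. The report shape used here is a simplified, hypothetical one, not the actual gcloud output schema, and the function name is an assumption.

```python
import json

SEVERITY_RANK = {"LOW": 0, "MEDIUM": 1, "HIGH": 2, "CRITICAL": 3}


def passes(report_json: str, max_allowed: str) -> bool:
    """Return True if no finding is more severe than max_allowed.

    Assumes a simplified, hypothetical report shape:
    [{"cve": "...", "severity": "HIGH"}, ...]
    """
    limit = SEVERITY_RANK[max_allowed]
    findings = json.loads(report_json)
    # Unrecognized severities rank above CRITICAL, so they always fail.
    return all(SEVERITY_RANK.get(f["severity"], 4) <= limit for f in findings)
```

Treating unrecognized severities as failures is a deliberately conservative choice; a real script would first have to map the actual report fields onto this shape.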

Bobgy commented 3 years ago

Analysis of Options

Trivy

Kritis

Other options look clearly worse than these two, so I'm leaving them out.

Note that OPA seems to have a learning curve, because there's a new language to learn, so I'd prefer we stay away from it initially. Without OPA, Trivy's major advantage does not apply to us.

I think we can start with Kritis. If it proves to work as is, we can defer further customization until we really need it. If we discover blocking bugs, we can revisit Trivy as a backup plan.

shawnzhu commented 3 years ago

I'm interested in this issue. Speaking of trivy, it supports filtering vulnerabilities through a number of options besides OPA:

  1. --severity - https://github.com/aquasecurity/trivy#filter-the-vulnerabilities-by-severities
  2. .trivyignore (ignore specific vulnerabilities) - https://github.com/aquasecurity/trivy#ignore-the-specified-vulnerabilities
  3. --skip-files - https://github.com/aquasecurity/trivy#skip-traversal-of-the-specific-files
  4. --skip-dirs - https://github.com/aquasecurity/trivy#skip-traversal-in-the-specific-directory
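
As a sketch of how option 2 could carry the "reason" comments discussed earlier in this thread, the snippet below renders a .trivyignore file from (CVE ID, reason) pairs. It assumes the file format is one vulnerability ID per line with `#` comment lines, which should be verified against the trivy documentation; the function name and the example reasons are hypothetical.

```python
def render_trivyignore(entries):
    """Render .trivyignore content from a list of (cve_id, reason) pairs.

    Each reason is assumed to be a single line of text; it is written
    as a comment directly above the ID it explains.
    """
    lines = []
    for cve_id, reason in entries:
        lines.append(f"# {reason}")
        lines.append(cve_id)
        lines.append("")  # blank line between entries
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_trivyignore([
        ("CVE-2018-14618", "example: fix not yet released for our base image"),
        ("CVE-2018-17456", "example: not exploitable in our use case"),
    ]))
```

Generating the file from a single annotated source keeps the ignore list and its justifications from drifting apart.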

The lack of activity on Kritis might be a problem, but I'm willing to give it a try since I haven't used it before.

Bobgy commented 3 years ago

@shawnzhu You are right.

I didn't make it clear that my main reason for preferring Kritis is that it uses GCP container scanning as its data source (in fact, it reads GCP container scanning results directly, so you cannot use it outside GCP).

Bobgy commented 3 years ago

Some notes after experimenting with Kritis:

Bobgy commented 3 years ago

I built a KFP pipeline that runs Kritis: https://github.com/kubeflow/pipelines/pull/5066. For now, this is a one-off pipeline I use to verify existing released images.

P1: The next step would be to maintain a long-running KFP test cluster and run that pipeline as one of the postsubmit tests.

davidspek commented 3 years ago

There seem to be similar open source tools, like https://github.com/arminc/clair-scanner, but that one requires running your own vulnerability server. It's more convenient to use the GCP container analysis service directly.

@Bobgy I think this is a better link: https://github.com/quay/clair. Clair is what Amazon ECR uses: https://docs.aws.amazon.com/AmazonECR/latest/userguide/image-scanning.html.