Open Bobgy opened 4 years ago
@jbottum Do you have any ideas about this?
kubeflow/kubeflow#3907 is tracking how we publish a list of vulnerabilities in our images.
A related issue is minimizing vulnerabilities e.g. by using distroless images. There is documentation at https://github.com/krishnadurai/community/blob/b1669588d785455a1e4e4cab456e03c08a05af7c/guidelines/creating_dockerfiles.md
Note that the use of distroless images is recommended, not required.
kubeflow/kubeflow#4590 is a related issue about promoting the use of distroless in Kubeflow to minimize vulnerabilities.
To satisfy the vulnerability scanning requirement, I think you just need to turn on vulnerability scanning in whatever GCR registry you are hosting your images in.
You might want to repurpose this issue or file a new one for reducing vulnerabilities if relevant.
@jlewi As reported in kubeflow/kubeflow#3907, if we enable GCR vulnerability scanning, the results are not visible to external viewers. So in addition to that, we'd still need to dump a YAML report for each KFP release. Does that sound reasonable?
Thanks for the link about reducing vulnerabilities. I'll create a separate issue about it.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
/lifecycle frozen
An example of fixing some vulnerability issues: https://github.com/kubeflow/pipelines/issues/4531
some related readings:
My takeaways:
going forward, we should:
AIs:
Requests to reduce vulnerabilities come more often than before, so I'm taking some time to continue this.
I think the process should come with two parts:
1. Set up a process to update dependencies/base images more frequently. This is already being addressed in https://github.com/kubeflow/pipelines/issues/4682
2. Add an automated vulnerability policy check step in our CI/CD pipelines. In the pipeline, we'll unavoidably need to allowlist many CVEs (maybe even of high/critical level), because a fix may not have been released, or the CVE may not be exploitable in the KFP use case, or the risk may be tolerable. We should add comments to this allowlist explaining the reasons, and mark some of the entries as TODOs.
I'll focus on 2. in this issue.
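A minimal sketch of what an allowlist with documented reasons and TODO markers could look like, in plain Python. The `AllowlistEntry` type, the helper names, and the placeholder CVE IDs and reasons are all hypothetical and only illustrate the idea; they are not taken from any existing KFP tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AllowlistEntry:
    cve: str            # vulnerability ID (placeholder IDs are used below)
    reason: str         # why we tolerate this CVE for now
    todo: bool = False  # True if we intend to revisit and remove the entry


# Hypothetical entries: the IDs and reasons are placeholders, not real assessments.
ALLOWLIST = [
    AllowlistEntry("CVE-0000-0001", "no fix released upstream yet", todo=True),
    AllowlistEntry("CVE-0000-0002", "not exploitable in the KFP use case"),
]


def unexpected_cves(found, allowlist):
    """Return the CVE IDs in `found` that the allowlist does not cover."""
    allowed = {entry.cve for entry in allowlist}
    return sorted(set(found) - allowed)


def pending_todos(allowlist):
    """Return the allowlisted CVE IDs that are marked for follow-up."""
    return [entry.cve for entry in allowlist if entry.todo]
```

Keeping the reason next to each entry means a reviewer can see why a CVE is tolerated without digging through history, and `pending_todos` gives a cheap way to list what still needs follow-up.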
Research of tools suitable for this need:
apiVersion: kritis.grafeas.io/v1beta1
kind: VulnzSigningPolicy
metadata:
  name: my-vsp
spec:
  imageVulnerabilityRequirements:
    maximumFixableSeverity: MEDIUM
    maximumUnfixableSeverity: MEDIUM
    allowlistCVEs:
      - projects/goog-vulnz/notes/CVE-2020-10543
      - projects/goog-vulnz/notes/CVE-2020-10878
      - projects/goog-vulnz/notes/CVE-2020-14155
Using them together seems to meet our basic needs.
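To make the semantics of the policy above concrete, here is a rough Python sketch of how I read it: allowlisted CVEs are skipped, and any other CVE is a violation when its severity exceeds the maximum for its category (fixable or unfixable). This is my own simplification, not Kritis code; the `Vulnerability` and `Policy` types are hypothetical, and any special threshold values Kritis may support are not modeled.

```python
from dataclasses import dataclass, field

# Severity levels ordered from least to most severe.
SEVERITY_ORDER = ["LOW", "MEDIUM", "HIGH", "CRITICAL"]


@dataclass
class Vulnerability:
    note_name: str  # e.g. "projects/goog-vulnz/notes/CVE-2020-10543"
    severity: str   # one of SEVERITY_ORDER
    fixable: bool   # True if a fixed package version is available


@dataclass
class Policy:
    maximum_fixable_severity: str
    maximum_unfixable_severity: str
    allowlist_cves: set = field(default_factory=set)


def exceeds(severity, maximum):
    """Return True if `severity` is strictly above `maximum`."""
    return SEVERITY_ORDER.index(severity) > SEVERITY_ORDER.index(maximum)


def violations(policy, vulns):
    """Return the vulnerabilities that violate the policy."""
    result = []
    for vuln in vulns:
        if vuln.note_name in policy.allowlist_cves:
            continue  # explicitly tolerated
        if vuln.fixable:
            limit = policy.maximum_fixable_severity
        else:
            limit = policy.maximum_unfixable_severity
        if exceeds(vuln.severity, limit):
            result.append(vuln)
    return result
```

With both maximums set to MEDIUM, a non-allowlisted HIGH CVE is reported while a MEDIUM one passes.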
There seem to be similar open source tools such as https://github.com/arminc/clair-scanner, but it requires running your own vulnerability server. It's more convenient to use the GCP container analysis service directly.
A bit more research led me to https://github.com/aquasecurity/trivy. It seems to be the leading open source option. There are some extra nice features:
a local CLI for exploration -- it can group CVEs by library type:
$ trivy image knqyf263/vuln-image:1.2.3
2019-05-16T12:59:03.150+0900 INFO Detecting Alpine vulnerabilities...
2019-05-16T12:59:04.941+0900 INFO Detecting bundler vulnerabilities...
2019-05-16T12:59:05.967+0900 INFO Detecting cargo vulnerabilities...
2019-05-16T12:59:07.834+0900 INFO Detecting composer vulnerabilities...
2019-05-16T12:59:10.285+0900 INFO Detecting npm vulnerabilities...
2019-05-16T12:59:11.487+0900 INFO Detecting pipenv vulnerabilities...
knqyf263/vuln-image:1.2.3 (alpine 3.7.1)
========================================
Total: 26 (UNKNOWN: 0, LOW: 3, MEDIUM: 16, HIGH: 5, CRITICAL: 2)
+---------+------------------+----------+-------------------+---------------+----------------------------------+
| LIBRARY | VULNERABILITY ID | SEVERITY | INSTALLED VERSION | FIXED VERSION |              TITLE               |
+---------+------------------+----------+-------------------+---------------+----------------------------------+
| curl    | CVE-2018-14618   | CRITICAL | 7.61.0-r0         | 7.61.1-r0     | curl: NTLM password overflow     |
|         |                  |          |                   |               | via integer overflow             |
+         +------------------+----------+                   +---------------+----------------------------------+
|         | CVE-2018-16839   | HIGH     |                   | 7.61.1-r1     | curl: Integer overflow leading   |
|         |                  |          |                   |               | to heap-based buffer overflow in |
|         |                  |          |                   |               | Curl_sasl_create_plain_message() |
+         +------------------+          +                   +---------------+----------------------------------+
|         | CVE-2019-3822    |          |                   | 7.61.1-r2     | curl: NTLMv2 type-3 header       |
|         |                  |          |                   |               | stack buffer overflow            |
+         +------------------+          +                   +---------------+----------------------------------+
|         | CVE-2018-16840   |          |                   | 7.61.1-r1     | curl: Use-after-free when        |
|         |                  |          |                   |               | closing "easy" handle in         |
|         |                  |          |                   |               | Curl_close()                     |
+         +------------------+----------+                   +               +----------------------------------+
|         | CVE-2018-16842   | MEDIUM   |                   |               | curl: Heap-based buffer          |
|         |                  |          |                   |               | over-read in the curl tool       |
|         |                  |          |                   |               | warning formatting               |
+         +------------------+          +                   +---------------+----------------------------------+
|         | CVE-2018-16890   |          |                   | 7.61.1-r2     | curl: NTLM type-2 heap           |
|         |                  |          |                   |               | out-of-bounds buffer read        |
+         +------------------+          +                   +               +----------------------------------+
|         | CVE-2019-3823    |          |                   |               | curl: SMTP end-of-response       |
|         |                  |          |                   |               | out-of-bounds read               |
+---------+------------------+----------+-------------------+---------------+----------------------------------+
| git     | CVE-2018-17456   | HIGH     | 2.15.2-r0         | 2.15.3-r0     | git: arbitrary code execution    |
|         |                  |          |                   |               | via .gitmodules                  |
+         +------------------+          +                   +               +----------------------------------+
|         | CVE-2018-19486   |          |                   |               | git: Improper handling of        |
|         |                  |          |                   |               | PATH allows for commands to be   |
|         |                  |          |                   |               | executed from...                 |
+---------+------------------+----------+-------------------+---------------+----------------------------------+
...
For reference, vulnerability vector description: https://nvd.nist.gov/vuln-metrics/cvss/v3-calculator
An experimental feature of Trivy is using user-defined Open Policy Agent (OPA) policies as the checker for vulnerabilities. It can be used to filter based on the vulnerability vector, examples include:
So it can reduce the number of vulnerabilities we need to check based on our specific environment requirements.
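To illustrate what filtering on the vulnerability vector means, here is a small sketch in plain Python rather than in OPA's policy language. It parses a CVSS v3 vector string and applies one hypothetical rule: ignore vulnerabilities whose attack vector (AV) is Local or Physical. The function names and the rule itself are assumptions for illustration; a real policy would depend on our environment.

```python
def parse_cvss_vector(vector):
    """Parse 'CVSS:3.1/AV:N/AC:L/...' into a dict such as {'AV': 'N', 'AC': 'L'}."""
    metrics = {}
    for part in vector.split("/"):
        key, _, value = part.partition(":")
        if key != "CVSS":  # skip the version prefix
            metrics[key] = value
    return metrics


def is_relevant(vector, ignored_attack_vectors=("L", "P")):
    """Hypothetical rule: drop CVEs that need local (L) or physical (P) access."""
    metrics = parse_cvss_vector(vector)
    return metrics.get("AV") not in ignored_attack_vectors


# A network-exploitable vulnerability stays on the list to check.
network_vuln = "CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H"
# A vulnerability requiring local access is filtered out by this rule.
local_vuln = "CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H"
```

A vector with no AV metric is treated as relevant here, so missing data errs on the side of checking.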
References:
EDIT: what's described below doesn't work well, because the output of
gcloud beta container images describe --show-package-vulnerability gcr.io/ml-pipeline/api-server:1.0.0-test-5 --format=json
does not include the vulnerability vector.
Open Policy Agent is in fact a generic tool:
inputs: "JSON" and "Policy"; output: "pass?"
So we could just use it with GCR vulnerability scanning to get the best of both: flexibility and a GCP managed service.
==
Alternatively, we can just write a script that checks the vulnerability JSON against our own policy.
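A minimal sketch of the "JSON plus policy in, pass or fail out" idea as a plain script. The report shape used here (`{"vulnerabilities": [{"id": ..., "severity": ...}]}`) is a hypothetical simplification, not the real output format of any scanner, and `check` and `no_critical` are made-up names.

```python
import json


def check(document, rules):
    """Apply every rule to the document and return (passed, messages)."""
    messages = []
    for rule in rules:
        messages.extend(rule(document))
    return (not messages, messages)


def no_critical(document):
    """Example rule: report every vulnerability with CRITICAL severity."""
    # Assumes the hypothetical shape {"vulnerabilities": [{"id": ..., "severity": ...}]}.
    return [
        f"{vuln['id']} has severity CRITICAL"
        for vuln in document.get("vulnerabilities", [])
        if vuln.get("severity") == "CRITICAL"
    ]


# Placeholder report; in practice this would be loaded from the scanner's JSON output.
report = json.loads(
    '{"vulnerabilities": ['
    '{"id": "CVE-0000-0001", "severity": "CRITICAL"}, '
    '{"id": "CVE-0000-0002", "severity": "LOW"}]}'
)
passed, messages = check(report, [no_critical])
```

Each rule is just a function from the JSON document to a list of violation messages, so adding a policy means adding a function, with no new language to learn.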
Trivy
Kritis
Other options look obviously worse than the two, so I'm leaving them out.
Note that OPA seems to have a learning curve because there's a new language to learn, so I'd prefer we stay away from it initially. If we don't use OPA, Trivy's major advantage does not apply to us.
I think we can start with Kritis. If it proves to work as is, we can delay further customization until we really need it. If we discover blocking bugs, we can revisit Trivy as a backup plan.
I'm interested in this issue. Speaking of Trivy, it supports filtering vulnerabilities by a number of options besides OPA:
--severity - https://github.com/aquasecurity/trivy#filter-the-vulnerabilities-by-severities
.trivyignore (ignore specific vulnerabilities) - https://github.com/aquasecurity/trivy#ignore-the-specified-vulnerabilities
--skip-files - https://github.com/aquasecurity/trivy#skip-traversal-of-the-specific-files
--skip-dirs - https://github.com/aquasecurity/trivy#skip-traversal-in-the-specific-directory
The lack of activity on Kritis might be a problem, but I'm willing to give it a try since I haven't used it before.
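As a rough sketch of how these options could be wired into a CI step, the helpers below render a `.trivyignore` body (one vulnerability ID per line, with the reason as a `#` comment) and build a `trivy image` command line. The helper names and placeholder values are hypothetical, and I'm assuming the `--exit-code`, `--severity`, and `--skip-dirs` flags behave as documented, so they should be checked against the Trivy version in use.

```python
def trivyignore_text(entries):
    """Render a .trivyignore body from (cve_id, reason) pairs."""
    lines = []
    for cve_id, reason in entries:
        lines.append(f"# {reason}")  # keep the reason next to the ignored ID
        lines.append(cve_id)
    return "\n".join(lines) + "\n"


def trivy_command(image, severities=("HIGH", "CRITICAL"), skip_dirs=()):
    """Build an argv list that fails (exit code 1) when matching CVEs are found."""
    cmd = ["trivy", "image", "--exit-code", "1", "--severity", ",".join(severities)]
    for directory in skip_dirs:
        cmd += ["--skip-dirs", directory]
    cmd.append(image)
    return cmd
```

The resulting list can be passed to `subprocess.run`, and the rendered text written to `.trivyignore` in the working directory before the scan.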
@shawnzhu You are right.
I didn't make it clear that my main reason for preferring Kritis is that it uses GCP container scanning as its data source (in fact, it reads GCP container scanning results directly, so you cannot use it outside GCP).
Some notes after experimenting with Kritis:
E0201 01:43:02.099893 1 main.go:211] found fixable CVE <redacted> in gcr.io/<redacted>, which has severity HIGH exceeding max fixable severity MEDIUM
I built a KFP pipeline that runs Kritis: https://github.com/kubeflow/pipelines/pull/5066. For now, this is a one-off pipeline I use to verify existing released images.
P1: The next step would be maintaining a long-running KFP test cluster and running that pipeline as one of the postsubmit tests.
There seem to be similar open source tools such as https://github.com/arminc/clair-scanner, but it requires running your own vulnerability server. It's more convenient to use the GCP container analysis service directly.
@Bobgy I think this is a better link: https://github.com/quay/clair. Clair is what Amazon ECR uses: https://docs.aws.amazon.com/AmazonECR/latest/userguide/image-scanning.html.
Part of https://github.com/kubeflow/pipelines/issues/2884
@jlewi Do you know how other images share vulnerability issues?
I did a quick investigation: gcr.io provides vulnerability scanning, but the results are not visible to external visitors even if the image is public.
We can export the generated yaml report with commands like
Documented in https://cloud.google.com/container-registry/docs/get-image-vulnerabilities
Do you think that's good enough?
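A sketch of how the export could be scripted per release, assuming the `gcloud beta container images describe --show-package-vulnerability` invocation quoted elsewhere in this thread, with `--format=yaml` instead of JSON. The helper names are hypothetical, and `export_report` needs an installed and authenticated gcloud CLI to actually run.

```python
import subprocess


def describe_command(image, fmt="yaml"):
    """Build the gcloud argv that prints an image's vulnerability report."""
    return [
        "gcloud", "beta", "container", "images", "describe",
        "--show-package-vulnerability", image, f"--format={fmt}",
    ]


def export_report(image, path):
    """Run gcloud and write the report to `path` (requires gcloud and credentials)."""
    result = subprocess.run(
        describe_command(image), check=True, capture_output=True, text=True
    )
    with open(path, "w") as report_file:
        report_file.write(result.stdout)
```

Running `export_report` once per released image would produce the files to attach to a release.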