Closed jkassis closed 2 years ago
@jkassis thanks for the report. There have been some changes to the iptables logic in the last couple of weeks, so would you mind seeing if you can reproduce this with the latest edge release (https://github.com/linkerd/linkerd2/releases/tag/edge-20.7.5)?
As long as you're not using CNI, this error shouldn't occur.
Also, can you tell me where you're running openshift? Locally or in the cloud?
openshift in aws. ok i will try edge.
Have you tried using CNI yet? AFAICT OpenShift 4.1+ uses nftables, which would fail when combined with proxy-init. You'll also want to:
oc adm policy add-scc-to-group anyuid system:serviceaccounts:linkerd
oc adm policy add-scc-to-group privileged system:serviceaccounts:<application-ns>
oc adm policy add-scc-to-group anyuid system:serviceaccounts:<application-ns>
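To check the nftables point above on a given node, one option is to look at what `iptables --version` reports. This is a minimal sketch, assuming iptables 1.8 or later, which prints the backend in parentheses; `iptables_backend` is a hypothetical helper name, not part of any tool mentioned in this thread.

```shell
#!/usr/bin/env bash
# Classify an `iptables --version` string by backend.
# Assumption: iptables >= 1.8 appends "(nf_tables)" or "(legacy)";
# older builds print no backend, which is reported as "unknown".
iptables_backend() {
  case "$1" in
    *"(nf_tables)"*) echo "nf_tables" ;;
    *"(legacy)"*)    echo "legacy" ;;
    *)               echo "unknown" ;;
  esac
}

# Usage on a node or in a debug container:
#   iptables_backend "$(iptables --version)"
```

The helper only inspects a string, so it can be tried against version output copied from any host.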
i tried edge with the policy group scc additions you recommended...
` lastState:
terminated:
exitCode: 1
reason: Error
message: >+
mp-port-unreachable
-A OUTPUT -d 169.254.169.254/32 -p udp -m udp ! --dport 53 -j REJECT
--reject-with icmp-port-unreachable
COMMIT
# Completed on Thu Aug 13 03:00:41 2020
configuration
------------------------------------------------------------
Will ignore port [4190 4191] on chain PROXY_INIT_REDIRECT
Will redirect all INPUT ports to proxy
Ignoring uid 2102
Will ignore port [443] on chain PROXY_INIT_OUTPUT
Redirecting all OUTPUT to 4140
adding rules
------------------------------------------------------------
:; iptables -t nat -N PROXY_INIT_REDIRECT -m comment --comment
proxy-init/redirect-common-chain/1597287641
iptables: Chain already exists.
Aborting firewall configuration
Error: exit status 1
Usage:
proxy-init [flags]
Flags:
-h, --help help for proxy-init
--inbound-ports-to-ignore strings Inbound ports and/or port ranges (inclusive) to ignore and not redirect to proxy. This has higher precedence than any other parameters.
-p, --incoming-proxy-port int Port to redirect incoming traffic (default -1)
--netns string Optional network namespace in which to run the iptables commands
--outbound-ports-to-ignore strings Outbound ports and/or port ranges (inclusive) to ignore and not redirect to proxy. This has higher precedence than any other parameters.
-o, --outgoing-proxy-port int Port to redirect outgoing traffic (default -1)
-r, --ports-to-redirect ints Port to redirect to proxy, if no port is specified then ALL ports are redirected
-u, --proxy-uid int User ID that the proxy is running under. Any traffic coming from this user will be ignored to avoid infinite redirection loops. (default -1)
--simulate Don't execute any command, just print what would be executed
--timeout-close-wait-secs int Sets nf_conntrack_tcp_timeout_close_wait
-w, --use-wait-flag Appends the "-w" flag to the iptables commands
startedAt: '2020-08-13T03:00:41Z'
finishedAt: '2020-08-13T03:00:41Z'
containerID: >-
cri-o://6bc4aa7e2ad6419849bb3915a6c4f1729b0235117abe00a332ca88f3dee55df3
ready: false
restartCount: 5
image: 'gcr.io/linkerd-io/proxy-init:v1.3.4'
imageID: >-
gcr.io/linkerd-io/proxy-init@sha256:5e9ce6c12258bd398f7961961ffeb6dcc725e192a37c2d2a07e919b9a7ce3101
containerID: 'cri-o://ff223f79b1601efa8ac81bf0ee2aa7b6eaf82ea1c94c98ef84c6c708a7e305bf'
`
169.254.169.254 is the link-local IP of the amazon instance metadata api (https://stackoverflow.com/questions/42314029/whats-special-about-169-254-169-254-ip-address-for-aws), and, not surprisingly, it's the reason i'm looking at linkerd in the first place.
the openshift sdn-cni-plugin hardcodes a rule (https://github.com/openshift/origin/blob/release-4.1/cmd/sdn-cni-plugin/openshift-sdn_linux.go#L129) to block this address, which is causing all kinds of horror for getting my app installed. it breaks kube2iam (https://github.com/jtblin/kube2iam), kiam (https://github.com/uswitch/kiam), and now... apparently linkerd, all of which make firewall rules to redirect traffic for this service.
i'm beginning to think the sdn-cni-plugin needs to be replaced with something else to fix all of these. i want access to the AWS API with no B.S. from Openshift and I want to use linkerd for my service mesh.
what do i do?
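One reading of the log above: the line just before "Aborting firewall configuration" is `iptables: Chain already exists.`, so the step that failed looks like `iptables -t nat -N PROXY_INIT_REDIRECT` rather than the REJECT rule printed earlier in the dump. With `restartCount: 5`, one possibility (an assumption, not confirmed anywhere in this thread) is that an earlier attempt left the chain behind. A minimal sketch for checking that from `iptables-save` output; `chain_declared` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Report whether a chain is already declared in `iptables-save` output
# read from stdin. iptables-save declares each chain on a line of the
# form ":NAME POLICY [packets:bytes]", e.g. ":PROXY_INIT_REDIRECT - [0:0]".
chain_declared() {
  local chain="$1"
  grep -q "^:${chain} "
}

# Usage inside the pod's network namespace (needs privileges):
#   iptables-save -t nat | chain_declared PROXY_INIT_REDIRECT && echo "leftover chain"
```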
well. looks like i can migrate off of openshift-sdn... https://docs.openshift.com/container-platform/4.5/networking/ovn_kubernetes_network_provider/migrate-from-openshift-sdn.html
i ran the migration to OVN (https://docs.okd.io/latest/networking/ovn_kubernetes_network_provider/migrate-from-openshift-sdn.html) and then the linkerd install again and ran into the same mp-port-unreachable error.
so... frankly... i'm confused why this would come up since i'm not running the sdn-cni-plugin theoretically.
here is the network status for the pod...
here's the PR where they nerfed the AWS API... https://github.com/openshift/origin/pull/22826
and here's a discussion of the AWS / GOOGLE choice to use a link-local address... https://stackoverflow.com/questions/42314029/whats-special-about-169-254-169-254-ip-address-for-aws
I'm also wondering how a link-local address resolves to an amazon API. how does the AWS CLI on my Mac make a request through a link-local address to get to the AWS API in the cloud?
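The rule shown in the earlier log (`-A OUTPUT -d 169.254.169.254/32 -p udp -m udp ! --dport 53 -j REJECT`) matches on destination address. To see whether rules like it are still present after switching network providers, a rough sketch that filters `iptables-save` output; `metadata_rules` is a hypothetical helper name, and the `/32` form is assumed to match how the rule is printed in the log above.

```shell
#!/usr/bin/env bash
# Print every rule from `iptables-save` output (read from stdin) whose
# destination is the link-local metadata address.
metadata_rules() {
  # -F: fixed string; "--" stops grep treating the pattern as an option.
  grep -F -- '-d 169.254.169.254/32'
}

# Usage inside the pod's network namespace (needs privileges):
#   iptables-save | metadata_rules
```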
i destroyed my Openshift cluster and recreated it with Calico networking. getting this now...
initContainerStatuses:
- name: linkerd-init
state:
waiting:
reason: CrashLoopBackOff
message: >-
back-off 10s restarting failed container=linkerd-init
pod=linkerd-controller-588d778444-7dfc5_linkerd(946ae8fa-6b67-4794-b044-56079cdff6a6)
lastState:
terminated:
exitCode: 1
reason: Error
message: >+
o 4140
2020/08/13 21:11:31 Executing commands:
2020/08/13 21:11:31 > iptables -t nat -N PROXY_INIT_REDIRECT -m
comment --comment proxy-init/redirect-common-chain/1597353091
2020/08/13 21:11:31 <
2020/08/13 21:11:31 > iptables -t nat -A PROXY_INIT_REDIRECT -p tcp
--match multiport --dports 4190,4191 -j RETURN -m comment --comment
proxy-init/ignore-port-4190,4191/1597353091
2020/08/13 21:11:31 <
2020/08/13 21:11:31 > iptables -t nat -A PROXY_INIT_REDIRECT -p tcp
-j REDIRECT --to-port 4143 -m comment --comment
proxy-init/redirect-all-incoming-to-proxy-port/1597353091
2020/08/13 21:11:31 < iptables: No chain/target/match by that name.
2020/08/13 21:11:31 Aborting firewall configuration
Error: exit status 1
Usage:
proxy-init [flags]
Flags:
-h, --help help for proxy-init
--inbound-ports-to-ignore strings Inbound ports and/or port ranges (inclusive) to ignore and not redirect to proxy. This has higher precedence than any other parameters.
-p, --incoming-proxy-port int Port to redirect incoming traffic (default -1)
--netns string Optional network namespace in which to run the iptables commands
--outbound-ports-to-ignore strings Outbound ports and/or port ranges (inclusive) to ignore and not redirect to proxy. This has higher precedence than any other parameters.
-o, --outgoing-proxy-port int Port to redirect outgoing traffic (default -1)
-r, --ports-to-redirect ints Port to redirect to proxy, if no port is specified then ALL ports are redirected
-u, --proxy-uid int User ID that the proxy is running under. Any traffic coming from this user will be ignored to avoid infinite redirection loops. (default -1)
--simulate Don't execute any command, just print what would be executed
--timeout-close-wait-secs int Sets nf_conntrack_tcp_timeout_close_wait
-w, --use-wait-flag Appends the "-w" flag to the iptables commands
startedAt: '2020-08-13T21:11:31Z'
finishedAt: '2020-08-13T21:11:31Z'
containerID: >-
cri-o://e9d662183ed980a73f6404dfbf503eb43731f7bb712a074597af2c27493e182e
ready: false
restartCount: 1
image: 'gcr.io/linkerd-io/proxy-init:v1.3.3'
imageID: >-
gcr.io/linkerd-io/proxy-init@sha256:e9d5d020b84c80f964449d62ea509a45b9448655d3aecd7371e54d0acd42665a
containerID: 'cri-o://e9d662183ed980a73f6404dfbf503eb43731f7bb712a074597af2c27493e182e'
it's not complaining about the aws api address anymore, so maybe i've dodged that bullet with calico.
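In this second log the failing command is the `-j REDIRECT` rule, and `No chain/target/match by that name` is the generic message iptables prints when a chain, target, or match extension cannot be found. One thing that may be worth ruling out (an assumption, not something confirmed in this thread) is that the kernel module behind the REDIRECT target, commonly `xt_REDIRECT`, is not loaded on the node. A minimal sketch that checks `lsmod` output; `module_loaded` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Check `lsmod` output (read from stdin) for a kernel module by name.
# lsmod prints the module name in the first column.
module_loaded() {
  local module="$1"
  awk -v m="$module" '$1 == m { found = 1 } END { exit !found }'
}

# Usage on the node:
#   lsmod | module_loaded xt_REDIRECT || echo "xt_REDIRECT not loaded"
```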
@jkassis thanks for the updates. It's taking me a bit longer to get OpenShift set up, but I haven't forgotten about this.
Has there been any work done on this? We are attempting to install on OpenShift version 4.5.9 UPI on vSphere and are running into the same issues. Thanks!
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.
@MattPOlson whatever you do, i recommend using calico as the network layer.
@MattPOlson @cpretzer did you ever get this to work on OCP? @jkassis did you get it to work on any other SDN than calico on OCP? Did you ever find out the underlying cause?
@MattPOlson @jkassis @davidkarlsen
I had a chance to spend some time with OpenShift and get Linkerd running on it. You can find the gist here.
Couple of notes about this:
Please give it a try and let us know how it goes.
I'd also love your feedback about a different security approach. According to the docs, there is the notion of a system user, which sounds appropriate for the Linkerd control plane components. I haven't found any docs on how to go about creating one of those users and assigning it to the Linkerd components. If you all have any thoughts or know how to do that, your pointers would be helpful.
@cpretzer does this allow the sidecars to run unprivileged (i.e. which mode do they run in)? Do the webhooks need to be disabled for it to run on OCP (I see you turned them off), or was that just preference?
@davidkarlsen this was OCP deployed to AWS (I doubt the provider matters, though).
I didn't make any changes to the proxy privileges, so they will have the default privileges on the pod that they're injected into. Here is the template for the proxy securityContext.
The webhook labels are necessary on the Linkerd control plane to prevent those pods from being injected, and those labels/annotations are taken directly from the default Linkerd YAML files.
@davidkarlsen one more thought on the privileges for this deployment is that the OpenShift SDN uses CNI, so the Linkerd CNI Plugin is appropriate for use here. Using the CNI Plugin delegates the responsibility of configuring iptables to the DaemonSet that is deployed by the CNI plugin. So, the Linkerd init container is no longer necessary on each of the meshed pods.
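For reference, the CNI plugin route described above might look roughly like the following. This is a sketch based on the `linkerd install-cni` command and the `--linkerd-cni-enabled` install flag as they exist in the 2.8-era CLI; check `linkerd install-cni --help` for your version. The `install_with_cni` wrapper and the `RUN` variable are hypothetical, and any OpenShift-specific settings are not covered here.

```shell
#!/usr/bin/env bash
# Print (default) or run the two install steps for CNI plugin mode.
# Set RUN=1 to execute; otherwise the commands are only echoed.
install_with_cni() {
  local steps=(
    "linkerd install-cni | kubectl apply -f -"
    "linkerd install --linkerd-cni-enabled | kubectl apply -f -"
  )
  local step
  for step in "${steps[@]}"; do
    if [ "${RUN:-0}" = "1" ]; then
      bash -c "$step" || return 1
    else
      echo "$step"
    fi
  done
}

# Usage: install_with_cni          # dry run, prints both commands
#        RUN=1 install_with_cni    # applies them to the current cluster
```

Defaulting to a dry run keeps the sketch safe to paste into a shell before deciding whether to apply anything.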
I hope this helps, and please let us know if you end up trying this out.
Bug Report
What is the issue?
linkerd-controller pod won't start
How can it be reproduced?
Install OpenShift 4.5.
Install linkerd as follows...
Logs, error output, etc
linkerd check output
Environment
[I] jkassis@Jeremys-MBP ~/c/c/live> linkerd version 08.07 12:03
Client version: stable-2.8.1
Server version: unavailable
Possible solution
Additional context