Closed: c0c0n3 closed this issue 4 years ago.
Indexes are for Mongo, so that's not much of an issue if you don't put a sidecar in front of it but only in front of Orion.
I tried to put a sidecar in front of MongoDB but couldn't get it right. Orion would keep on crashing trying to connect to the DB. Regarding indexes, I tried to use an init container to run a modified version of the Orchestra script in the Orion chart, but that turned out to be a bit of a mission. So yes, glad to know we can do without :-)
The fact that the sidecar is not working must be due to some configuration issue, I suppose. If not, all of Istio has some big issue...
ha :-) yes, definitely. It must be config. I've tried everything I could think of but have had no luck so far.
TL;DR. There's likely to be a bug in Istio (1.4 series) that stops sidecars from forwarding HTTP traffic to target pods when naming K8s service/container ports. Workaround: name your service/container port `http`, then traffic gets forwarded properly.
This is true when using Istio with our config in `deployment`; I didn't have time to test other scenarios in isolation. Gory details below.
We have a plain K8s service for Orion with a port name of `ngsi`:

```yaml
...
kind: Service
...
spec:
  type: NodePort
  ports:
    - name: ngsi
      port: 1026
      protocol: TCP
      targetPort: 1026
...
```
and a simple deployment with a matching container port name:

```yaml
...
kind: Deployment
...
      containers:
        - image: "fiware/orion:2.2.0"
          ports:
            - containerPort: 1026
              name: ngsi
...
```
We also have Istio config to inject a sidecar and have edited the K8s `istio-ingressgateway` service to make port `31026` accessible from outside the cluster and remap it to port `1026` inside the cluster:

```yaml
ports:
  ...
  - name: orion
    nodePort: 31026
    port: 1026
    protocol: TCP
    targetPort: 1026
```
The Orion sidecar container gets prepped by an Istio init container which sets up `iptables` to route incoming and outgoing traffic through the Envoy sidecar. There are no configured rules affecting our port in any specific way, e.g. drop packet if port is `1026`. To see that, log onto the node (Minikube in my case):

```shell
$ minikube ssh
```
Grab the sidecar container ID:

```shell
$ docker ps | grep orion
0fab7c534b44 ... k8s_istio-proxy_orion-5c7...
7f50ef4d7931 ... k8s_orion_orion-5c7...
bd25934a62f1 ... k8s_POD_orion-5c7...
...
```
Then use it to figure out its PID:

```shell
$ docker inspect 0fab7c534b44 --format '{{ .State.Pid }}'
12539
```
and finally make `iptables` spill the beans:

```shell
$ sudo nsenter -t 12539 -n iptables -t nat -S
...
```
(more about sidecar injection on the Istio blog.)
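For what it's worth, the NAT rules the init container installs typically look like the snippet below. This is an abridged sketch based on Istio 1.4 defaults, not verbatim output from my node, so chain names and ports may differ in your setup. The point is that everything TCP gets redirected to Envoy (inbound port 15006, outbound port 15001), with nothing singling out port 1026:

```
-A PREROUTING -p tcp -j ISTIO_INBOUND
-A OUTPUT -p tcp -j ISTIO_OUTPUT
-A ISTIO_INBOUND -p tcp --dport 22 -j RETURN
-A ISTIO_INBOUND -p tcp -j ISTIO_IN_REDIRECT
-A ISTIO_IN_REDIRECT -p tcp -j REDIRECT --to-ports 15006
-A ISTIO_REDIRECT -p tcp -j REDIRECT --to-ports 15001
```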
Another thing to note is that the Envoy proxy gets started with an initial config where you can see a valid definition for our `ngsi` port:
```shell
$ kubectl exec -it -c istio-proxy orion-5c7d68db9b-4tqvz -- bash
$ ps -F f -A
...
... /usr/local/bin/envoy -c /etc/istio/proxy/envoy-rev0.json ...
...
$ grep ngsi /etc/istio/proxy/envoy-rev0.json
"metadata": { ...
"POD_PORTS": [{"name": "ngsi", "containerPort": 1026, "protocol": "TCP"}],
...
```
If you try hitting Orion's `/v2` API entry point

```shell
curl http://$(minikube ip):31026/v2 \
     -H header:$HEADER_VALUE  # set up HEADER_VALUE as explained in README
```
you'll get a `503 Service Unavailable` back. Here's what happens under the bonnet: the request makes it through the gateway to the sidecar, but then the sidecar closes the connection instead of forwarding the request to Orion, and Envoy eventually returns the `503` to `curl`. You can trace all that with `tcpdump`. For example, here's how to do that in Orion's sidecar container:
```shell
$ kubectl exec -it -c istio-proxy orion-5c7d68db9b-4tqvz -- bash
$ sudo tcpdump -i any -s 1024 -A port 1026
```

Istio Envoy containers come with `tcpdump`, but you'll have to install it elsewhere.
Why does the sidecar slam the door closed? Hard to tell---but how rude :-)
One thing is for sure though: this behaviour depends on K8s service/container port names. In fact, renaming our port from `ngsi` to `http` solved the problem, i.e. Envoy sends the request on to Orion. (You can try this with the `httpbin` service too: rename the port to `silly` and all hell breaks loose.)
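Concretely, the workaround boils down to renaming the port in both manifests shown earlier (a sketch, with everything else unchanged):

```yaml
kind: Service
# ...
spec:
  ports:
    - name: http          # was: ngsi
      port: 1026
      protocol: TCP
      targetPort: 1026
---
kind: Deployment
# ...
      containers:
        - image: "fiware/orion:2.2.0"
          ports:
            - containerPort: 1026
              name: http  # was: ngsi
```

Istio's manual protocol selection also accepts prefixed names like `http-ngsi`, which would keep the name descriptive while still telling the sidecar the protocol is HTTP.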
Also, it doesn't look like our adapter can possibly have anything to do with it: message routing from the gateway to the sidecar works fine, and the `iptables` config on the node shouldn't cause any trouble either. So? Well, if I had to point the finger, I'd say there's something wonderfully weird going on in the (Istio-modified version of) Envoy.
I think I've got to the bottom of this:
https://archive.istio.io/v1.4/docs/ops/configuration/traffic-management/protocol-selection/
Even if it's not explicitly stated, I don't think you can actually use a port name other than those listed for manual protocol selection. If you do, as we saw earlier, the sidecar shuts down the connection. I see a few problems with this:
I'd argue that from a design standpoint (1) isn't the most fortunate choice, but I'll let philosophers judge that on its own merits. As for (2), it smells like a bug to me, and (3) is something the Istio guys definitely need to improve, IMHO.
A slow-starting sidecar could stop Orion from completing its start-up procedure successfully.
When injecting a sidecar, Istio also adds an init container to set up `iptables` rules that redirect inbound and outbound traffic to the sidecar. After the init container runs, K8s starts the sidecar and Orion concurrently. As part of its start-up procedure, Orion tries to establish a connection to MongoDB and, since the `iptables` rules are already in place at this stage, the kernel redirects those TCP packets to the sidecar. If the sidecar isn't ready to process messages yet, packets get dropped. When this happens, I've observed the following outcomes:
We should make it clear to mesh admins that to prevent (1) from happening it's probably a good idea for their K8s configuration to delay Orion's start-up, e.g.

```shell
sleep 2 && /usr/bin/contextBroker ...
```
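In the deployment, that delay could be wired in by overriding the container's entry point, along these lines (a sketch; the actual Orion CLI flags are elided as in the command above):

```yaml
containers:
  - image: "fiware/orion:2.2.0"
    command: ["/bin/sh", "-c"]
    args: ["sleep 2 && /usr/bin/contextBroker ..."]
```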
Another option is a readiness probe on Orion's `/v2` endpoint. Also, Orion logs say DB connections get retried 100 times with a 1000 microsecond delay in between, e.g.
```
Database Startup Error
(cannot connect to mongo - doing 100 retries with a 1000 microsecond interval)
```
That would mean only 0.1 seconds between the first and last attempt, which seems to be a bit too little in a mesh environment---e.g. MongoDB could be much slower than that at start-up. Perhaps there's a way to configure Orion so that the time gap between retries is a bit wider? Not sure how much the CLI option `-dbTimeout` could help here...
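As a sanity check on those figures, here's a hypothetical back-of-the-envelope calculation in plain shell arithmetic (the 100 and 1000 come straight from the log message above):

```shell
# Orion retries the Mongo connection 100 times, waiting 1000 microseconds
# between attempts, so the whole retry window is tiny.
retries=100
interval_us=1000
total_us=$(( retries * interval_us ))
echo "${total_us} microseconds"             # prints: 100000 microseconds
echo "$(( total_us / 1000 )) milliseconds"  # prints: 100 milliseconds, i.e. 0.1 s
```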
closed by #36
Adding an Istio sidecar to Orion's pod makes the service unreachable. It also makes it really hard to do any service initialisation through K8s init containers. Sob, sob.
After so much grief trying to figure out why Envoy doesn't play well with Orion, I've decided to park this issue here for now and deal with it some other time to avoid dev paralysis. What's happening is that, with a sidecar, any HTTP call to Orion from another host, be that inside or outside the mesh, fails with a `503`. Requests actually hit the sidecar but then something goes wrong when forwarding to Orion, which never actually gets the request, and ultimately Envoy returns a `503` to the client. Notice that local HTTP calls actually work, i.e. if you log in to the Orion pod and hit port `1026` on localhost, you'll get the expected response from Orion.
One thing to consider is that deploying Orion without a sidecar may actually be an option. In fact: