Open ronamosa opened 4 years ago
@ronamosa, thanks for putting this together! We are currently struggling with exactly the same behaviour (i.e. got an mTLS origination sample to work but seeing other sidecars trying to find the certificates which are only relevant for istio-egressgateway).
@ronamosa Could you please post the final YAML files of your working setup here? I'm stuck on the exact same issue. I've tried to follow along with your modifications, but I still cannot get it to work.
Hey @cedricroijakkers, I documented the whole experience here: https://iamronamo.io/documentation/2020-04-08-Istio-MTLS-with-External-Endpoint/ I can't find where I saved the YAMLs from this PoC, but I'll see if I can dig them out. Hope that helps.
@ronamosa thanks for your report, you just saved my job :)
I tried following your steps, but I still keep getting 503 errors. I've now installed an stunnel instance to do the SSL offloading, but it would still be great to finally have this working in Istio. Does anybody have a full working YAML configuration set that I can try?
What is the official fix from Istio for this? There must be a lot of people wanting to get outbound mTLS from Istio to external sites working.
Also waiting for some official statement on this. Meanwhile, has anyone tested the adapted configuration in an IPv6 environment? Last time I tried it was working on IPv4 but not working on IPv6. I need to recreate the exact issue for logs and to really make sure before I can submit a separate ticket though.
I recently set up mTLS egress with Istio and worked through a number of issues before settling on the pattern described here: https://istio.io/latest/docs/tasks/traffic-management/egress/egress-tls-origination/#mutual-tls-origination-for-egress-traffic
This config does not use an egress gateway and requires the new v1.14 DestinationRule.spec.workloadSelector, but the config is far simpler than using an egress gateway and allows us to be more selective about the mTLS client cert being used (we have different pods with different client certs connecting to the same external service).
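For anyone comparing approaches, a minimal sketch of that sidecar-only pattern might look like the following. The host name, the `app: sleep` label, and the `client-credential` secret name follow the sample in the linked Istio task and are assumptions here, not the values from my own setup.

```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: originate-mtls-for-nginx
spec:
  # Only sidecars of pods carrying this label apply the rule,
  # so different workloads can present different client certs.
  workloadSelector:
    matchLabels:
      app: sleep                      # assumed label (from the Istio sample)
  host: my-nginx.mesh-external.svc.cluster.local   # assumed external host
  trafficPolicy:
    portLevelSettings:
    - port:
        number: 80                    # assumed port, as in the Istio sample
      tls:
        mode: MUTUAL
        # Name of a Kubernetes secret holding the client cert, key and CA cert.
        credentialName: client-credential          # assumed secret name
        sni: my-nginx.mesh-external.svc.cluster.local
```

As far as I understand the task, the secret has to live in the same namespace as the selected workload, and the workload's service account needs permission to read it.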
I have had a lot of trouble getting the example at https://istio.io/docs/tasks/traffic-management/egress/egress-gateway-tls-origination/ to work.
I modified it for a real external mTLS server (a VM running nginx with the same cert setup as the documentation), and I verified that it works when you use curl with the correct certificates.
I may be misunderstanding some things in the config, but have not found much help in discuss.istio.io or the slack channel.
When I follow the example (substituting for my own values) I get different errors in different areas and I'll point them out below.
Here is my setup as per documentation:
Installation
installed version
How I installed Istio:
Certs & Patch Egressgateway
I create the client & ca-cert secrets in the istio-system namespace:

I patch the istio-egressgateway deployment to add the secrets/certs volumes and mounts and can see them when I check the pod:

example nginx-client-certs
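The exact patch is not preserved here, so the following is only a sketch of what such a mount could look like, written as a strategic merge patch for the istio-egressgateway deployment. The container name `istio-proxy` and the client cert mount path are assumptions based on the Istio task; the secret names and the CA cert path match the ones used elsewhere in this issue.

```yaml
# Hypothetical patch file for the istio-egressgateway deployment in istio-system.
spec:
  template:
    spec:
      containers:
      - name: istio-proxy            # assumed container name
        volumeMounts:
        - name: nginx-client-certs
          mountPath: /etc/istio/nginx-client-certs   # assumed path (from the Istio task)
          readOnly: true
        - name: nginx-ca-certs
          mountPath: /etc/istio/nginx-ca-certs
          readOnly: true
      volumes:
      - name: nginx-client-certs
        secret:
          secretName: nginx-client-certs
          optional: true             # pod still starts if the secret is missing
      - name: nginx-ca-certs
        secret:
          secretName: nginx-ca-certs
          optional: true
```

A patch like this would typically be applied with `kubectl patch` against the deployment in the istio-system namespace.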
TLS Origination Configs
If I start following the example from "Perform mutual TLS origination with an egress gateway", I end up with the following configuration:
creates 4 x objects
Errors
invalid path nginx certs
Checking the istio-proxy container for a sleep pod inside my mesh-internal namespace:

invalid path /etc/certs/root-cert.pem
Checking the istio-egressgateway pod, I can see the following errors as well:

Questions
Why is the sidecar trying to find certs that are only mounted on the istio-egressgateway pod?
Why is the /etc/certs/root-cert.pem path documented but I can't seem to find it in a pod or container anywhere? I think I've seen /var/run../root-cert.pem somewhere... is this because SDS enabled means this documentation needs to be updated?

Workaround / Fixes (?)
After a lot of reading through Istio GitHub issues and the discuss.istio.io forum, I pieced together the following changes that eventually led to a successful TLS client-verified session with my external mTLS server.
/etc/root-cert.pem fix
I changed the port protocol from HTTPS to TLS, and the error goes away. I'm assuming it's because there's a tls section there and cert lookups get treated differently?

Invalid path: /etc/istio/nginx-ca-certs/ca-chain.cert.pem fix
For this one, I came across an open issue where someone advised that the sidecar of the pod calling the mTLS backend server needs to have the certs mounted on it, which sort of defeats the purpose of this "egressgateway will handle verifying calls to the backend using istio" example, right?
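A sidecar mount of that kind is usually expressed with Istio's `sidecar.istio.io/userVolume` and `sidecar.istio.io/userVolumeMount` pod annotations. The fragment below is a sketch only: the mount paths are assumed to mirror the egress gateway's, and the secrets are assumed to exist in the same namespace as the calling pod.

```yaml
# Pod template fragment for the calling workload's Deployment (sketch).
spec:
  template:
    metadata:
      annotations:
        # Volumes added to the pod by the sidecar injector (JSON list).
        sidecar.istio.io/userVolume: '[{"name":"nginx-client-certs","secret":{"secretName":"nginx-client-certs"}},{"name":"nginx-ca-certs","secret":{"secretName":"nginx-ca-certs"}}]'
        # Where those volumes are mounted inside the istio-proxy container (assumed paths).
        sidecar.istio.io/userVolumeMount: '[{"name":"nginx-client-certs","mountPath":"/etc/istio/nginx-client-certs","readOnly":true},{"name":"nginx-ca-certs","mountPath":"/etc/istio/nginx-ca-certs","readOnly":true}]'
```

Because the annotations are read at injection time, the pods have to be recreated before the sidecar sees the mounts.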
Anyway, I did the following:
created the nginx-client-certs and nginx-ca-certs secrets inside my namespace mesh-internal (where my sleep pod is deployed)
added the following annotations (sidecar.istio.io/userVolumeMount and sidecar.istio.io/userVolume) to my sleep pod's deployment manifest:

Now my sleep pod doesn't complain about the nginx certs anymore. I see other pods, like prometheus and an httpbin pod in my mesh-internal namespace, complaining about not finding the certs, but I understand (currently) it's because I haven't "sidecar mounted" these certs directly to them.

Add ServiceEntry and VirtualService
I added a ServiceEntry and VirtualService combination (it wasn't clear in the example that I needed to have one, and the previous section of the documentation deletes the ServiceEntry, so the following section seems to go ahead without one and doesn't specify creating a new one?)

Changed Gateway to HTTP
Changed this from tls..
to HTTP
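The original manifests are not preserved here, so as a rough sketch only, an egress Gateway listening on plain HTTP port 80 could look like this. The host `nginx.example.com` is a hypothetical placeholder for the external mTLS server, and the Gateway name follows the Istio task.

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: istio-egressgateway
spec:
  selector:
    istio: egressgateway       # default label on the egress gateway pods
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP           # plain HTTP inside the mesh; TLS is originated at the gateway
    hosts:
    - nginx.example.com        # hypothetical external host
```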
Changed DestinationRule
from
to
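The before and after manifests are not preserved here, so the following is not the exact change. It is only a sketch of the shape of the DestinationRule from the Istio task that makes the egress gateway originate mTLS, using file-mounted certs. The host `nginx.example.com` is a hypothetical placeholder, and the client cert file names are assumptions taken from the task; the CA path is the one from the error above.

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: originate-mtls-for-nginx
spec:
  host: nginx.example.com                  # hypothetical external host
  trafficPolicy:
    portLevelSettings:
    - port:
        number: 443
      tls:
        mode: MUTUAL                       # present a client cert to the external server
        clientCertificate: /etc/istio/nginx-client-certs/tls.crt   # assumed file name
        privateKey: /etc/istio/nginx-client-certs/tls.key          # assumed file name
        caCertificates: /etc/istio/nginx-ca-certs/ca-chain.cert.pem
        sni: nginx.example.com
```

Any proxy that receives this rule resolves these paths in its own filesystem, so they have to exist in the container where the rule is applied.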
Changed VirtualService to port 80
Now that my Gateway is on port 80, I update the following route from istio-egressgateway.istio-system.svc.cluster.local:443 to istio-egressgateway.istio-system.svc.cluster.local:80
And then it all works.
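Since the original YAML is not preserved here, this is a sketch of how the ServiceEntry and the port 80 routing described above could fit together. The host `nginx.example.com` and the resource names are hypothetical, and the VirtualService is assumed to live in the same namespace as the Gateway it references.

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: nginx-external             # hypothetical name
spec:
  hosts:
  - nginx.example.com              # hypothetical external host
  ports:
  - number: 80
    name: http
    protocol: HTTP
  - number: 443
    name: https
    protocol: HTTPS
  resolution: DNS
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: direct-nginx-through-egress-gateway
spec:
  hosts:
  - nginx.example.com
  gateways:
  - mesh                           # sidecars inside the mesh
  - istio-egressgateway            # assumed Gateway name, same namespace
  http:
  # Hop 1: sidecar sends plain HTTP to the egress gateway on port 80.
  - match:
    - gateways:
      - mesh
      port: 80
    route:
    - destination:
        host: istio-egressgateway.istio-system.svc.cluster.local
        port:
          number: 80
  # Hop 2: egress gateway forwards to the external server on 443.
  - match:
    - gateways:
      - istio-egressgateway
      port: 80
    route:
    - destination:
        host: nginx.example.com
        port:
          number: 443
```

With routing like this, the workload calls the external host over plain HTTP on port 80 and the TLS handshake happens on the second hop.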
Working Output
So now when I curl from the sleep pod inside the mesh-internal namespace, I get the expected output:

From the sleep pod's istio-proxy container I can see it hitting my port 80 outbound endpoint:

and from the istio-egressgateway pod I can see it going outbound on 443:

Conclusion
Sorry this is really long, but I don't understand how the original/current documentation was meant to work. My workaround achieves the objective, but functionally it's limited to specific deployments that have the right annotations.
Any help understanding where I might've gone wrong would be greatly appreciated.