istio / istio

Connect, secure, control, and observe services.
https://istio.io
Apache License 2.0

Is it possible to run grpc-agent and Envoy side-car in the same pod? #40318

Closed technicianted closed 1 year ago

technicianted commented 2 years ago

Describe the feature request

When using the grpc-agent template to migrate to proxyless gRPC, we lose the Envoy sidecar. This means that if the pod also uses standard HTTP (incoming or outgoing), that traffic loses its mesh capabilities.

Are they mutually exclusive?

Describe alternatives you've considered

Nothing yet, but perhaps it is conceivable to create a custom template that runs both? I am not sure whether there is a technical blocker there.

Affected product area (please put an X in all that apply)

[ ] Docs
[ ] Installation
[X] Networking
[ ] Performance and Scalability
[ ] Extensions and Telemetry
[X] Security
[ ] Test and Release
[X] User Experience
[ ] Developer Infrastructure

Affected features (please put an X in all that apply)

[ ] Multi Cluster
[ ] Virtual Machine
[ ] Multi Control Plane

Additional context

technicianted commented 2 years ago

Found ongoing work in #40136

technicianted commented 2 years ago

It looks like there is a serious issue with the current grpc-agent: TLS certificates are not renewed. See #38923.

technicianted commented 2 years ago

For now I worked around this by creating a custom injection template that merges the grpc-agent and sidecar templates. I worked around #38923 by setting OUTPUT_CERTS on istio-sidecar in addition to grpc-agent, while having the latter dump its certs into a temporary volume.

Unfortunately now I have to run 2 side-car containers per pod until these issues are resolved.

anuragagarwal561994 commented 2 years ago

@costinm, could Istio implement a name resolver, enable mixed mode by default, and keep only one template if possible?

I believe the same bootstrap file can be configured for both the proxyless and the proxy solution. Based on the name resolver (for example, whether the target uses xds://), it can be decided whether a request is routed via the proxyless path or via the proxy.

GCP's managed Traffic Director implements the same thing.
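
The scheme-based decision described above can be sketched in a few lines of Go. This is only an illustration of the idea, not how Istio or grpc-go implement it: the `dataPath` type, the `pathFor` helper, and the example service name are hypothetical, and the check simply looks for an `xds://` prefix on the dial target.

```go
package main

import (
	"fmt"
	"strings"
)

// dataPath names the path a request would take in a mixed-mode pod.
// Hypothetical type, used only for this illustration.
type dataPath string

const (
	proxyless dataPath = "proxyless"
	sidecar   dataPath = "sidecar"
)

// pathFor returns proxyless when the dial target uses the xds scheme,
// and sidecar for every other target.
func pathFor(target string) dataPath {
	if strings.HasPrefix(target, "xds://") {
		return proxyless
	}
	return sidecar
}

func main() {
	// The service name and port below are made up for the example.
	targets := []string{
		"xds:///echo.default.svc.cluster.local:7070",
		"echo.default.svc.cluster.local:7070",
	}
	for _, t := range targets {
		fmt.Printf("%s -> %s\n", t, pathFor(t))
	}
}
```

In a real client the choice is made implicitly by which resolver handles the target; the helper only makes that rule explicit.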

giantcroc commented 1 year ago

@technicianted Hi, I'm very interested in running 2 sidecar containers (Envoy + proxyless gRPC) per pod, but I ran into a problem with traffic redirection. Could you tell me how you changed istio-iptables? Thanks!

technicianted commented 1 year ago

If you mean traffic redirection of xds-based requests, you can simply add the necessary annotations to exclude them by either port or IP address.
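
As a rough sketch of what such annotations could look like, the Go helper below builds a pod annotation map from a list of ports and CIDR ranges. The two annotation keys are the standard Istio traffic exclusion annotations; the `exclusionAnnotations` helper and the port numbers are hypothetical and not taken from this thread.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

const (
	excludeOutboundPorts    = "traffic.sidecar.istio.io/excludeOutboundPorts"
	excludeOutboundIPRanges = "traffic.sidecar.istio.io/excludeOutboundIPRanges"
)

// exclusionAnnotations returns pod annotations that keep the given
// destination ports and CIDR ranges out of Envoy's outbound interception.
// Empty inputs produce no annotation for that key.
func exclusionAnnotations(ports []int, cidrs []string) map[string]string {
	out := map[string]string{}
	if len(ports) > 0 {
		sorted := append([]int(nil), ports...) // copy so the caller's slice is untouched
		sort.Ints(sorted)
		parts := make([]string, len(sorted))
		for i, p := range sorted {
			parts[i] = strconv.Itoa(p)
		}
		out[excludeOutboundPorts] = strings.Join(parts, ",")
	}
	if len(cidrs) > 0 {
		out[excludeOutboundIPRanges] = strings.Join(cidrs, ",")
	}
	return out
}

func main() {
	// 7070 and 7071 are made-up gRPC service ports.
	ann := exclusionAnnotations([]int{7071, 7070}, nil)
	fmt.Println(ann[excludeOutboundPorts]) // prints "7070,7071"
}
```

The resulting key/value pairs would go under the pod template's `metadata.annotations`.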

anuragagarwal561994 commented 1 year ago

@technicianted can you give an example of how to run the two setups together?

Our use case is that some services use gRPC and some use HTTP.

If we use the Envoy proxy for both gRPC and HTTP requests, the pods take extra CPU. In our case the proxy's CPU usage is almost half of what the application uses, which is quite expensive.

We are therefore migrating our services to gRPC so that we can use the proxyless gRPC setup, reduce resource usage, and still get the load balancing and security capabilities that Istio provides.

Currently this is possible by using an xds endpoint and adding a template annotation. The catch is that if we use the grpc-agent template, the Envoy proxy is not initialised properly, so the traffic just passes through.

So we want to set it up such that the proxyless gRPC setup is used when an xds endpoint is used, and the Envoy proxy is used otherwise, as directed by the annotations. That gives us time to migrate to gRPC while still realizing the potential of Istio.

Also, since there can be external services, we will never be able to migrate everything to gRPC, so this will be needed anyway.

costinm commented 1 year ago

It should be possible - but we have not tested or documented this.

With the normal sidecar, the agent generates the gRPC bootstrap by default. There are settings to also save the certificates to files (they can be added in a few different ways), and settings to exclude specific ports from interception.

What is missing are docs and tests - everyone is pretty busy and so far this has not been a frequent request, but the design and implementation did consider it.

technicianted commented 1 year ago

@anuragagarwal561994 we are running the exact same setup. Before jumping into details, you should be aware that the grpc-agent sidecar is broken due to #38923. So our work included two things:

  1. Work around "mixed mode" not being supported yet.
  2. Work around broken certificate renewal #38923.

For (1), we created a new template, grpc-mixed, in which we run the two containers. We needed a minor change to avoid port conflicts between the two sidecars, specifically setting PROXY_XDS_DEBUG_VIA_AGENT=false.

For (2), it was a bit tricky. We needed grpc-agent for xds, but we needed istio-sidecar to provide certs and proxying for non-gRPC-xds traffic. This required fiddling with volume injection and changing OUTPUT_CERTS such that the certificates from grpc-agent are ignored and the ones from istio-sidecar are picked up by the gRPC xds runtime.

Finally, and most importantly, after doing all this, we faced serious scalability issues with the gRPC xds implementation. If your istio-sidecar is already taking a lot of resources, it means you have a lot of services in its scope. This will most likely mean that your gRPC runtime will end up using more resources. In our case it was around 2x what Envoy used to take.

Another scalability issue we faced was that the gRPC xds runtime hard-codes the max message size of its xds gRPC connection to the default. See grpc/grpc-go#5790. We had around 700 services in the scoped namespace. We ended up running a forked version of grpc-go to fix this.

> We are therefore migrating our services to gRPC so that we can use the proxyless gRPC setup, reduce resource usage, and still get the load balancing and security capabilities that Istio provides.

Having understood the above, I strongly suggest you evaluate resource usage of grpc xds runtime before fully committing to the migration.

> So we want to set it up such that the proxyless gRPC setup is used when an xds endpoint is used, and the Envoy proxy is used otherwise, as directed by the annotations. That gives us time to migrate to gRPC while still realizing the potential of Istio.

We achieved that by excluding the destination ports of the gRPC services.

anuragagarwal561994 commented 1 year ago

@technicianted so when we say 700 services, does that mean 700 microservices or 700 pods?

What are some of the scalability issues you encountered? Maybe we can learn from them.

I am doing a POC per service rather than running all of them together, but I did not see any increase in latency or resource usage with proxyless gRPC compared to not using Istio at all. Theoretically there should not be any, right?

We have a lot of fan-outs and hence the side-car takes a lot of resources.

For now our clients are in Java, will check if we also have the same concern there.

Right now since we are in evaluation phase, I don't think we will require (2) here.

Regarding this:

> For (1), we created a new template, grpc-mixed, in which we run the two containers. We needed a minor change to avoid port conflicts between the two sidecars, specifically setting PROXY_XDS_DEBUG_VIA_AGENT=false.

I tried creating a template on my own, but found it a bit confusing.

I could see that a few templates exist in the config map, and I tried to mix and match them. But I couldn't get far; every time, something else broke. Can you share the template, if that is okay with you? I have also been waiting for the above-mentioned issue to be solved in Istio itself, so that we don't have to do this.

anuragagarwal561994 commented 1 year ago

I was referring to https://github.com/istio/istio/pull/40136, which seems to have been released recently. I will check whether this setup works now.

anuragagarwal561994 commented 1 year ago

@technicianted I tried creating a mixed template, but got stuck at one point: the node IDs of the two sidecars were the same, which made them conflict with each other.

technicianted commented 1 year ago

> What are some of the scalability issues you encountered? Maybe we can learn from them.

I already explained them above, but here is the summary again:

  1. General Istio/Envoy scalability issue: when you have high client/server cardinality, with hundreds of services and thousands of pods in the same namespace, Envoy becomes a compute beast.
  2. Similarly, we use Go. The Go xds implementation, at least as it stands today, is not very efficient. For the same setup as above, each container needed 1+ CPU cores just to keep up with the xds updates. As we run at large scale, this was considered a waste of compute.
  3. The Go xds client has no way of setting the max gRPC message size. As our ADS updates are large, it started to break at around 500 services, when we exceeded the default max message size of 4MB. We had to patch grpc-go via a fork.
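
To give a feel for the numbers in item 3, here is a back-of-the-envelope sketch in Go. It assumes the 4MB default means 4 MiB, that all services arrive in a single response, and that every service contributes the same number of bytes. None of those assumptions comes from measurements in this thread, and `fitsDefaultLimit` is a hypothetical helper, not part of grpc-go.

```go
package main

import "fmt"

// defaultMaxRecvBytes is the 4MB default mentioned above, taken as 4 MiB.
const defaultMaxRecvBytes = 4 * 1024 * 1024

// fitsDefaultLimit reports whether a single response carrying the given
// number of services stays within the default limit, under the
// simplifying assumption that every service has the same size.
func fitsDefaultLimit(services, avgBytesPerService int) bool {
	return services*avgBytesPerService <= defaultMaxRecvBytes
}

func main() {
	// If things start to break at around 500 services, the implied
	// average is roughly 8 KiB of configuration per service.
	perService := defaultMaxRecvBytes / 500
	fmt.Println(perService) // prints 8388

	fmt.Println(fitsDefaultLimit(500, perService)) // prints true
	fmt.Println(fitsDefaultLimit(700, perService)) // prints false
}
```

Under these assumptions, a namespace with around 700 services lands well past the limit, which is consistent with the breakage described above.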

> Can you share the template, if that is okay with you?

Attached, based on Istio 1.14: grpc-mixed.yaml.gz. This should solve both problems: mixed mode and broken certificate rotation.

anuragagarwal561994 commented 1 year ago

@technicianted thanks a lot for this configuration, it is working. I understood the mistake I was making: I was trying to use the same Envoy proxy because I thought my gRPC client only needed the bootstrap file.

But then I came to realise what is happening and why it can't be done with only one sidecar. It helped me better understand the Istio and Envoy setup as well.

Basically, an XDS server is created in the proxy, and in a normal sidecar setup Envoy listens to it. In the gRPC setup, we don't just need bootstrap file generation (which now always happens since 1.16.x); we also need this XDS server to receive the events from Istio.

But if I use the same proxy, my client and Envoy will both listen to the same XDS server, which is exposed as a UDS socket in /etc/istio/proxy/. The same node ID will then clash, and things will not work properly.

On the other hand, we created 2 sidecars here:

  1. one with envoy enabled, here we have the XDS server on port 15020
  2. another with no envoy enabled, here we just have the XDS server on port 15021

Now, since the two XDS servers are different, the application comes up. We just have to exclude from Envoy the ports that are used by the proxyless setup.

Correct me if anything is wrong with my above understanding.

I made just one change to the template: I added the annotation sidecar.istio.io/rewriteAppHTTPProbers: "false" to istio-grpc-agent; otherwise it created a health check that was not working for us.

technicianted commented 1 year ago

Correct

istio-policy-bot commented 1 year ago

🚧 This issue or pull request has been closed due to not having had activity from an Istio team member since 2022-08-06. If you feel this issue or pull request deserves attention, please reopen the issue. Please see this wiki page for more information. Thank you for your contributions.

Created by the issue and PR lifecycle manager.