Multiple services per task definition

NotQuiteApex commented 2 years ago

Consul supports defining multiple services for different ports, with their own healthchecks and everything. This is something my company needs to support sending data between containers but also sending out metrics via an admin panel from our applications. Is there a chance this could be implemented in the near future?

pglass commented 2 years ago

Hi @NotQuiteApex, you are correct that the current architecture supports only one service per task.

First, I'm wondering if it's possible to accomplish what you need without multiple services per task. While we are currently limited to one incoming port per service/task for service mesh traffic, you can run additional containers in the task. Those additional containers aren't registered with the Consul service catalog, but they can make outgoing requests through the local Envoy proxy to other services over the service mesh.

> This is something my company needs to support sending data between containers

Containers in the same task can talk to each other over localhost, without the traffic leaving the task or needing to go over the service mesh. They can also use shared volumes if there is some file-based sharing. That should allow your containers in a task to send each other data.

What do you think?

> but also sending out metrics via an admin panel from our applications.

If these are outgoing requests, this should be possible. All containers in the task can make outgoing requests through the local Envoy proxy to other services over the service mesh. Would that work to support sending these metrics out?

As far as eventually supporting multiple services per task, this should be possible. The way we could support this, near term, is to run one Envoy proxy per service. For three services in the task, you would also be running three Envoy proxy containers.

Note that AWS supports at most 10 containers per task definition, which is non-adjustable, so this will limit the number of services per task. Consul on ECS requires 2-3 containers, and with one Envoy proxy container per service, you would only have room for 3-4 services per task.
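As a rough check of that estimate, here is a minimal sketch of the container budget. It assumes each service costs two containers (one application container plus one Envoy proxy), which is an illustrative assumption rather than something stated above. The function name and parameters are hypothetical.

```python
# AWS limit quoted above: at most 10 containers per task definition.
MAX_CONTAINERS_PER_TASK = 10


def max_services_per_task(consul_overhead: int, containers_per_service: int = 2) -> int:
    """Estimate how many services fit in a single ECS task.

    consul_overhead: containers Consul on ECS needs regardless of the
        number of services (2-3 according to the comment above).
    containers_per_service: ASSUMPTION, one application container plus
        one Envoy proxy container per service.
    """
    if containers_per_service < 1:
        raise ValueError("containers_per_service must be at least 1")
    remaining = MAX_CONTAINERS_PER_TASK - consul_overhead
    # Integer division: a partially filled slot cannot host a whole service.
    return max(remaining // containers_per_service, 0)


if __name__ == "__main__":
    for overhead in (2, 3):
        print(f"Consul overhead {overhead}: "
              f"{max_services_per_task(overhead)} services")
```

Under those assumptions, an overhead of 2 leaves room for 4 services and an overhead of 3 leaves room for 3, which matches the 3-4 range.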

NotQuiteApex commented 2 years ago

The issue stems from the fact that our task/container has both incoming and outgoing requests on one port, and only incoming requests on the second port (where the metrics are scraped from; we are using Prometheus to scrape said metrics). Our solution right now is to run Prometheus as a sidecar (i.e. in the same task definition) to scrape and then push those metrics to a Prometheus server via Consul's upstream definitions. It works, but not without the additional cost of needing to run that sidecar with every one of our tasks. Since we can't define more than one service, we can't just have the Prometheus server scrape from an additional service definition which can expose that other port. Merging the ports is not ideal for security/intention purposes.

lkysow commented 2 years ago

Hey, couple questions to help me understand your use-case.

1. Does your app expose Prometheus metrics or do you just care about the Envoy sidecar Prometheus metrics?
2. Prometheus supports a consul_sd_config parameter which can be used to get Prometheus to pull automatically from Consul services (https://prometheus.io/docs/prometheus/latest/configuration/configuration/#consul_sd_config). Have you tried that out? That would allow the Prometheus server to call out to each task without requiring a Prometheus sidecar in each task. Then you just need to configure Envoy to expose Prometheus metrics. I can walk you through that but first want to understand if that would even work.

NotQuiteApex commented 2 years ago

Our app exposes Prometheus metrics; we do not scrape the Envoy sidecar metrics. The issue is that those metrics are on a different port (for security reasons) and we would prefer not to change that, which is why having multiple defined services would be beneficial as opposed to just the one.

I can take a look at the solution proposed in (2). However I am inclined to believe it may not work because of the metrics being on a different port than the main application.

hashicorp / terraform-aws-consul-ecs

Multiple services per task definition #134