grafana / loki

Like Prometheus, but for logs.
https://grafana.com/loki
GNU Affero General Public License v3.0

Availability zone local replication #5319

Open thatsmydoing opened 2 years ago

thatsmydoing commented 2 years ago

Is your feature request related to a problem? Please describe.
I'm looking to run Loki in a standard EKS cluster across a few AZs. All the logs will be coming from within the cluster, and I'd like to avoid the inter-zone bandwidth cost when sending them. I don't need HA across zones: if an AZ goes down, there won't be any logs from it anyway, since all the clients in that AZ will be down too.

Describe the solution you'd like
I understand that Loki now supports zone-aware replication, which ensures that data exists in multiple AZs. I'd like the opposite: the distributor should only forward data to ingesters in the same AZ. Queriers should still be able to query all available AZs.
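For reference, a minimal sketch of the zone-aware replication setup being referred to (key names may vary between Loki versions, so treat this as an assumption and check the configuration reference for your release):

```yaml
# Existing zone-aware replication: replicas of each stream are spread across zones.
ingester:
  lifecycler:
    availability_zone: us-east-1a   # set per pod, e.g. from the node's zone label
    ring:
      replication_factor: 3
      zone_awareness_enabled: true  # place the replicas in distinct zones
```

The request here is effectively the inverse policy: keep all writes from a distributor inside its own zone.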

Describe alternatives you've considered
I've considered running a separate Loki cluster per AZ, but that's a bit unwieldy since querying across multiple clusters is not well supported, as described in https://github.com/grafana/loki/issues/1866

I've also thought about blocking access between distributors and ingesters in different AZs but that gets quite spammy and I'm not sure if it's even safe to do.

Additional context
When I was first looking into Loki, my mental model was that each component has its own "ring", so I was trying to set up one ingester ring for the distributors containing only the ingesters in the same AZ, and another ingester ring for the queriers containing all of them. Unfortunately, that's not the model Loki uses.

stale[bot] commented 2 years ago

Hi! This issue has been automatically marked as stale because it has not had any activity in the past 30 days.

We use a stalebot among other tools to help manage the state of issues in this project. A stalebot can be very useful in closing issues in a number of cases; the most common is closing issues or PRs where the original reporter has not responded.

Stalebots are also emotionless and cruel and can close issues which are still very relevant.

If this issue is important to you, please add a comment to keep it open. More importantly, please add a thumbs-up to the original issue entry.

We regularly sort for closed issues which have a stale label sorted by thumbs up.

We are doing our best to respond, organize, and prioritize all issues but it can be a challenging task, our sincere apologies if you find yourself at the mercy of the stalebot.

thatsmydoing commented 2 years ago

Not stale

kovaxur commented 2 years ago

We are also facing this issue; it would be good to have a design guide on how to avoid these costs.

forsberg commented 2 years ago

I'm new to Loki, so I may have misunderstood how storage and indexing work, but I'm wondering whether something like the following might work:

Perhaps I'm missing something in the above idea?

Personally, I would ignore the traffic to and from the Loki reader replicas, as I write much more than I read, but that depends on the use case.

janvanbesien-ngdata commented 1 year ago

Can this not largely be solved by Kubernetes itself via "topology aware hints" (https://kubernetes.io/docs/concepts/services-networking/topology-aware-hints/)? In other words, by simply setting the service.kubernetes.io/topology-aware-hints annotation to auto on the Services that are used to reach the Loki writer pods?
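For anyone trying this, a sketch of what that would look like (the Service name, selector labels, and port are assumptions about a typical deployment, not something from this thread):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: loki-write   # hypothetical Service fronting the writer/distributor pods
  annotations:
    service.kubernetes.io/topology-aware-hints: "auto"
spec:
  selector:
    app.kubernetes.io/name: loki
    app.kubernetes.io/component: write
  ports:
    - name: http
      port: 3100
      targetPort: 3100
```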

thatsmydoing commented 1 year ago

It's been a while, so I might be mistaken, but Services can handle partitioning promtail-to-distributor traffic, which is good; they do not, however, address distributor-to-ingester communication. While the diagrams show that a "writer" can run a distributor and an ingester together, that does not mean the distributor favors the "local" ingester.

A distributor may send log data to any ingester, even one in a different zone. Service discovery for ingesters is done internally via the ring, so Kubernetes Services have no impact there.
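To illustrate the point, here is a rough sketch of a typical ring setup over memberlist (the join address is an assumption): ingesters register themselves in this ring and distributors pick targets from it, so topology hints on a Kubernetes Service never enter the picture.

```yaml
memberlist:
  join_members:
    - loki-memberlist.loki.svc.cluster.local:7946   # assumed headless Service used for gossip
common:
  ring:
    kvstore:
      store: memberlist   # ingester discovery happens via this ring, not via a Service
```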

diranged commented 11 months ago

Just chiming in here, as we have the exact same desire. On ingestion, data can cross AZs several times in the Loki microservices (distributed) model: client to gateway, gateway to distributor, and distributor to ingesters.

We can take care of the first two hops by running a larger number of loki-gateway and loki-distributor pods. The third hop, however, is tricky because it is multiplied by the replication factor: each stream is forwarded to multiple ingesters, any of which may be in a different zone.

We want to tell Loki to prioritize sending data to ingesters within the same zone whenever possible. If there are too few ingesters in the local zone, it should still prefer same-zone ingesters to whatever degree it can. For example, with a replication factor of 3: if there are 2 ingesters in ZoneA and 2 in ZoneB, a distributor in ZoneA should send 2 copies to ZoneA and only one copy to ZoneB.

carlopalacio commented 11 months ago

I am also facing this issue.