grafana / loki

Like Prometheus, but for logs.
https://grafana.com/loki
GNU Affero General Public License v3.0

Helm: Incorrect nginx.conf when upgrading using SimpleScalable<->Distributed deploymentMode #13726

Open nhippe-ds opened 3 months ago

nhippe-ds commented 3 months ago

Describe the bug

When setting deploymentMode to SimpleScalable<->Distributed, the generated nginx.conf retains references to the SimpleScalable components instead of transitioning to the Distributed components. The issue appears to stem from the logic in the _helpers.tpl file.

I ended up setting deploymentMode to Distributed directly, because the loki-gateway configmap was not updated to reference the distributed components.

To Reproduce

Steps to reproduce the behavior:

  1. Update deploymentMode from SimpleScalable to SimpleScalable<->Distributed.
  2. Template the chart with the values.yaml and compare the results for loki/templates/gateway/configmap-gateway.yaml.

Expected behavior

If users are expected to migrate from SimpleScalable to Distributed using deploymentMode: SimpleScalable<->Distributed, the nginx.conf should be updated to reference the appropriate Distributed components instead of retaining the old SimpleScalable references. The configuration should include the correct URLs for components such as distributor, queryFrontend, ingester, and ruler, as per the Distributed deployment mode.

Environment:

iandrewt commented 3 months ago

Not sure if this should be a separate issue, but when attempting the same migration with zone awareness enabled, I found that the ingester paths in the nginx config pointed to a non-existent loki-ingester service. Only the zone-aware service resources exist. With zone awareness disabled, the ingester paths work fine.

lindeskar commented 1 month ago

I'm actually changing my mind on this one. I no longer think this is an issue. (deleted my old comment)

My understanding is that the SimpleScalable<->Distributed mode is meant to be used with replicas > 0 for all components of both the SimpleScalable and Distributed modes at the same time. This is to get all components up, running, and joined to the rings before switching to Distributed mode, at which point Nginx directs traffic to the new distributed components.

My view of the migration steps from SSD to distributed is as follows:

  1. Set deploymentMode: SimpleScalable<->Distributed and increase replicas for the distributed components.
  2. Wait for all distributed components to be ready and to join the rings. Traffic is still routed by Nginx to the SSD components (write/read/backend), but some distributed components (e.g. ingesters) will start to receive traffic because of the rings.
  3. Set deploymentMode: Distributed and remove values for SSD components.
  4. Traffic should now be routed by Nginx to the distributed components.
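
As a rough illustration of steps 1 and 3, the two values.yaml stages might look something like the sketch below. The replica counts are placeholders chosen for the example, not recommendations, and the exact set of component keys is an assumption that should be checked against the values reference for the chart version in use.

```yaml
# Stage 1 (sketch): transitional mode, SSD and distributed components run side by side.
# Replica counts below are illustrative assumptions only.
deploymentMode: SimpleScalable<->Distributed

# Existing SSD components stay up; Nginx still routes to these.
write:
  replicas: 3
read:
  replicas: 3
backend:
  replicas: 3

# New distributed components are scaled up so they can join the rings.
ingester:
  replicas: 3
distributor:
  replicas: 3
querier:
  replicas: 3
queryFrontend:
  replicas: 2
---
# Stage 3 (sketch): applied only after the distributed components are ready.
# Nginx is now expected to route to the distributed components.
deploymentMode: Distributed

# SSD components are scaled down (or their values removed entirely).
write:
  replicas: 0
read:
  replicas: 0
backend:
  replicas: 0

ingester:
  replicas: 3
distributor:
  replicas: 3
querier:
  replicas: 3
queryFrontend:
  replicas: 2
```

The two documents are meant to be applied one after the other as separate upgrades, not together. Templating each stage and comparing the gateway configmap should show the upstreams changing only at the second stage, which matches the behavior described above.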