grafana / loki

Like Prometheus, but for logs.
https://grafana.com/loki
GNU Affero General Public License v3.0

compactor is not honoring all API delete log requests #14985

Open LinLo opened 4 days ago

LinLo commented 4 days ago

Describe the bug In a Helm deployment in simple scalable mode (but maybe also in distributed mode), when there are multiple pods running the compactor, there are multiple "endpoints" for the delete API. DELETE requests are sent to the "loki-backend" service, and the Kubernetes service then forwards them "randomly" to any backend pod.
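For context, a quick way to confirm that the loki-backend service fronts several pods is to list its endpoints. This is only an illustrative sketch; the loki-ns namespace matches the curl examples below, and the app.kubernetes.io/component=backend label is an assumption about the Helm chart's default labels:

    # Show which backend pods sit behind the loki-backend Service; any of
    # them can receive a delete API request when traffic goes through it.
    kubectl -n loki-ns get endpoints loki-backend
    kubectl -n loki-ns get pods -l app.kubernetes.io/component=backend -o wide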

What I found is that POST DELETE requests sent to backend pods where the compactor is not "active" (or maybe not running at all) stay in "received" status. GET DELETE requests follow the same service path as POST DELETE requests, and depending on which backend pod the request ends up on (loki-backend service load balancing), the reply is not the same.
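As a rough illustration (not part of the original report; the gateway URL is the same one used in the reproduction steps below), fetching the delete list twice through the gateway and diffing the answers makes the inconsistency visible:

    # Two consecutive GETs through the gateway may land on different
    # loki-backend pods and therefore return different sets of requests.
    URL='http://loki-gateway.loki-ns.svc.cluster.local/loki/api/v1/delete'
    curl -s "$URL" | jq -S . > /tmp/delete-a.json
    curl -s "$URL" | jq -S . > /tmp/delete-b.json
    diff /tmp/delete-a.json /tmp/delete-b.json || echo "responses differ"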

To Reproduce Steps to reproduce the behavior:

  1. Deploy loki 3.2.1.
  2. Configuration: no TLS/SSL and no auth for this test.
    • Simple Scalable mode with 3 backend, 3 write, and 3 read pods, with only one tenant (fake)
      compactor:
        compaction_interval: 10m
        delete_request_cancel_period: 5m
        delete_request_store: s3
        retention_delete_delay: 5m
        retention_delete_worker_count: 150
        retention_enabled: true
      limits_config:
        deletion_mode: filter-and-delete
        max_query_lookback: 744h
        query_timeout: 300s
        reject_old_samples: true
        reject_old_samples_max_age: 24h
        retention_period: 744h
  3. Launch several POST DELETE requests with curl (adapt start date and TLS/AUTH depending on your deployment)
    • Example:
      curl -G -X POST 'http://loki-gateway.loki-ns.svc.cluster.local/loki/api/v1/delete' --data-urlencode 'query={job!="1"}' --data-urlencode "start=$(($(date +'%s')-2682000))"
      curl -G -X POST 'http://loki-gateway.loki-ns.svc.cluster.local/loki/api/v1/delete' --data-urlencode 'query={job!="2"}' --data-urlencode "start=$(($(date +'%s')-2682000))"
      curl -G -X POST 'http://loki-gateway.loki-ns.svc.cluster.local/loki/api/v1/delete' --data-urlencode 'query={job!="3"}' --data-urlencode "start=$(($(date +'%s')-2682000))"
      curl -G -X POST 'http://loki-gateway.loki-ns.svc.cluster.local/loki/api/v1/delete' --data-urlencode 'query={job!="4"}' --data-urlencode "start=$(($(date +'%s')-2682000))"
      curl -G -X POST 'http://loki-gateway.loki-ns.svc.cluster.local/loki/api/v1/delete' --data-urlencode 'query={job!="5"}' --data-urlencode "start=$(($(date +'%s')-2682000))"
  4. Launch GET DELETE requests with curl several times (adapt for TLS and auth depending on your deployment):
    curl -s -X GET 'http://loki-gateway.loki-ns.svc.cluster.local/loki/api/v1/delete' | jq
    curl -s -X GET 'http://loki-gateway.loki-ns.svc.cluster.local/loki/api/v1/delete' | jq
    curl -s -X GET 'http://loki-gateway.loki-ns.svc.cluster.local/loki/api/v1/delete' | jq
    curl -s -X GET 'http://loki-gateway.loki-ns.svc.cluster.local/loki/api/v1/delete' | jq
    curl -s -X GET 'http://loki-gateway.loki-ns.svc.cluster.local/loki/api/v1/delete' | jq
  5. You will see that each response does not contain the full list of previously created DELETE requests; the content depends on which loki-backend pod the answer came from.
  6. Repeat the GET DELETE step (4) after some time (5 to 15 min) and you will see that only DELETE requests handled by one compactor (on only one loki-backend pod) reach the "processed" status. The corresponding loki-backend logs show that only DELETE requests received by the "active/running" compactor are honored; the others are never taken into account (unless the active compactor moves to another pod, for example when that loki-backend pod is deleted). A sketch for checking each backend pod directly is shown after this list.
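To see which pod actually knows about which DELETE requests, the loki-backend service can be bypassed and each backend pod queried directly. This is a sketch only; the namespace, the pod label, and port 3100 (Loki's default HTTP port) are assumptions about the Helm chart defaults:

    # Query the delete API on every backend pod individually; each pod only
    # reports the delete requests it received itself, which explains the
    # differing answers seen through the Service.
    for pod in $(kubectl -n loki-ns get pods -l app.kubernetes.io/component=backend -o name); do
      echo "== $pod =="
      kubectl -n loki-ns port-forward "$pod" 3100:3100 >/dev/null 2>&1 &
      pf_pid=$!
      sleep 2
      curl -s 'http://localhost:3100/loki/api/v1/delete' | jq .
      kill "$pf_pid"
    done

Depending on the Loki version and configuration, the active compactor instance may also be visible on its ring status page; check the documentation for your release.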

Expected behavior

  1. All POST DELETE requests must be honored by the active/running compactor.
  2. All GET DELETE requests should return the status of all DELETE requests, regardless of which backend pod answers.

Environment:

Screenshots, Promtail config, or terminal output If applicable, add any output to help explain your problem.

Tell me if you need more information (such as the Helm values or the loki-backend pod logs). The only workaround I have found so far is to deploy only one backend instance, as sketched below.
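For reference, a minimal sketch of that workaround with the Helm chart, assuming the chart exposes the backend replica count as backend.replicas (the release name, namespace, and chart reference here are illustrative):

    # Scale the simple scalable backend down to a single replica so that
    # only one compactor instance ever receives delete API requests.
    helm upgrade loki grafana/loki -n loki-ns --reuse-values \
      --set backend.replicas=1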