Describe the bug
In a Helm deployment in simple scalable mode (but probably also in distributed mode), when there are multiple pods running the compactor, there are multiple "endpoints" for the DELETE API. DELETE requests are sent to the "loki-backend" Service, and the Kubernetes Service then forwards each request "randomly" to any backend pod.
What I found is that POST DELETE requests sent to a backend pod whose compactor is not "active" (or maybe not running at all) stay in "received" status forever.
I also found that GET DELETE requests follow the same Service path as POST DELETE requests, so depending on which backend pod a request lands on (loki-backend Service load balancing), the reply will not be the same.
To Reproduce
Steps to reproduce the behavior:
Deploy Loki 3.2.1.
Configuration: no TLS/SSL and no AUTH for this test.
Simple scalable mode with 3 backend, 3 write, and 3 read pods, and only one tenant (fake).
Launch several POST DELETE requests with curl (adapt the start date and TLS/AUTH to your deployment).
Example:
curl -G -X POST 'http://loki-gateway.loki-ns.svc.cluster.local/loki/api/v1/delete' --data-urlencode 'query={job!="1"}' --data-urlencode "start=$(($(date +'%s')-2682000))"
curl -G -X POST 'http://loki-gateway.loki-ns.svc.cluster.local/loki/api/v1/delete' --data-urlencode 'query={job!="2"}' --data-urlencode "start=$(($(date +'%s')-2682000))"
curl -G -X POST 'http://loki-gateway.loki-ns.svc.cluster.local/loki/api/v1/delete' --data-urlencode 'query={job!="3"}' --data-urlencode "start=$(($(date +'%s')-2682000))"
curl -G -X POST 'http://loki-gateway.loki-ns.svc.cluster.local/loki/api/v1/delete' --data-urlencode 'query={job!="4"}' --data-urlencode "start=$(($(date +'%s')-2682000))"
curl -G -X POST 'http://loki-gateway.loki-ns.svc.cluster.local/loki/api/v1/delete' --data-urlencode 'query={job!="5"}' --data-urlencode "start=$(($(date +'%s')-2682000))"
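The start parameter used in the curl commands above can be checked in isolation: 2682000 seconds is roughly 31 days, so each delete request covers about the last month of logs.

```shell
# "start" is a Unix epoch timestamp: now minus 2682000 seconds (~31 days).
now=$(date +'%s')
start=$((now - 2682000))
echo "$start"
```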
Launch the GET DELETE request several times with curl (adapt for TLS and AUTH depending on your deployment):
curl -s -X GET 'http://loki-gateway.loki-ns.svc.cluster.local/loki/api/v1/delete' | jq
curl -s -X GET 'http://loki-gateway.loki-ns.svc.cluster.local/loki/api/v1/delete' | jq
curl -s -X GET 'http://loki-gateway.loki-ns.svc.cluster.local/loki/api/v1/delete' | jq
curl -s -X GET 'http://loki-gateway.loki-ns.svc.cluster.local/loki/api/v1/delete' | jq
curl -s -X GET 'http://loki-gateway.loki-ns.svc.cluster.local/loki/api/v1/delete' | jq
You will see that each answer does not contain the full list of previously submitted DELETE requests; the content depends on which loki-backend pod the answer came from.
Repeat the GET DELETE step (4) after some time (5 to 15 min) and you will see that only the DELETE requests handled by one compactor (on only one loki-backend pod) reach "processed" status. The corresponding loki-backend logs show that only DELETE requests that landed on the "active/running" compactor are honored. The others are never taken into account (unless the active/running compactor moves to another pod, for example when that loki-backend pod is deleted).
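To make the divergence visible without the Service load balancing in between, each backend pod can be queried directly by its stable DNS name. This is a sketch that only prints the per-pod commands to run inside the cluster; the pod and headless Service names (`loki-backend-N`, `loki-backend-headless`, port 3100) are assumed from the Helm chart's simple scalable defaults and may differ in your deployment.

```shell
# Build one GET DELETE URL per backend pod, addressing each pod directly
# through the headless Service instead of the load-balanced loki-backend
# Service. Names/port are assumptions from the chart defaults.
urls=""
for i in 0 1 2; do
  url="http://loki-backend-$i.loki-backend-headless.loki-ns.svc.cluster.local:3100/loki/api/v1/delete"
  urls="$urls $url"
  echo "curl -s '$url' | jq"   # run these inside the cluster and compare the answers
done
```

Comparing the three answers pod by pod shows directly which compactor knows about which delete requests.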
Expected behavior
All POST DELETE requests must be honored by the active/running compactor.
All GET DELETE requests should return the status of all delete requests.
Environment:
Infrastructure: Kubernetes
Deployment tool: helm
Screenshots, Promtail config, or terminal output
If applicable, add any output to help explain your problem.
Tell me if you need more information (such as Helm values or loki-backend pod logs).
The only workaround I have found so far is to deploy only one backend instance.
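As a sketch, that workaround looks like this in Helm values (the `backend.replicas` field name is assumed from the chart's simple scalable defaults):

```yaml
# Hypothetical values override: run a single backend replica so only one
# compactor exists and every DELETE API call lands on it.
backend:
  replicas: 1
```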