Open · slim-bean opened 2 years ago
Hi! This issue has been automatically marked as stale because it has not had any activity in the past 30 days.
We use a stalebot among other tools to help manage the state of issues in this project. A stalebot can be very useful in closing issues in a number of cases; the most common is closing issues or PRs where the original reporter has not responded.
Stalebots are also emotionless and cruel and can close issues which are still very relevant.
If this issue is important to you, please add a comment to keep it open. More importantly, please add a thumbs-up to the original issue entry.
We regularly sort for closed issues which have a `stale` label, sorted by thumbs-up.
We may also:

- Mark issues as `revivable` if we think it's a valid issue but isn't something we are likely to prioritize in the future (the issue will still remain closed).
- Add a `keepalive` label to silence the stalebot if the issue is very common/popular/important.

We are doing our best to respond, organize, and prioritize all issues, but it can be a challenging task; our sincere apologies if you find yourself at the mercy of the stalebot.
While investigating something else, I noticed that the metric `loki_chunk_store_deduped_chunks_total` for a Loki cluster with `replication_factor: 1` was not zero like I would expect.

The metric is incremented whenever a chunk is sent to flush but is already found in the chunk cache (meaning it has already been successfully flushed).
When running with a replication factor > 1, the cache is used as a fast check to de-dupe chunks being sent to the store: if the chunk is in the cache, it was already successfully written to the store by someone else (typically another ingester in the replica set for the given stream).
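To make that check concrete, here is a minimal Go sketch of the cache-based dedupe described above. All names here (`chunkCache`, `putChunk`, `dedupedChunksTotal`, `writeToStore`) are made up for illustration and are not Loki's actual types or functions; the real counter is `loki_chunk_store_deduped_chunks_total` and the real cache is whatever chunk cache backend is configured.

```go
package main

import (
	"fmt"
	"sync"
)

// dedupedChunksTotal stands in for the loki_chunk_store_deduped_chunks_total counter.
var dedupedChunksTotal int64

// chunkCache is a simplified stand-in for the configured chunk cache backend.
type chunkCache struct {
	mu   sync.Mutex
	keys map[string]struct{}
}

func newChunkCache() *chunkCache {
	return &chunkCache{keys: map[string]struct{}{}}
}

// contains reports whether a chunk key is already cached, i.e. was already flushed.
func (c *chunkCache) contains(key string) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	_, ok := c.keys[key]
	return ok
}

// add records a successfully flushed chunk.
func (c *chunkCache) add(key string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.keys[key] = struct{}{}
}

// putChunk mimics the store-side dedupe: if the chunk key is in the cache it was
// already written (by this ingester or another replica), so the dedupe counter is
// bumped instead of writing the chunk again.
func putChunk(cache *chunkCache, key string, writeToStore func(string)) {
	if cache.contains(key) {
		dedupedChunksTotal++ // with replication_factor: 1 this should stay at zero
		return
	}
	writeToStore(key)
	cache.add(key)
}

func main() {
	cache := newChunkCache()
	write := func(key string) { fmt.Println("wrote", key) }

	putChunk(cache, "chunk-1", write) // first flush: written to the store
	putChunk(cache, "chunk-1", write) // same chunk again: counted as deduped
	fmt.Println("deduped:", dedupedChunksTotal)
}
```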
With `replication_factor: 1` I would not expect to see this metric ever incremented. Upon a cursory investigation, it looks like the flush queue workers can race over the same chunks: we lock the list of chunks to gather those that need to be flushed, then release the lock and attempt to flush them. This creates an opportunity for another worker to gather the same chunk and also attempt to flush it (see the sketch below).

The impact here is small. If you don't have a chunk cache, the same chunk is sent to the store twice, which is effectively a no-op (although it does take some network/CPU, etc.); if you do have a chunk cache, the duplicate is quickly found and not sent to the store.
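For anyone reading along, here is a rough Go sketch of the gather-then-flush pattern and one possible way to close the race: claiming the chunk while still under the lock. The `chunkDesc`/`stream` types and the `flushing` field are hypothetical simplifications, not the actual ingester code.

```go
package main

import (
	"fmt"
	"sync"
)

// chunkDesc is a stand-in for the ingester's per-chunk state; the real type and
// field names in Loki differ, this just illustrates the gather/flush sequence.
type chunkDesc struct {
	id       string
	closed   bool // ready to be flushed
	flushing bool // hypothetical in-flight marker used by the fix below
}

type stream struct {
	mu     sync.Mutex
	chunks []*chunkDesc
}

// collectFlushableChunks mirrors the pattern described above: take the lock,
// gather chunks that need flushing, release the lock, and flush outside of it.
// Without the in-flight marker, two flush workers calling this concurrently can
// both pick up the same chunk, which is the race that produces the extra flush.
func (s *stream) collectFlushableChunks() []*chunkDesc {
	s.mu.Lock()
	defer s.mu.Unlock()

	var out []*chunkDesc
	for _, c := range s.chunks {
		if c.closed && !c.flushing {
			c.flushing = true // claim the chunk while still holding the lock
			out = append(out, c)
		}
	}
	return out
}

func main() {
	s := &stream{chunks: []*chunkDesc{{id: "chunk-1", closed: true}}}

	// Simulate two flush queue workers racing over the same stream.
	first := s.collectFlushableChunks()
	second := s.collectFlushableChunks()
	fmt.Printf("worker 1 got %d chunk(s), worker 2 got %d\n", len(first), len(second))
	// With the flushing marker, worker 2 gets nothing; without it, both workers
	// would flush chunk-1 and the second write would show up as a dedupe.
}
```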
It does throw off the metric showing how effectively the dedupe code is working, though, which is probably the biggest impact.
This graph is from the environment where I found this: the green line is the number of chunks flushed and the yellow is the deduped chunks. It's a small number of chunks that get caught in this race, but we should still fix this to make it zero.