Closed Totalus closed 1 year ago
Found the issue.
The Swift Object Storage API limits the number of objects it can return in a single request (reference).
This limit is not taken into consideration in the implementation of the Swift client's List() function (here).
When the number of objects returned reaches the limit, the returned list is incomplete. This causes the cached client to not contain all the index names when building the cache (here).
That in turn causes the file list to be empty for the given index (here), which then causes the files of this index to be deleted on the next sync.
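To illustrate the failure mode, here is a minimal sketch in Go of how a listing can follow pagination instead of stopping at the first page. It is not the Loki or Swift client code: `pageLister`, `listAll` and `fakeContainer` are hypothetical names made up for this example, and the fake container only simulates a server that caps each response at `limit` names and accepts a marker (Swift container listings are capped per request, typically at 10,000 names by default, and continue from a `marker` parameter).

```go
package main

import "fmt"

// pageLister is a hypothetical stand-in for one listing request:
// it returns at most limit names that sort after marker.
type pageLister func(marker string, limit int) ([]string, error)

// listAll keeps requesting pages, using the last name of each page as the
// next marker, until the server returns an empty page.
func listAll(list pageLister, limit int) ([]string, error) {
	var all []string
	marker := ""
	for {
		page, err := list(marker, limit)
		if err != nil {
			return nil, err
		}
		if len(page) == 0 {
			return all, nil
		}
		all = append(all, page...)
		marker = page[len(page)-1]
	}
}

// fakeContainer simulates a container holding the given sorted names.
func fakeContainer(names []string) pageLister {
	return func(marker string, limit int) ([]string, error) {
		var page []string
		for _, n := range names {
			if n > marker && len(page) < limit {
				page = append(page, n)
			}
		}
		return page, nil
	}
}

func main() {
	names := make([]string, 25)
	for i := range names {
		names[i] = fmt.Sprintf("index_%04d", i)
	}
	list := fakeContainer(names)

	firstPage, _ := list("", 10) // what a single, unpaginated request sees
	all, _ := listAll(list, 10)  // what a paginated listing sees

	fmt.Println(len(firstPage), len(all)) // prints "10 25"
}
```

With a single request, only the first 10 of the 25 names are visible, so any index whose name falls past the limit looks as if it had no files. Following the marker until an empty page comes back returns the full list.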
Describe the bug
For the same query, Loki returns data when the chunk is first loaded, but starts returning empty results after a checkpoint / after building the index list cache.
Background
In another issue, I previously described odd gaps in the data that I was observing in my queries in Grafana (https://github.com/grafana/loki/issues/8838#issuecomment-1485240124). I am no longer sure that the bug I am experiencing is actually related to that issue, so I am opening this new issue to address it.
To Reproduce
The issue is hard to reproduce because it happens on my data set, and I obviously cannot share all my logs with you, but I have compiled Loki locally and am able to reproduce the issue. So feel free to suggest things I could or should test and I'll share the results.
Here's how I reproduce the issue and what I am observing:
I start Loki with my configuration (shared below). The cache directory does not exist, so it is created empty. After startup, I run a first query, which returns some data. At that point I can see that an index folder is created in the cache directory (boltdb-cache/index_XXXX), which contains a large number of files (and also an empty folder named after the user). I can send the same query multiple times and get the same data back.
I wait a bit. At some point, I can see that the index seems to be refreshed and a new checkpoint is made. Loki logs something like the following:
After that happens, all the files in boltdb-cache/index_XXXX are gone. Only the index_XXXX folder remains.
Now if I run the query again, I get an empty result.
It feels to me like the chunk is unloaded, but the index is not properly updated to reflect that it is no longer available and needs to be reloaded if a query is made. That's just my naive guess.
Expected behavior
I expect Loki to return consistent data for the same query every time I run it.
Environment:
Loki configuration:
Loki config file
Logs when loki starts
As you can see there are a few experimental features in use (namely OpenStack Swift Storage and In-memory (FIFO) cache).