grafana / loki

Like Prometheus, but for logs.
https://grafana.com/loki
GNU Affero General Public License v3.0

Add inverted index from specific text to particular chunk. #1282

Open sequix opened 4 years ago

sequix commented 4 years ago

Is your feature request related to a problem? Please describe. No.

Describe the solution you'd like Add another concept, "tag". For example, I have 3 log lines like:

{"level":"info","ts":1572522373.1226933,"logger":"apiserver","msg":"count clusters bj","request_id":"6e9bfcf8-d3f1-4f93-bfcf-67e0089524c7"}
{"level":"info","ts":1572522373.1358392,"logger":"apiserver","msg":"count clusters gz","request_id":"3c5fb740-3103-4a51-b368-0d890ac70d93"}
{"level":"info","ts":1572522373.1358392,"logger":"apiserver","msg":"count clusters su","request_id":"01b0a0a9-107f-4893-b6ed-6e78a6038258"}

And I can export a tag "request_id" from promtail (or fluentd), with three different values as above. Then I can use LogQL like this to query the log lines back quickly:

{logger="apiserver"} /request_id="3c5fb740-3103-4a51-b368-0d890ac70d93"/

Here are the differences between tags and labels:

  1. Tags will not participate in the computation of the streamID; therefore, they will not affect the granularity of entry compression, so they can have a much higher cardinality than labels.
  2. Tags will be stored in table storage, like DynamoDB, and provide an inverted index to a specific chunk external key, reducing the workload of grepping all log lines.
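The two properties above amount to a simple inverted index from (tag, value) pairs to chunk external keys, consulted before any chunks are fetched. A minimal sketch (hypothetical illustration only, not the demo's actual code; the names `TagIndex` and the chunk keys are made up):

```python
# Toy inverted index: (tag name, tag value) -> chunk external keys.
# At query time, only the returned chunks need to be fetched and
# grepped, instead of every chunk in the stream.
from collections import defaultdict


class TagIndex:
    def __init__(self):
        # (tag, value) -> set of chunk external keys
        self._index = defaultdict(set)

    def add(self, tag: str, value: str, chunk_key: str):
        # Called at ingestion time for each tag value seen in a chunk.
        self._index[(tag, value)].add(chunk_key)

    def lookup(self, tag: str, value: str) -> set:
        # Called at query time; an empty result means no chunk
        # needs to be scanned at all.
        return self._index.get((tag, value), set())


idx = TagIndex()
idx.add("request_id", "3c5fb740-3103-4a51-b368-0d890ac70d93", "chunk-0017")
idx.add("request_id", "6e9bfcf8-d3f1-4f93-bfcf-67e0089524c7", "chunk-0018")

# A query like {logger="apiserver"} /request_id="3c5fb740-..."/ would
# narrow the scan to:
print(idx.lookup("request_id", "3c5fb740-3103-4a51-b368-0d890ac70d93"))
# → {'chunk-0017'}
```

Because the tag is not part of the streamID, the index can grow with tag cardinality without multiplying the number of streams.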

I implemented a demo here: https://github.com/sequix/loki/commit/156acf998ab1030646b7359c74f9814d1995f6f2 https://github.com/sequix/cortex/commits/baidu-storage

Describe alternatives you've considered Sure, I could grep all log lines to complete the same task, but our storage is cheaper than our CPU, and we want to cut our cost further, so I really cannot come up with a better idea.

Additional context Here is a demo picture. https://i.loli.net/2019/11/19/OuiBplm4Wk35CLQ.jpg

slim-bean commented 4 years ago

Hey @sequix, thanks for the interesting idea! Initially I am hesitant to consider adding such functionality to Loki, as it goes against the core principle of keeping a small index, which helps reduce cost and complexity. That said, I don't want to totally dismiss the idea, as many people have higher-cardinality data like order_id, client_ip, request_id, etc. which they would like to use to quickly query their logs.

I'm not exactly sure yet whether we want to cover this use case, or how it would work and what it would look like. I'm afraid it wouldn't be super easy: we would have to fit it into the current schema used to query chunks (which I don't think has a mechanism for limiting a query to specific chunk IDs); we would have to handle the growing size/cost of this new index and decide what retention limits would look like; and we would have to decide what the query language would look like (I might suggest another type of bracketing — we use {} for labels, so maybe we should use [] for tags?).

Mostly we need to think long and hard about adding features/complexity such as this to Loki which is probably the biggest concern as we really want to keep the project focused on what it does best and figure out what features we should add.

sandstrom commented 4 years ago

@sequix Great idea! It's awesome that we're all thinking about this problem and various ways to tackle it. I think Loki would benefit hugely from some mechanism for high-cardinality data (however, I also understand the creators' concerns about the possible downsides, though I think they are manageable).

If you're interested, there has been some previous discussion in this thread: https://github.com/grafana/loki/issues/91

But it can still make sense to keep spikes, or discussions around specific ideas, in separate issues like this one.

cyriltovena commented 4 years ago

I think our answer here is brute force. We plan to bring a query frontend into Loki that will speed up those queries; with that frontend, I can currently regex-search 7 days of data in 2s in our dev environment.

sandstrom commented 4 years ago

@cyriltovena Sounds awesome! 😄

I think there are many ways to tackle this, and brute force may very well be a good one! With some sugar (a dedicated symbol for "non-label tags" plus query-language support) it could go a long way!

It's great that you're thinking about this use case! I would really like to see Loki gain even more usage/success!

Out of curiosity, how much data (mb or number of records/lines) are you searching through in your example? (7 days can be different things depending on volume)

cyriltovena commented 4 years ago

I'm looking at improving the language for sure! (e.g. "give me logs where the latency is higher than 250ms")

Querying 7 days for a full cluster sending 450k logs an hour takes around 30s right now, but it requires a lot of queriers.

I'm planning to add more info about how much data Loki processed.

sandstrom commented 4 years ago

Sounds promising! 😄

We're doing ~10-30k logs/hour and each line (JSON data) is 10-60 kB, so somewhere around 200 MB/hour. It all goes into an Elasticsearch cluster with Kibana for querying. We have faceted-search support for a bunch of high-cardinality labels, such as IP address, request ID and a few others.

We often do searches going 30 days back, sometimes 90 (but we store data for 12 months).

We would love to switch over to Loki, because Elasticsearch is somewhat of a burden to operate.

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had any activity in the past 30 days. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

slim-bean commented 4 years ago

More and more we get requests for high-cardinality lookups, with IP address or order ID as examples. I would like to keep this issue open so as not to rule out adding another index for high-cardinality labels. I'm not sure what this would look like, or whether it makes sense for Loki, but the discussion is still open.

sandstrom commented 4 years ago

@slim-bean Happy to hear! 🎉

I understand that high-cardinality labels don't make much sense for a time-series database like Prometheus, and since that is what Loki was born from, I understand why the initial Loki design assumed low-cardinality labels.

But a lot of logging use cases need high-cardinality labels, so if there is a way for you to support them, that would make Loki useful to many more developers & systems. So I'm keeping my fingers crossed that you'll come up with something!

sandstrom commented 1 year ago

@slim-bean Just wanted to check in on this issue. Do you have any plans around this?

It's a bit of a pain-point with our current deployment, and some colleagues are considering a switch to another tool to get around it.

But Loki is great for many things, just not this needle-in-a-haystack kind of problem.

Maybe there is some middle ground, where we could elect to store a few needles in a separate index, for fast retrieval of those chunks. Perhaps a bloom filter? (False positives would only result in fetching a few extra chunks.)

More specifically, we have HTTP request IDs that are unique, and it's mostly them and trace IDs that we'd want fast retrieval for.
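The bloom-filter middle ground suggested above can be sketched in a few lines (a toy illustration, not Loki's implementation; the class and sizes are made up). The key property is the one noted in the comment: membership tests can return false positives (fetch a few extra chunks) but never false negatives (never miss a chunk that contains the needle):

```python
# Toy Bloom filter: one filter per chunk, recording which request IDs
# the chunk contains. At query time, chunks whose filter says "no"
# can be skipped without being fetched.
import hashlib


class BloomFilter:
    def __init__(self, size_bits: int = 1024, num_hashes: int = 3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bit array packed into one int

    def _positions(self, item: str):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item: str) -> bool:
        # False positives possible; false negatives impossible.
        return all(self.bits & (1 << pos) for pos in self._positions(item))


chunk_filter = BloomFilter()
chunk_filter.add("6e9bfcf8-d3f1-4f93-bfcf-67e0089524c7")

# Every added needle is always found again:
assert chunk_filter.might_contain("6e9bfcf8-d3f1-4f93-bfcf-67e0089524c7")
```

The filter is tiny compared to the chunk it summarizes, which is why this shape of index stays cheap even for unique-per-line values.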

hamishforbes commented 1 year ago

More specifically, we have HTTP request IDs that are unique, and it's mostly them and trace IDs that we'd want fast retrieval for.

I have a very similar use case that "brute-force" is not quite doing the job for. Our log volume is fairly high, about 4m access logs per day on average.

Trying to query a unique ID across larger time windows is proving a challenge. Worse, we are moving from an Elasticsearch solution where this kind of query is very fast: we can query for an ID across the entire log volume in seconds.

Not being able to do this in Loki is causing a bit of friction and adoption problems with developers.

As I understand it, Tempo uses bloom filters to solve a similar problem (being able to query for unique trace IDs). Could this functionality be brought into Loki in some fashion too?

Maybe the solution is simply to implement tracing and use Tempo, though...

marcusteixeira commented 9 months ago

News coming in release 3.0 with structured metadata.

Check this:

chaudum commented 6 months ago

[!NOTE] Bloom filters are an experimental feature and are subject to breaking changes.

@sequix In addition to structured metadata (which isn't really an index, but rather additional data attached to the log line, and which is the underlying engine for our OTel support), experimental query acceleration with bloom filters has been released with Loki 3.0, which is built for solving the needle-in-the-haystack search (UUID search) you described. It is more generic than an inverted index on a specific "field", though.

If you have questions regarding bloom filters, I suggest that you read this doc first.
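For illustration, a needle-in-the-haystack lookup against structured metadata uses an ordinary LogQL label filter; a sketch, reusing the stream selector and request_id value from the example at the top of this thread (whether bloom acceleration kicks in depends on the deployment's configuration):

```logql
{logger="apiserver"} | request_id="3c5fb740-3103-4a51-b368-0d890ac70d93"
```

Unlike a label in the stream selector, the structured-metadata value does not contribute to the streamID, so its cardinality does not multiply the number of streams.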