Open itsmylife opened 2 weeks ago
@catherineymgui Should we make the logs view fill all the space underneath the selected metric view?
@zhehao-grafana Could you please help us with the instrumentation? What metrics/events do we need to track?
The four of us need to sync on what the initial flow should look like, since I don't see a consensus reached in the design doc. I want to make sure we are aligned before adding additional things like event tracking.
When there are multiple recording rules with the same name but different labels, the user has to pick the distinguishing label to see the logs related to that recording rule. I know this sounds confusing, so please check the following recording rule data.
{
  "rules": [
    {
      "name": "loki_tenant:query_count:lookback_period",
      "query": "sum by (cluster,namespace,org_id)(count_over_time({container=\"query-frontend\", namespace=~\"loki.*\"} |= \"caller=metrics.go\" |= \"start_delta=\" | logfmt | start_delta<=3h0m0s[2m]))",
      "labels": {
        "period": "less than 3h"
      },
      "health": "ok",
      "type": "recording",
      "lastEvaluation": "2024-10-30T16:47:48.398968109Z",
      "evaluationTime": 2.846429586
    },
    {
      "name": "loki_tenant:query_count:lookback_period",
      "query": "sum by (cluster,namespace,org_id)(count_over_time({container=\"query-frontend\", namespace=~\"loki.*\"} |= \"caller=metrics.go\" |= \"start_delta=\" | logfmt | ( start_delta>3h0m0s , start_delta<=12h0m0s )[2m]))",
      "labels": {
        "period": "between 3h and 12h"
      },
      "health": "ok",
      "type": "recording",
      "lastEvaluation": "2024-10-30T16:47:51.245410108Z",
      "evaluationTime": 2.218408009
    },
    {
      "name": "loki_tenant:query_count:lookback_period",
      "query": "sum by (cluster,namespace,org_id)(count_over_time({container=\"query-frontend\", namespace=~\"loki.*\"} |= \"caller=metrics.go\" |= \"start_delta=\" | logfmt | ( start_delta>12h0m0s , start_delta<=24h0m0s )[2m]))",
      "labels": {
        "period": "between 12h and 1d"
      },
      "health": "ok",
      "type": "recording",
      "lastEvaluation": "2024-10-30T16:47:53.463833499Z",
      "evaluationTime": 2.130151779
    }
  ]
}
In the JSON above, the same rule name (loki_tenant:query_count:lookback_period) appears multiple times with different queries and labels. We will show the logs but display a warning in such cases to guide the user to apply a period filter (the period filter is specific to this recording rule; other rules might have other labels), so that we can show more specific logs.
But for the sake of PoC, we will only show the logs of the first matching recording rule. The logic I explained above will be implemented later on.
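To make the "first matching rule plus warning" idea concrete, here is a minimal sketch. It is not the code from the PR: the `RecordingRule` and `RuleSelection` types and the `selectRule` function are hypothetical names, and the rule shape only mirrors the fields in the data above. It picks the first rule with a matching name and reports which label keys differ between same-named rules, which is what the warning could be based on.

```typescript
// Hypothetical shape, mirroring the fields in the rule data above.
interface RecordingRule {
  name: string;
  query: string;
  labels?: Record<string, string>;
}

interface RuleSelection {
  // First rule whose name matches the metric, or undefined if none does.
  rule: RecordingRule | undefined;
  // Label keys with more than one distinct value across same-named rules.
  distinctiveLabels: string[];
}

function selectRule(rules: RecordingRule[], metricName: string): RuleSelection {
  const matches = rules.filter((r) => r.name === metricName);

  // Collect the distinct values seen for every label key.
  const seen: Record<string, string[]> = {};
  for (const r of matches) {
    const labels = r.labels ?? {};
    for (const key of Object.keys(labels)) {
      const values = seen[key] ?? (seen[key] = []);
      if (values.indexOf(labels[key]) === -1) {
        values.push(labels[key]);
      }
    }
  }

  const distinctiveLabels = Object.keys(seen)
    .filter((key) => seen[key].length > 1)
    .sort();

  return { rule: matches[0], distinctiveLabels };
}
```

For the three rules above, this would select the "less than 3h" rule and report period as the label the user could filter on. A non-empty `distinctiveLabels` is the signal to show the warning.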
cc: @zhehao-grafana @graph-andrew
But for the sake of PoC, we will only show the logs of the first matching recording rule.
What if we show all relevant logs of all the rules created? Are there any obvious limitations regarding this approach?
We use the underlying query that the Loki recording rule has to fetch the logs. To show all logs, we would need to send multiple queries in one request. I think Loki can handle that, but that needs to be checked. For the first phase, we think showing logs for the first matching rule is good enough. What do you think?
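As a rough illustration of what "using the underlying query" and "multiple queries in one request" could look like, here is a sketch under stated assumptions. `extractLogQuery`, `buildTargets`, and the `LogQueryTarget` shape are hypothetical and are not taken from the PR. The extraction is a heuristic rather than a LogQL parser: it returns the text between the first `{` and the first `[` (the range selector) outside quoted strings, so it would not handle every rule, for example one that uses `unwrap`.

```typescript
// Heuristic, not a LogQL parser: returns the text from the first "{" up to
// the first "[" (the range selector), ignoring characters inside quotes.
function extractLogQuery(metricQuery: string): string | undefined {
  let quote = ""; // currently open quote character, "" when outside a string
  let start = -1; // index of the stream selector's opening brace
  for (let i = 0; i < metricQuery.length; i++) {
    const ch = metricQuery.charAt(i);
    if (quote !== "") {
      if (ch === "\\" && quote === '"') {
        i++; // skip the escaped character inside a double-quoted string
      } else if (ch === quote) {
        quote = "";
      }
      continue;
    }
    if (ch === '"' || ch === "`") {
      quote = ch;
    } else if (ch === "{" && start === -1) {
      start = i;
    } else if (ch === "[" && start !== -1) {
      return metricQuery.slice(start, i).trim();
    }
  }
  return undefined;
}

// Hypothetical target shape: one entry per rule, sent together in one request.
interface LogQueryTarget {
  refId: string;
  expr: string;
}

function buildTargets(metricQueries: string[]): LogQueryTarget[] {
  const targets: LogQueryTarget[] = [];
  metricQueries.forEach((q) => {
    const expr = extractLogQuery(q);
    if (expr !== undefined) {
      targets.push({ refId: "logs-" + targets.length, expr });
    }
  });
  return targets;
}
```

With the first-matching-rule approach, `buildTargets` would receive a single query. Passing all same-named rules' queries is the multi-query variant that still needs to be checked against Loki.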
But for the sake of PoC, we will only show the logs of the first matching recording rule.
What if we show all relevant logs of all the rules created? Are there any obvious limitations regarding this approach?
Another consideration is that we currently limit log queries to the first 100 lines, so even if we combine the queries, only the first 100 log lines will be shown (and the first query alone could easily produce more than 100). We can increase this value, but 100 seemed like a reasonable place to start, as per @svennergr's suggestion here.
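If we do end up combining queries, one possible way to keep the first query from using up the whole limit is to give each query its own share of it. This is only a sketch of that option, not something we have agreed on; `splitLimit` is a hypothetical helper.

```typescript
// Splits a total line limit as evenly as possible across queryCount queries.
// Any remainder goes to the earliest queries, one extra line each.
function splitLimit(totalLimit: number, queryCount: number): number[] {
  if (queryCount <= 0) {
    return [];
  }
  const base = Math.floor(totalLimit / queryCount);
  const remainder = totalLimit % queryCount;
  const limits: number[] = [];
  for (let i = 0; i < queryCount; i++) {
    limits.push(base + (i < remainder ? 1 : 0));
  }
  return limits;
}

// For the three rules above and the current limit of 100:
// splitLimit(100, 3) returns [34, 33, 33]
```

The trade-off is that each rule's logs are truncated earlier, so this only helps if seeing some lines from every rule matters more than seeing more lines from one.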
We use the underlying query that the Loki recording rule has to fetch the logs.
Let's do this then, but we can consider giving users more information about which rule we end up selecting, such as certain filter details on the Loki side.
We should be able to see/reach relevant logs from the metrics app. Design Doc: https://docs.google.com/document/d/1vFqk-Cs_zw5vR-TkuhLI85Fa0709YV07tc3hzIGZtlA/edit
Phase 1: PoC to show logs only for Loki recording rules https://github.com/grafana/grafana/pull/94656
Tasks to ship PoC:
('metric_action_view_changed': 'related-logs', 'related_logs_action_clicked': 'open_explore_logs')
A rule might be defined under two different rule groups. Find a way to handle this situation. We'll handle this in a better way in Phase 2.