Closed: candlerb closed this issue 4 months ago.
You need to set `loki.instance` to match the name of your Prometheus job, as that's what we get through the dropdown at the top of the dashboard.
But if you do that, you lose the visibility of which host each log message came from. If your Prometheus job for scraping metrics is called "incus", then you'll have to set `loki.instance="incus"` on all nodes, and then all logs will just say `instance="incus"`. That isn't very useful.
Whereas for metrics, `instance="XXX"` tells you which host was scraped, and therefore which host the container is running on. This is true even for clusters, according to the documentation:

> In a cluster environment, Incus returns only the values for instances running on the server that is being accessed. Therefore, you must scrape each cluster member separately.
It would be strange if metrics had `instance="nuc1"`, `instance="nuc2"`, `instance="nuc3"` but logs all had `instance="incus"`.
You don't; the `location` field still tells you where things are.
I get `location="none"` in my Loki logs (you can see it in the examples above), and I don't see a way to override it. Unless you mean I should use `loki.labels` to set it?
I also get `instance="none"` when `loki.instance` is unset (reported separately at #762). The documentation does say it should default to the hostname, which implies it should work like a Prometheus instance label.
Yeah, those two fields seem wrong in the non-clustered case: both should default to your hostname, and `instance` should be overridable through `loki.instance`.
From your suggested changes above, I'll be taking the `project=~"|$project"` part, as that makes sense and works well.
The `$name` part doesn't work, because when you select All it passes a regexp of all individual instances through the filter rather than passing an empty string. So in my case it means passing several thousand instance names through, which seriously impacts Loki, and it also means that any log or lifecycle event which doesn't have a `name` field set won't work.
Also, `name` in a Loki entry doesn't necessarily mean an instance name; if the event is network or storage related, it may refer to a network or a storage pool. So I think we need to stay away from filtering based on instances for now.
> The `$name` part doesn't work because when you select All it will pass a regexp of all individual instances through the filter rather than passing an empty string. So in my case it means passing several thousand instance names through, which seriously impacts Loki, and it also means that any log or lifecycle event which doesn't have a `name` field set won't work.
Oops, I had hoped it would do something sensible like an empty string or `".*"`. However, apparently there is a "custom all value", which I found via here and here.
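For reference, a minimal sketch of how that "Custom all value" setting appears in a dashboard's JSON model (the variable name `name` and query type here are assumptions, not the dashboard's actual definition):

```json
{
  "templating": {
    "list": [
      {
        "name": "name",
        "type": "query",
        "includeAll": true,
        "allValue": ".*"
      }
    ]
  }
}
```

With `allValue` set to `".*"`, selecting All passes a single wildcard regexp to the query instead of an alternation of every individual instance name.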
> Also, `name` in a Loki entry doesn't necessarily mean instance name; if the event is network or storage related, it may refer to a network or storage pool. So I think we need to stay away from filtering based on instances for now.
OK, fair enough, although if "All" did match `.*` then it would be OK.
Required information
Issue description
This is an issue with the Grafana dashboard https://grafana.com/grafana/dashboards/19727-incus/. Metrics display is working fine, but the Loki panels at the bottom are empty. This is because the LogQL queries are wrong.
They have `{app="incus",type="lifecycle",instance="$job"}` for the first panel, and `{app="incus",type="logging",instance="$job"}` for the second. However, the "instance" label in logs doesn't contain the job name or the container name. Also, the `$job` variable in Grafana is the Prometheus scrape job name, and has nothing to do with Loki logs. Here are some example logs:
Notice how some logs have the name of the container as a label (`name="netbox4"`), but some other logs relating to this container don't. They may have it buried in the logfmt data though, e.g. `context-instance="nfsen"` or `source="/1.0/instances/nfsen"`. If you filter logs by container, you still want to see those logs. I propose that, at simplest, the queries need to change to:
The vertical bar inside the regexp is because the "name" and "project" labels may be missing (even for logs specific to one container), so we must allow through lines where this label is missing.
However, that will also show logs for other containers, when those logs have no name or project label. A bit of additional filtering can ensure that the container name appears somewhere in the log line:
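A sketch of such additional filtering, using a LogQL line filter (`|=`) so the container name must also appear somewhere in the log line; the label matchers are assumed from the discussion above:

```logql
{app="incus", type="lifecycle", name=~"|$name", project=~"|$project"} |= "$name"
```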
This now works as expected:
However, if one container name is a prefix of another container name, or two projects have containers with the same name, it may show some logs for another container. A more sophisticated filter is possible:
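One way to express such a filter in LogQL is to parse the logfmt payload and accept a line when either the `name` label or the parsed `context_instance` field matches exactly. This is a sketch under the assumption that those two fields cover all relevant log lines:

```logql
{app="incus", type=~"lifecycle|logging"}
  | logfmt
  | name="$name" or context_instance="$name"
```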
This assumes that every log relating to container X either has the label `name="X"` or the log message contains `context_instance="X"`. (Note that hyphens in logfmt attributes are converted to underscores, so that they become valid LogQL label names.) This is true for the logs shown above. In fact, in these examples the lifecycle logs all have `name="X",project="Y"` and the event logs have `context_instance="X",context_project="Y"`, so the queries can simplify to matching those labels directly.

I've tried this and it works for me. However, I'm not sure whether those conditions hold in general for all possible logs from Incus. It could be argued that it hard-codes too much information about the log attributes.
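For concreteness, the simplified per-panel queries described above might be written roughly as follows (a sketch; the exact queries were not preserved in this copy):

```logql
{app="incus", type="lifecycle", name="$name", project="$project"}

{app="incus", type="logging"}
  | logfmt
  | context_instance="$name" and context_project="$project"
```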
Steps to reproduce
```
incus config set loki.api.url=http://loki.example.net:3100
```
Cross-reference
The issue appears to be inherited from the LXD dashboard, and was raised there previously: https://github.com/canonical/lxd/issues/13165