Open lmolkova opened 5 months ago
cc @open-telemetry/semconv-system-approvers @open-telemetry/semconv-container-approvers
I can try to address some of the problems.
> It's not clear if we'd expect to have OS-specific metrics in each of the namespaces
I don't think we've written the expectation down anywhere. There was a discussion about this in a PR from almost a year ago that was adding a platform-specific process metric, where it seemed like process.windows was the direction to take for that one metric: https://github.com/open-telemetry/semantic-conventions/pull/142#discussion_r1245574693
I'd be surprised to see more platform-specific metrics in the process namespace, but I still think the name for the metric should be process.windows.handles; there is simply no elegant way to name and describe the metric in a cross-platform way that isn't needlessly confusing; handles are a completely Windows-specific concept.
> We end up defining similar attributes in each namespace
That's true, but in the case of cpu.state the biggest challenge was that the attributes defined in each namespace were similar but had different expected values. When used on process.cpu.time, cpu.state only has 3 expected values, but it has additional, different values when used in other contexts. So the various *.cpu.state attributes were similar, but they benefit from having their minor contextual differences separated into differently named attributes, rather than trying to force it all into one shared cpu.state attribute. (I've been out for a while, but this was where the discussion was left last time I was part of it. It's possible we've worked past this already, so apologies if that's the case.)
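To make the cpu.state point concrete, here's a hedged sketch. The value sets below are illustrative approximations of the pre-merge semconv definitions, not the authoritative lists, and the helper function is invented for illustration:

```python
# Sketch: the same-named cpu.state attribute carried different expected
# values depending on the metric it was used on. The sets below are
# assumptions approximating older semconv definitions, for illustration.
CPU_STATE_VALUES = {
    "process.cpu.time": {"system", "user", "wait"},  # only 3 expected values
    "system.cpu.time": {"user", "system", "nice", "idle",
                        "iowait", "interrupt", "steal"},  # more, different ones
}

def validate_cpu_state(metric_name: str, state: str) -> bool:
    """Return True if `state` is an expected cpu.state value for the metric."""
    return state in CPU_STATE_VALUES.get(metric_name, set())
```

A single shared cpu.state attribute would have to carry the union of these sets and document per-context restrictions, which is the complexity the separate attributes avoided.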
> Do we need separate process and system metrics?
Right now there are three process metrics that don't have system equivalents:
- process.context_switches
- process.thread.count
- process.open_file_descriptor.count
Those are metrics that I would probably only expect to see on a process resource.
However, if I'm understanding correctly, the preferred state would be that instead of system and process metric namespaces, these metrics essentially would not be namespaced? So instead of system.cpu.time and process.cpu.time there would be just cpu.time? I could probably see that working, though we'd probably run into the same problem we had with merging the cpu.state attribute: how do we handle detailing the minute differences for when cpu.time is reported for a host resource vs for a process resource? I don't know which way is better; I've personally always been on the side of keeping process and system stuff completely separate and just dealing with the repetition, but I sense that's not a very popular opinion.
> Isn't system.cpu.time a sum of all process.cpu.time on the same machine?
That's true afaik, but I think the reason this separation exists in the first place is that there's the host resource, where you get system-wide metrics reported, and the process resource, where you get metrics for each individual process. The usage of the two is different, and the ability to report per-process metrics is still a very common and important use case.
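The relationship being discussed can be sketched as a quick back-of-the-envelope check. All pids and sample values below are made up; in practice the two metrics are measured independently and won't match exactly (kernel threads, sampling skew, and short-lived processes all interfere):

```python
# Minimal sketch: summing hypothetical per-process CPU time samples
# approximates the system-wide total on the same machine.
process_cpu_time = {  # made-up samples in seconds, keyed by pid
    101: 12.5,
    202: 3.0,
    303: 0.7,
}

# the "system.cpu.time is a sum of all process.cpu.time" reading
system_cpu_time = sum(process_cpu_time.values())
```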
Thank you for the context!
First, I'm not trying to push into any specific direction. I'd be happy with any outcome that would minimize confusion and duplication.
If we look into process/system metrics:
- some metrics are per-process (they are in the process namespace)
- arguably, they should not come from within the process - we have runtime-specific metrics for it (jvm.*, go, etc.)

Would we be able to unify them?
- system.cpu.time would be across all processes on the machine and would have process.pid as an attribute. It should be fine to start using resource attributes as attributes on metrics - today we just imply them, but still, without a pid attribute (or service.instance.id), process metrics are not useful.
- Would we still need to provide a cheaper machine-wide CPU time metric as a convenience, in case someone doesn't want per-process metrics? Maybe. We can do it by making process.pid disable-able and reusing the same system.cpu.time metric.
- There would be metrics that won't have pid as an attribute, e.g. the number of active|started|stopped processes - they'd happily stay under the system namespace without pid.
- Some metrics could have a required process.pid attribute if they don't make sense machine-wide.
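A rough sketch of the proposed unification, assuming a hypothetical data model where each point carries an attribute map. Attribute names follow the proposal above; the drop_pid helper and all values are invented for illustration:

```python
from collections import defaultdict

# Hypothetical data points for a single unified system.cpu.time metric:
# (attributes, value in seconds). process.pid is an optional attribute.
points = [
    ({"cpu.mode": "user", "process.pid": 101}, 10.0),
    ({"cpu.mode": "user", "process.pid": 202}, 4.0),
    ({"cpu.mode": "system", "process.pid": 101}, 2.0),
]

def drop_pid(points):
    """Re-aggregate points as if process.pid had been disabled."""
    merged = defaultdict(float)
    for attrs, value in points:
        key = tuple(sorted((k, v) for k, v in attrs.items()
                           if k != "process.pid"))
        merged[key] += value
    return dict(merged)

# With pid disabled, per-process series collapse into machine-wide ones.
collapsed = drop_pid(points)
```

The appeal of the proposal is visible here: the cheaper machine-wide metric is just the same metric with one attribute disabled, rather than a second metric name.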
What problems would it solve:
- no duplication between system.linux.* and process.linux.* - we can just use linux and windows for OS-specific metrics

I'm sure there could be some problems with this approach too. Happy to hear any feedback.
> I don't know which way is better; I've personally always been on the side of keeping process and system stuff completely separate and just dealing with the repetition, but I sense that's not a very popular opinion.
> That's true afaik, but I think the reason this separation exists in the first place is that there's the host resource, where you get system-wide metrics reported, and the process resource, where you get metrics for each individual process. The usage of the two is different, and the ability to report per-process metrics is still a very common and important use case.
If I remember correctly, that's the main reasoning the System Metrics WG has arrived at so far.
> system.cpu.time would be across all processes on the machine and would have process.pid as an attribute.
An equivalent Node vs Pod example would imply reporting something like k8s.cpu.time with a k8s.pod.uid attribute 🤔? That said, I believe that having system and process namespaces is based on the fact that they are different entities, and users are just fine with that.
> Maybe. We can do it by making process.pid disable-able and reusing the same system.cpu.time metric.
What would happen if users decide to switch from one option to the other? It's still not clear to me how the options would look, but I guess that could end up being more complicated for users compared to the current distinction?
Also, what would be the cardinality impact and query-load impact of this?
> Both are reported from inside the system and are based on OS measurements. This suggests that the component that records them should probably be the same.
I disagree on this point. They are both reported from inside the system, but some are about the entire system itself and some are about each individual process. They are describing distinct entities.
> some metrics are per-process (they are in the process namespace); arguably, they should not come from within the process - we have runtime-specific metrics for it (jvm.*, go, etc.)
I might be misunderstanding this one, but there are a few process-specific metrics that do not apply to runtimes. I also think it's untenable to create semantic conventions for every possible runtime; there should be a generic system-level option. There's lots of precedent for monitoring processes directly from the system, as it can be a good fallback.
> system.cpu.time would be across all processes on the machine and would have process.pid as an attribute.
Is this to say that these metrics would all be reported under the host resource, each with a process.pid attribute to separate the time series? Unfortunately, I don't think this would turn out well. There are quite a few resource attributes for a process; having to spread those across every single per-process metric would be extremely inefficient compared to having one process resource and recording all the metrics under it.
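The inefficiency concern can be sketched with made-up numbers, assuming OTLP-style encoding where resource attributes are written once per resource while data point attributes repeat on every point. All counts below are illustrative assumptions, not measurements:

```python
# Back-of-the-envelope sketch of encoding cost for one export batch.
def attr_encodings(n_processes, n_metrics, resource_attrs, point_attrs):
    """Count attribute key/value pairs encoded in one batch."""
    # one resource per process: its attributes are written once
    resource_cost = n_processes * resource_attrs
    # data point attributes repeat on every metric point of every process
    point_cost = n_processes * n_metrics * point_attrs
    return resource_cost + point_cost

# process-as-resource: ~8 resource attrs once, 1 point attr per metric
as_resource = attr_encodings(100, 10, resource_attrs=8, point_attrs=1)
# flattening all process attrs (plus pid) onto every data point instead
flattened = attr_encodings(100, 10, resource_attrs=0, point_attrs=9)
```

Under these assumed numbers the flattened layout encodes several times more attribute pairs, which is the "extremely inefficient" point above.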
> as a user I don't need to wonder which metrics I should collect: process, system, both?
I would say this is actually a very important decision that we should expect users to make. system metrics are generally 1 set of metrics for 1 resource (the host), which means they have no growing cardinality, whereas process metrics are 1 set of metrics for N resources, where N is the number of processes on the system. That cardinality is very large and unpredictable. It might be confusing for users; these metrics map very directly to the actual information coming from the system, and that information is on its own hard to understand. But given the cardinality implications, it's important that users can easily understand their options. Adding to that, I think the idea of disabling all metrics under a process namespace to not collect per-process metrics is much clearer to a user than disabling a particular set of process attributes that control cardinality.
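The cardinality asymmetry can be sketched with invented round numbers (metric and state counts below are not the actual semconv inventory):

```python
# Rough time-series counts for a namespace: metrics x attribute values
# x number of entities the metrics are reported for.
def series_count(metric_count, states_per_metric, entity_count):
    """Approximate number of active time series."""
    return metric_count * states_per_metric * entity_count

hosts = 1          # system metrics: one entity, fixed cardinality
processes = 400    # process metrics: N entities, unpredictable and growing

system_series = series_count(10, 5, hosts)
process_series = series_count(10, 3, processes)
```

Even with fewer attribute values per metric, the per-process namespace dominates total series count because it scales with N, which is the decision users are really making when they enable it.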
> the boundary between runtime/process/system will be more clear (runtime - inside the process, system - from outside the process)
I think there is a third boundary there. And I think the current semantic conventions map to these boundaries pretty directly.
I think any direction forward should absolutely keep process as a resource in its own right. It makes sense as its own resource, separate from host. However, I could see a path forward where certain metrics that are in both are merged. For example, taking these metrics that are in both namespaces:
- memory.usage
- disk.io
- network.io
- cpu.time
- cpu.utilization
- paging.faults

These could be moved into a shared namespace (or into individual namespaces) and then have different meanings when reported under a host resource vs under a process resource. I'm not sure how easy it would be to use the semantic convention tooling to generate documentation clear enough about what those metrics mean when reported under different resources, but assuming that sort of thing is possible, I could see a future where that works out.
Re: the resource metrics are reported under

It's not documented in the semantic conventions - at least I don't see any mention of it on the process metrics, and it's not clear whether "Resource attributes related to a host, SHOULD be reported under the host.* namespace" applies to system metrics.
We should be able to document the attributes applicable to each metric regardless of the unification. If they are not specified, someone could report a process metric without adding any resource attributes, or while adding some other, non-process ones.
By documenting specific attributes we'd also make the cardinality of the metric clear.
So, if we explicitly document the attributes we expect on these metrics, we could also explain that it does not matter how those attributes are populated (as resource attributes or as regular metric attributes).
With this, the attachment to a resource no longer applies.
E.g. process.cpu.time should have at least three attributes:
- cpu.state|mode
- process.pid
- host.id

system.cpu.time is then the same metric, but without the pid:
- cpu.state|mode
- host.id
Re: boundaries and who measures things

I don't understand the boundary between runtime and process from a semantic convention standpoint.
E.g. if I'm designing .NET runtime metrics, should I report process.cpu.time or dotnet.cpu.time? The answer we have from Java is the latter (since cross-runtime unification is almost impossible). Or maybe both, so that someone could build cross-language dashboards and alerts?
Could/should I report them from the same instrumentation inside the process? Then the resource they are attached to is a random thing users decided to configure, which may or may not include host, process, etc.
If I report process metrics from inside the process, do I report just this process or all processes in the system? What if I use the collector?
User experience

The current path to success seems to look like:

To decide what you need, you have to:

I agree that some of this is inevitable, but as a user I would not like the lack of clarity and the absence of a simple default experience I can start with.
> It's not documented in the semantic conventions - at least I don't see any mention of it on the process metrics, and it's not clear if this applies to system metrics.
It definitely should be. The intention is for all metrics in the process metrics document to be reported under the process resource. I can make that change, assuming there is a way to do that with the semconv tooling.
> I don't understand the boundary between runtime and process from a semantic convention standpoint.
In my eyes they are completely different, but given what we have actually written today, I can see it's not very clear.
The resource attributes and metrics in the process namespace are intended to map directly to the concept of a process in an operating system. These metrics aren't intended to be reported by a process itself. Instrumentation that uses these metrics should realistically be system-level instrumentation that uses the OS's facilities for reading information about all processes on the system. As such, the metrics in the process namespace are designed exclusively to be reported under a process resource, which contains other useful information about that process on the OS.
This much isn't clear from the current docs generated from the semconv yaml; I don't know if it used to be with the handwritten docs. Is there a way to make this clearer using tooling in a way we aren't currently, or should I write something manually somewhere to make it clearer?
User experience

I think with the above clarifications, which are currently missing from the semconv docs, the experience is much more straightforward.
I don't see how container and runtime metrics are intertwined with these decisions; they seem separate. If the user is using particular runtimes or containers, then they should use special instrumentation for those. But the instrumentation for system and process metrics is generally OS-level, like the hostmetricsreceiver in the OTel Collector.
On the semconv yaml definitions and tooling:
You can just list the attributes that should be reported on a metric. There is no way to say that a metric should be reported under a specific resource, and it would not be precise enough anyway.
I.e. if someone specified process.executable.name and process.owner, that would not identify a process uniquely. To build a dashboard we'd need at least process.pid there (plus the executable name and maybe other things). But having all process attributes is not necessary either.
There is no separation between resource attributes vs regular attributes in the semantic conventions. Also, if someone wants to report the metric and add attributes explicitly on each measurement instead of using resources, that would be totally fine.
I think having those specified would be a great improvement.
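A small sketch of the "resource vs regular attributes" equivalence described above, assuming a simplified model where a time series is identified by the union of both attribute sets (the helper name and values are made up):

```python
# Whether attributes arrive as resource attributes or as per-point metric
# attributes, the identity of a series is the merged set of both, so
# documenting the expected attributes works the same way in either case.
def series_identity(resource_attrs, point_attrs):
    """Merge resource and data point attributes into one identifying key."""
    merged = {**resource_attrs, **point_attrs}
    return tuple(sorted(merged.items()))

# pid and host carried on the resource...
via_resource = series_identity(
    {"process.pid": 101, "host.id": "h1"}, {"cpu.mode": "user"})
# ...or added explicitly on each measurement: same series identity.
via_point = series_identity(
    {}, {"process.pid": 101, "host.id": "h1", "cpu.mode": "user"})
```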
> The resource attributes and metrics in the process namespace are intended to map directly to the concept of a process in an operating system. These metrics aren't intended to be reported by a process itself.
I think this should also be mentioned in semconv - that OTel language SIGs are not expected to have process/system instrumentation libraries.
But we have plenty of them already:
As instrumentation libraries they leave it up to the user to configure resources.
Tagging @open-telemetry/javascript-maintainers @open-telemetry/dotnet-contrib-maintainers @open-telemetry/go-maintainers in case they have any thoughts or feedback wrt process vs runtime metrics and the future of process instrumentation.
User experience

What I'm offering seems similar.

By default I'd prefer to have:

If I want more:
- process.* metrics (or enable the pid attribute on all relevant system metrics)
- process.* metrics (or enable the pid attribute on all relevant system metrics). This is a good case to keep process and system metrics separate, because I might not want all processes, and then I can't aggregate. Still, it could be possible to report an "other" pid as a bulk sum of all untracked processes.

So you start from a safe (hopefully documented) default and you add details.
The process vs runtime question still concerns me - we're duplicating cpu/memory metrics by design and forcing users to build different alerts/dashboards for each runtime, whether they care about the differences or not.
I'd prefer the default to be:
Container vs system metrics

They have a certain level of duplication (cpu, memory); the key difference is where you observe these metrics from. As a user I might be able to record both, but effectively I'd need to pick one or the other to build my alerts/dashboards/etc. on.
> But we have plenty of them already
Thanks for this; I was definitely incorrect when I said "These metrics aren't intended to be reported by a process itself". There's clear precedent for it that I wasn't aware of, so my previous statement can be disregarded.
I guess this probably works out most of the time, because the metrics are reported under whatever resource is instrumented, so they are probably typically reported under some manner of application resource that makes it obvious what those metrics are for, even though they aren't under a process resource per se. So I backpedal my previous statement; it makes sense that these metrics could be reported by a process instrumenting itself to read its own stats from the OS.
There is still a difference between these process.* metrics and the associated runtime metrics: where the info is read from. In each of the SDK cases above, the information for the process metrics comes from the OS directly. In the case of jvm metrics, they are read through JMX (at least according to those docs; hopefully that's not off base). So even though there are some duplicate metrics, because the different runtimes usually provide their own ways to get the same information you could get from the OS, they are still distinct from one another due to their source.
So I think they are different, but there probably is still a way for there to just be a memory metric namespace, and the JVM or process instrumentation could use the same memory metric definitions from the shared namespace. I think the challenge is when there are metrics with similar names that mean different things in different contexts. Taking an example like memory.usage:
- reported by OS-level process instrumentation: the memory usage of the process as the OS accounts for it
- reported by a runtime: heap usage stats

In this scenario, the meaning of memory.usage is different depending on the reporting source. I think this is the type of thing that would show up repeatedly if we tried to unify these metrics. If we are okay with finding a nice way to document these based on the reporting source then it could work, but we already have a separation based on reporting source:
- system -> the metric is data about the root system
- process -> the metric is data about the process
- container -> the metric is data about a container
- runtimes -> the metric is data reported by a runtime

Given that these namespaces probably still need to exist due to having certain metrics that won't be shared, it is probably easier in the long run to keep duplicate-named metrics in each namespace, because in some scenarios they mean something quite different depending on the context the particular metric point is reported for.
> There is no way to say that a metric should be reported under a specific resource
That's kind of disappointing, actually. I think I understand why, but it is too bad for the process metrics, which are sort of designed to be reported under a process resource, like how they are currently reported by the collector's hostmetricsreceiver. It's definitely a shortcoming in the current semconv definition of those metrics, though: they don't mention any of those process attributes because they were designed for reporting under a process resource. We (the system semconv group) will have to find some way to add these attributes to the metrics, but I guess make their requirement conditional on the presence of a particular parent resource?
I notice that in the instrumentation examples you provided, they don't add any identifying attribute like process.pid even though the parent resource isn't a process, but they are still effectively identified, provided the manual instrumentation has some resource the user configured themselves. So given that, maybe the attributes are all just added as optional. :thinking:
> The process vs runtime question still concerns me - we're duplicating cpu/memory metrics by design and forcing users to build different alerts/dashboards for each runtime, whether they care about the differences or not.
The example I gave above on the difference between memory usage reported by the Go runtime vs by the OS for the process is one counterexample supporting keeping these things separate. Duplication in names doesn't always imply duplicating the exact same value. Sometimes it does; on Linux, a container runtime reading metrics from cgroup stats usually gets roughly the same numbers as the stats you might get from procfs, for example.
Unfortunately I don't have enough expertise in all the runtimes and their metrics to say whether there are more counterexamples. If this memory-usage counterexample is the only one, or if there are very few, then maybe the unification would be fine and we could deal with the prickly differences one by one.
For what it's worth, we discussed this in the System Semantic Conventions meeting today. We generally agreed that it is still worth keeping the metrics in the system, process, and container namespaces, plus each respective runtime's, due to:
I'd welcome additional feedback from the other @open-telemetry/semconv-system-approvers folks.
In this case, I think the namespace is key to easily identifying similar metrics that have been computed differently because of their source. Even if some signals have the same suffix (e.g. *.cpu.time), they might have different meanings depending on the source. The namespace should identify and explain those differences. For example, a container can be seen as one or multiple processes, but the key difference from "system" processes is the underlying technology that manages them. Containers rely on cgroups, which offer a range of capabilities beyond those provided by traditional kernel process management. For example, a cgroup container's CPU time allotment is determined by dividing the cgroup's cpu_shares by the total number of shares defined on the system. As cpu_shares is specific to cgroups, cgroup capabilities should be taken into account when creating alerts/dashboards for container.cpu.time, unlike process.cpu.time, which is not aware of the same CPU resource-limiting techniques. Also, as cgroups is a newer technology, container.cpu.time could be reported in nanoseconds instead of the current seconds precision (a different metric).
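The cpu_shares arithmetic mentioned above, as a minimal sketch. The share values are made-up examples; 1024 is the conventional cgroups v1 default:

```python
# Under CPU contention, a cgroup's entitled fraction of CPU is its
# cpu_shares divided by the total shares defined on the system.
def cpu_fraction(own_shares, all_shares):
    """Fraction of CPU a cgroup is entitled to when all cgroups are busy."""
    return own_shares / sum(all_shares)

# three cgroups with the default 1024 shares, plus one throttled to 512
fraction = cpu_fraction(512, [1024, 1024, 1024, 512])
```

This relative-entitlement behavior is why container.cpu.time alerts need cgroup awareness that process.cpu.time alerts don't.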
@open-telemetry/semconv-system-approvers is there any conclusion on this that would result in changing the existing model? Otherwise we can close this if there is no majority in favor of these changes.
We have multiple layers of metrics:
- process, which reports OS-level metrics per process as observed by the OS itself
- system metrics that report OS metrics from the OS perspective
- container metrics that are reported by the container runtime about a container

Plus we have attributes in all of these namespaces that have something in common:
Problems:
- While reviewing system.linux.memory.available (https://github.com/open-telemetry/semantic-conventions/pull/1078), it's not clear if we'd expect to have OS-specific metrics in each of the namespaces (container.linux.memory.*, system.linux.*, process.linux.memory.*): https://github.com/open-telemetry/semantic-conventions/pull/1078#discussion_r1638375208
- Isn't system.cpu.time a sum of all process.cpu.time on the same machine?