Open braydonk opened 10 months ago
I think each attribute in the current process metrics schema is exhaustive:
direction
attributes have their respective in
and out
equivalentsprocess.paging.faults.type
, is there such thing other than a major
or minor
page fault?process.context_switches.type
, voluntary
and involuntary
on their own are grammatically exhaustive, and I can't imagine any other value for this.process.cpu.state
is the only one where it could be up in the air for some super niche edge case; I can't think of any CPU state other than the existing system
, user
, and wait
personally but would love to learn if there is somethingprocess.threads
-> process.thread.count
This was discussed in https://github.com/open-telemetry/semantic-conventions/pull/330/files#r1330094680
While I think it makes sense to merge with ECS where possible, I'm not sold on the rename to process.threads.count
for 2 reasons:
process.threads
namespace in the linked ECS docs all seem to give information about a single process thread "resource" so to speak. This metric is about the thread count of a process "resource", so I fear this metric being in a process.thread
namespace is confusing.count
iff they are a monotonic counter?Truthfully I'm not caught up with the plans for where we're merging with ECS and how we reconcile the differences, so I'd be happy to hear from the ECS stakeholders about how we should handle this.
process.context_switches
-> process.context_switch.count
This is related to #260 guidance; this is a monotonic counter, so process.context_switch.count
seems like a logical progression of this metric name to match with that.
process.network.io.direction
, transmit
-> send
This is mostly my own idea, I'm not married to it. However, I think send
is a better attribute value than transmit
for two reasons:
receive
process.cpu.utilization
, do we have a better way to indicate that this metric is intended to be computed from another?This metric is computed from process.cpu.time
and (insert name of CPU count metric here). Is there some better way to indicate within the semconv generation that this metric is indeed intended to be computed from other known values? At the moment it's only outlined in the description, and not from any actual semantic convention terminology.
Every metric in the process.threads namespace in the linked ECS docs all seem to give information about a single process thread "resource" so to speak. This metric is about the thread count of a process "resource", so I fear this metric being in a process.thread namespace is confusing.
I see the point @braydonk ! I agree that process.threads.count
(violates the namespace pluralization rule) or process.threads
is the accurate one in that case.
@AlexanderWert @trisch-me feel free to add your input here.
I also do agree that existing ECS schema for process.thread
is about thread itself and process.threads
is about process itself having multiple threads and therefore they are having different semantic meaning. process.thread
already exist in ECS and therefore we should be careful when introducing new naming conflicting with it.
Truthfully I'm not caught up with the plans for where we're merging with ECS and how we reconcile the differences, so I'd be happy to hear from the ECS stakeholders about how we should handle this.
We are currently actively adding new fields from ECS into Otel. Please also check this PR about consideration when adding new fields which might exist already in ECS
Discussed on January 18th System Semantic Conventions WG meeting, @braydonk to update after discussion in #330
@braydonk are we good to proceed on this issue or it's still blocked?
Yeah this should be good to proceed. Here's the current state of things:
process.threads
-> process.thread.count
Yes
It is an updowncounter
so this new name follows the naming rule, done in #330.
process.context_switches
-> process.context_switch.count
No
It is a counter
so it remains as is to follow the naming rule.
process.cpu.utilization
, do we have a better way to indicate that this metric is intended to be computed from another?
No
As long as I'm not mistaken, there isn't a tooling solution for this and the best way seems to be that this is called out in the description of the metric. Currently it sort of is: https://github.com/open-telemetry/semantic-conventions/blob/c316b1fb8747b8384b738806cb5068a9bf839f8c/model/metrics/process-metrics.yaml#L19-L21
Should process.cpu.state
be exhaustive?
No
The metric is being merged into a single cpu.state
metric, where the list of possible values will be wider than the 3 expected to be used in the process metrics. See #1026.
Should the other attributes in this comment be made exhaustive?
I'm tempted to say no since these metrics have now been moved to the registry and it makes less sense to be prescriptive even though these metrics will likely only have these values when used in process metrics.
network.io.direction
value transmit
-> send
I think I still agree with my reasoning in the comment above, but I'm starting to lean towards not rocking the boat and leaving that where it is. I think I'm splitting hairs too much with it and am fine to leave it as is.
Pursuant to #330 which should be merged soon, I am opening this issue to revisit the process metrics schema while we are making breaking changes anyway.
As of writing, the only changes to the process schema will be to namespace every attribute. There are a couple of changes that should be discussed among the Hostmetrics WG to improve the schema.
Following are the current points up for discussion:
process.threads
->process.thread.count
process.context_switches
->process.context_switch.count
process.network.io.direction
,transmit
->send
process.cpu.utilization
, do we have a better way to indicate that this metric is intended to be computed from another?Please comment if you have any additional points to discuss. I will comment my own opinion on each below so I don't bias the root issue post with my own opinions. If any of these need a larger discussion I will spin them into child issues.