open-telemetry / semantic-conventions

Defines standards for generating consistent, accessible telemetry across a variety of domains
Apache License 2.0
271 stars 174 forks source link

Revisit system network metrics attributes #308

Open dmitryax opened 1 year ago

dmitryax commented 1 year ago

Context

This issue is a follow-up to https://github.com/open-telemetry/semantic-conventions/pull/89/files#r1299769832. We need to take the opportunity of the attribute renaming required for https://github.com/open-telemetry/semantic-conventions/issues/51 and choose names that inspire full confidence. The goal is to open a discussion and reach consensus on attribute names that minimize the risk of future changes.

All Network Metrics Attributes (Old Name / Current Name):

We should revisit each of these attributes within the scope of this issue. If you have any ideas or suggestions, please comment. The following are the main concerns we should address:

protocol / network.protocol

We picked network.protocol because it's generic and can be applicable to other metrics. It's listed in https://github.com/open-telemetry/semantic-conventions/blob/main/docs/general/attributes.md#network-attributes. We should rename it to network.protocol.name then.

state / system.network.state

While it currently sounds like a state of the network, in the OTel collector host metrics receiver, it represents connection status (e.g., "CLOSED," "LISTEN," ESTABLISHED," etc.). I think we should rename it to network.connection.status and add it to https://github.com/open-telemetry/semantic-conventions/blob/main/docs/general/attributes.md#network-attributes.

joaopgrassi commented 1 year ago

The other thing that came up was about the system.cpu.logical_number. From @ChrsMark comment on #89:

Don't want to block the merge of this but shouldn't the system.cpu.logical_number be under the host. namespace since it's a resource attribute? So it would be host.cpu.logical_number or host.cpu.id similar to what we have for the host.id?

So if we revisit the attributes, we need to look at this one as well.

dmitryax commented 1 year ago

The other thing that came up was about the system.cpu.logical_number

Thanks. That should go to another cpu related issue

mx-psi commented 1 year ago

I am moving this to blocked since I want to have a clear picture of #394, since I think questions like https://github.com/open-telemetry/semantic-conventions/pull/330#discussion_r1355207538 may arise here

mx-psi commented 9 months ago

Moving back to Todo, we have treated #394 as a MUST, but it is still unclear and would help if it's resolved to make progress here cc @joaopgrassi

ChrsMark commented 1 month ago

Bumping this up. I see that most of the initial concerns should now be covered:

network.direction

It has been renamed to network.io.direction with values transmit/receive.

system.device

Still system.device.

network.protocol

It has been renamed to network.protocol.name.

system.network.state

Still system.network.state

From the above I only see the system.network.state as possibly pending to change to system.network.status as proposed by @dmitryax at https://github.com/open-telemetry/semantic-conventions/issues/308. @open-telemetry/semconv-system-approvers what do you think about this?

Otherwise we can consider it as completed.

ChrsMark commented 1 month ago

One question that pops up is if we should use the system.device attribute or just introduce a new one i.e. network.interface.name. This is also related to https://github.com/open-telemetry/semantic-conventions/pull/1427#discussion_r1781283325 and https://github.com/open-telemetry/semantic-conventions/pull/1408#discussion_r1770627540.

ChrsMark commented 3 weeks ago

We discussed about system.device attribute for network metrics in K8s SemConv SIG meeting, Oct 16th. 2 questions occurred:

1) should we use system.device for network metrics or we should introduce a new one for network interface specifically, i.e. network.interface.name. 2) Should the decided attribute be shared between hostmetrics receiver and kubeletstats receiver? Or for k8s we should have a k8s.* specific one?

/cc @open-telemetry/semconv-system-approvers @open-telemetry/semconv-k8s-approvers

ChrsMark commented 3 weeks ago

We discussed about system.device attribute for network metrics in K8s SemConv SIG meeting, Oct 16th. 2 questions occurred:

  1. should we use system.device for network metrics or we should introduce a new one for network interface specifically, i.e. network.interface.name.
  2. Should the decided attribute be shared between hostmetrics receiver and kubeletstats receiver? Or for k8s we should have a k8s.* specific one?

/cc @open-telemetry/semconv-system-approvers @open-telemetry/semconv-k8s-approvers

Systems WG discussed that in Oct 17th meeting. We agreed that a netowork.interface.* attribute is more appropriate for network metrics. I'm going to file a PR to apply this decision.

ChrsMark commented 2 days ago

Bumping up the remaining item:

From the above I only see the system.network.state as possibly pending to change to system.network.status as proposed by @dmitryax at https://github.com/open-telemetry/semantic-conventions/issues/308. @open-telemetry/semconv-system-approvers what do you think about this?

braydonk commented 2 days ago

I personally don't feel strongly enough to fight much for it, but given the choice would agree that status is a better name than state. Upon further thinking I actually like state better. See my next comment.

trask commented 2 days ago

it looks like we have some of both currently:

status

state

do you think there are specific times when it makes sense to use one vs the other? or could we make a recommendation on generally preferring one of them?

braydonk commented 2 days ago

If I were to bikeshed state vs status (to a probably unreasonable degree) I would probably think:

I don't know how well those semantics would hold up in all scenarios though.

In System Semconv, I am usually biased towards following whatever verbiage is used in common reference to whatever the metric/attribute refers to. So, contrary to my last comment, I may have changed my mind about network.state. When I call nestat -a, each line has this:

Proto Recv-Q Send-Q Local Address           Foreign Address         State      
tcp        0      0 localhost:xyz           0.0.0.0:*               LISTEN 

On Windows netstat says the same thing, and so does SysInternals TCPView.

For a similar reason process.status is probably best to be renamed to process.state, given that /proc/[pid]/status calls it that and the equivalent in Win32 APIs calls it ExecutionState.

I feel like the generic semantics between state and status that I mentioned above stands up pretty well on its own, but it's probably stronger to make sure that the names of metrics/attributes match the contextual verbiage used even when it violates those generic semantics.