Process semantic conventions are potentially unclear/inconsistent

joaopgrassi commented 1 year ago

The PR "Add container semantic conventions resource" (#39) introduced the attributes container.command, container.command_line and container.command_args.

These attributes in the PR had a "note" saying

If using embedded credentials or sensitive data, it is recommended to remove them to prevent potential leakage

This fact triggered some discussions in the PR about the requirement level of such fields. It was pointed out that the process semconv define the same attributes, which are conditionally required, so the PR was adapted to follow it.

Upon looking at both things, I think we may have a few problems in the process semconv.

I believe the concerns of such attributes having sensitive data is valid but in the process semconv file, nothing of the sort is mentioned
The process semconv define those attributes are "conditionally required" but the "condition" is very vague and confusing - I marked them with A and B to be able to reference here

A) Additional attribute requirements: At least one of the following sets of attributes is required:

    process.executable.name
    process.executable.path
    process.command
    process.command_line
    process.command_args

B) Between `process.command_args` and `process.command_line`, usually `process.command_args` 
should be preferred. On Windows and other systems where the native
format of process commands is a single string, `process.command_line` can additionally (or instead) be used.

A): It lists all attributes and say at least one of them is required. Not sure if it's just me but I don't think this "condition" is good enough. What instrumentations should record? The first one it has? Is the list there in order of "priority" or "importance"

B:) It then gives actual directions, but only for process.command_args and process.command_line. The rest is still left without any guidance. Should they be added? Should they not?

In one of the discussions, it was brought up that they should probably be Opt-in (due to the potential sensitive data in such attributes), and I think I agree with this.

It would solve both the "sensitive information" issue (as users need to opt-in into them and they should know when it's safe or not) AND solves also the confusing "conditionally required" guidelines we have today. Basically all being opt-in allows the USER to pick which one they think it's more important/relevant for them.

CC @jsuereth @jamesmoessis @lmolkova @marcsanmi

Oberon00 commented 1 year ago

A): It lists all attributes and say at least one of them is required. Not sure if it's just me but I don't think this "condition" is good enough.

I think it is good enough and exactly as intended. You overlooked some attributes: There are some it does not list (e.g. process.pid). The intention here is to ensure we have some data about the executable, at least the base name of which will be in any of the listed attributes (not accounting for exotic cases where you override argv[0])

Oberon00 commented 1 year ago

The rest is still left without any guidance. Should they be added? Should they not?

This guidance would be a nice to have IMHO. I don't think the current conventions are inconsistent.

joaopgrassi commented 1 year ago

The intention here is to ensure we have some data about the executable, at least the base name of which will be in any of the listed attributes

Yes, that is clear, but shouldn't we have some sort of deterministic guidance on which one to report? I feel the way it is today, each instrumentation can report a different attribute and it's not going to be great. Like the guidance I called "B" is an example of something predictable.

joaopgrassi commented 1 year ago

I don't think the current conventions are inconsistent.

Sure, this one is not maybe just lacking a bit more guidance. But if the PR I linked is merged, we will have two command, command_args (Ok under different namespaces but still..) attributes that model pretty much the same thing but will be reported/recorded in different ways, with different guidelines. Maybe these should even be split out and shared like the HTTP ones?

Oberon00 commented 1 year ago

We are lacking the prefix concept from ECS, unfortunately, which could be helpful here.

open-telemetry / semantic-conventions

Process semantic conventions are potentially unclear/inconsistent #129