open-telemetry / semantic-conventions

Defines standards for generating consistent, accessible telemetry across a variety of domains
Apache License 2.0

Add Environment Variables to process registry schema #672

Open mjwolf opened 8 months ago

mjwolf commented 8 months ago

Environment variables are an important part of process data models, and they should be added to the process registry schema.

process.env_vars was previously part of https://github.com/open-telemetry/semantic-conventions/pull/564, and was removed because there are some open discussion items that should be decided before being added to the schema.

Some of these questions to resolve are:

  1. Should env_vars be an object, with environment variable names as free-form leaf nodes, as suggested here?

Although this suggestion could have advantages, I'm not sure if it's possible. According to the Open Group standard, environment variable names shall not contain the character '=' (ref), but have no other exclusions, so extended character sets, symbols, etc. are valid as part of environment variable names. I'm not sure if this works with OTel key names.

One alternative to using freeform keys is to store environment variables as a string array, such as this

  2. Should filtering be required or recommended? (discussion)

Environment variables could contain sensitive information, such as API keys, and this information should be redacted to prevent security problems. It should be decided if filtering is required (using MUST) or recommended (using SHOULD).
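To make the two candidate shapes from question 1 concrete, here is a minimal sketch. The prefix `process.env_vars` is the name mentioned above, but the helper names `as_keyed_attributes` and `as_string_array` are hypothetical and exist only for illustration; neither shape has been agreed on in this thread.

```python
from typing import Dict, List

# Prefix taken from the issue text; the final attribute name is still undecided.
PREFIX = "process.env_vars"


def as_keyed_attributes(env: Dict[str, str]) -> Dict[str, str]:
    """Option A: one attribute per variable, with the variable name in the key."""
    return {f"{PREFIX}.{name}": value for name, value in env.items()}


def as_string_array(env: Dict[str, str]) -> Dict[str, List[str]]:
    """Option B: a single attribute holding 'NAME=value' strings.

    Sorting is an illustrative choice to make the output deterministic.
    """
    return {PREFIX: [f"{name}={value}" for name, value in sorted(env.items())]}


# Sample data, not taken from a real process.
env = {"USER": "ubuntu", "PATH": "/usr/local/bin:/usr/bin"}
keyed = as_keyed_attributes(env)
flat = as_string_array(env)
```

With option A the variable name becomes part of the attribute key, so any key-name restrictions apply to it. With option B the name stays inside the string value, so only the value has to be representable.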

trask commented 8 months ago

  1. Should env_vars be an object, with environment variable names as free-form leaf nodes, as suggested https://github.com/open-telemetry/semantic-conventions/pull/564#discussion_r1459047082?

I believe so. We have a few other examples like this, and semconv tooling already supports <key> in attribute names (see HTTP headers, for example).

  2. Should filtering be required or recommended?

Probably the easiest option is to make these attributes Opt-In (again, see HTTP headers for an example).

svrnm commented 8 months ago

Thanks for bringing this discussion out of the PR. For completeness, here is the part of my comment about using a format similar to the HTTP headers:

| Attribute | Type | Description | Examples |
|---|---|---|---|
| `process.environment_variable.<key>` | string | Process environment variables, `<key>` being the environment variable name, the value being the environment variable value. | `process.environment_variable.PATH="/usr/local/bin;/usr/bin"`; `process.environment_variable.USER="ubuntu"` |

Although this suggestion could have advantages, I'm not sure if it's possible. According to the Open Group standard, environment variable names shall not contain the character '=' (ref), but have no other exclusions, so extended character sets, symbols, etc. are valid as part of environment variable names. I'm not sure if this works with OTel key names.

Looking into the standard (https://pubs.opengroup.org/onlinepubs/000095399/basedefs/xbd_chap08.html) you quoted, it seems the following passage is relevant as well:

These strings have the form name=value; names shall not contain the character '='. For values to be portable across systems conforming to IEEE Std 1003.1-2001, the value shall be composed of characters from the portable character set (except NUL and as indicated below). There is no meaning associated with the order of strings in the environment. If more than one string in a process' environment has the same name, the consequences are undefined.

IEEE 1003.1 is POSIX, and the portable character set is a set of 103 characters; see also this source.

So this shouldn't be a concern. Additionally, the attribute naming specification states that every name MUST be a valid Unicode sequence, which should help even if a system operates outside of the POSIX portable character set.
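As a rough illustration of the checks discussed here, the sketch below builds the 103-character portable character set and tests names and values against the rules quoted above. The function names are hypothetical. Rejecting empty names and treating "valid Unicode sequence" as "encodable as UTF-8" are assumptions made for this example, not statements from either specification.

```python
# The POSIX portable character set: 8 control characters
# (NUL, BEL, BS, TAB, LF, VT, FF, CR) plus space and the 94 printable
# ASCII characters, 103 characters in total.
PORTABLE_CHARSET = frozenset(
    [chr(c) for c in (0x00, 0x07, 0x08, 0x09, 0x0A, 0x0B, 0x0C, 0x0D)]
    + [chr(c) for c in range(0x20, 0x7F)]
)


def is_valid_env_name(name: str) -> bool:
    """Names shall not contain '='. Rejecting empty names is an assumption."""
    return bool(name) and "=" not in name


def is_portable_value(value: str) -> bool:
    """Portable values use the portable character set, except NUL."""
    return all(ch in PORTABLE_CHARSET and ch != "\x00" for ch in value)


def is_valid_unicode(text: str) -> bool:
    """Assumption: a valid Unicode sequence is one that encodes as UTF-8.

    Lone surrogates, which can appear when undecodable bytes are read from
    the environment, fail this check.
    """
    try:
        text.encode("utf-8")
    except UnicodeEncodeError:
        return False
    return True
```

A name such as `PATH` passes all three checks, while `A=B` fails the name check and a string containing a lone surrogate fails the Unicode check.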

  2. Should filtering be required or recommended? Probably the easiest is to make these attributes Opt-In (also can see HTTP headers for example)

+1 for this. With the format process.environment_variable.<key>, you can select the environment variables that you want to have (and leave out those you don't) without needing to sanitize a string holding all of them. So (some) filtering is implicit.
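A minimal sketch of that implicit filtering, assuming an opt-in allowlist of variable names. The helper `select_env_attributes` and the allowlist parameter are hypothetical, and the sample values are made up; this is not an existing OpenTelemetry API.

```python
from typing import Dict, Iterable

# Attribute prefix from the table proposed in this thread.
ATTRIBUTE_PREFIX = "process.environment_variable"


def select_env_attributes(env: Dict[str, str], allowed: Iterable[str]) -> Dict[str, str]:
    """Return one attribute per allowlisted variable; all others are dropped."""
    wanted = set(allowed)
    return {
        f"{ATTRIBUTE_PREFIX}.{name}": value
        for name, value in env.items()
        if name in wanted
    }


# Sample environment; API_KEY stands in for a sensitive variable.
env = {"USER": "ubuntu", "API_KEY": "s3cr3t", "PATH": "/usr/local/bin:/usr/bin"}
attrs = select_env_attributes(env, allowed=["USER", "PATH"])
```

In practice the `env` argument could be `dict(os.environ)`. Because only allowlisted names are ever turned into attributes, a variable such as `API_KEY` is never captured unless someone explicitly opts in to it.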

joaopgrassi commented 7 months ago

FYI @open-telemetry/semconv-system-approvers