open-telemetry / semantic-conventions

Defines standards for generating consistent, accessible telemetry across a variety of domains
Apache License 2.0

When should someone use user.name/user.id? #1172

Open lmolkova opened 2 weeks ago

lmolkova commented 2 weeks ago

Context

Related to https://github.com/open-telemetry/semantic-conventions/issues/1142

We're considering adding user.id/user.name under the db namespace and are not sure which one (neither/both?) to use. We'd like to stay consistent with the root user namespace so we can leverage the embedding mechanism (once it's implemented).

When connecting to a database, messaging system, cloud service, etc., it's common to talk about the identity rather than the user:

E.g. Azure has several kinds of identities (human, and multiple kinds of machine identities). AWS also has multiple kinds: users, groups, and roles.

Problem

The concept of user in semconv seems to map to the human identity only. It does not seem to apply to a generic 'identity' used to access a resource.

Even within a 'human identity', it's not clear how to use it:

So, I think we need to decide and document:

lmolkova commented 2 weeks ago

cc @open-telemetry/semconv-security-approvers

lmolkova commented 1 week ago

Based on the SemConv SIG discussion 6/24

Action items:

mjwolf commented 1 week ago

I have some thoughts on what a user should be in semantic-conventions, and how it relates to identities.

Use Cases to Handle

A "user" (or multiple types of user resources) should handle these use cases.

What is an IAM user?

Within IAM systems, a "user" is a type of identity. There are other types of identities such as Role, service account, or user group.

IAM users or managed user accounts are objects within the IAM system. Federated users are users that have existing, external user identities that are connected to the IAM system.

There are some differences in how users are implemented in different IAM systems. In AWS IAM, there is no traditional "machine user". Instead, roles are typically attached to machine resources, although it is possible to create a user that will be used by a workload. In GCP, there are service accounts, which act as machine users.

There are also Customer Identity and Access Management (CIAM) systems, such as Amazon Cognito. I'm not sure if there's any difference with CIAM that would impact telemetry.

A user is not a role, nor the credentials that identify the user, and care should be taken not to confuse them.

Two concepts of 'user'

A 'user' can be considered as two different concepts within OTel: user objects, and user attributes on non-user objects.

User Objects

Within IAM providers, "user" is a type of identity.

User objects would probably be best implemented as OTel entities rather than resources since they usually have mutable attributes.

Attributes on non-user objects

There are many existing concepts that are not themselves users, but that have user attributes. Some examples are git commits and OS files/processes. Git commits have a user name, email, and signing key. POSIX files have a user ID and name. A JWT is a token that can carry information about a user, but it's not a user itself.

With embedded attributes, it should be possible to add attributes from the user attribute registry into resources/entities that have the existing concepts of user attributes.

Right now the attribute registry descriptions can be rigid, and might not fit the existing usage in different concepts. For example, git user.name is the full name, while Linux file user.name is the short account name. So that's something that needs to be considered/handled.
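To make the mismatch concrete, here is a minimal Python sketch of what embedding could look like. It is not an existing OTel API: the `embed` helper and the `vcs.commit` / `file.owner` prefixes are hypothetical names chosen for illustration, and the sample values are made up. Only the `user.*` keys follow the registry discussed above.

```python
def embed(prefix: str, user_attrs: dict[str, str]) -> dict[str, str]:
    """Prefix registry-style user attributes so they can sit on a non-user object.

    Hypothetical helper: the real embedding mechanism is not implemented yet.
    """
    return {f"{prefix}.{key}": value for key, value in user_attrs.items()}


# Git commit: user.name carries the author's full name.
git_commit = embed(
    "vcs.commit",  # hypothetical prefix
    {"user.name": "Ada Lovelace", "user.email": "ada@example.com"},
)

# POSIX file: user.name carries the short account name, user.id the numeric UID.
posix_file = embed(
    "file.owner",  # hypothetical prefix
    {"user.name": "alovelace", "user.id": "1001"},
)

print(git_commit["vcs.commit.user.name"])
print(posix_file["file.owner.user.name"])
```

Both objects end up with a key ending in `user.name`, but the values follow different conventions, so a single rigid registry description cannot be accurate for both.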

Questions

Generic or IAM-specific resources?

The different cloud IAM providers have different implementations but generally follow the same high-level concepts. Should user/identity resources be generic, with a known mapping for each IAM provider, or should there be separate resources for each IAM provider and for non-IAM users?

E.g. is there a single "human user" type, or are there separate "AWS IAM user", "GCP managed user account", "systemd service user", etc. types?

Users in the different IAM systems have different attributes, so I think they might need to be separate resources.
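As a rough illustration of the "generic, with a known mapping" option, the following Python sketch maps provider-specific identity types onto a small generic set. Everything here is an assumption for discussion: the `IdentityKind` enum, the provider and type strings, and the mapping choices themselves are hypothetical and not part of any semantic convention.

```python
from enum import Enum
from typing import Optional


class IdentityKind(Enum):
    """Hypothetical generic identity kinds, taken from the types named above."""
    USER = "user"
    SERVICE_ACCOUNT = "service_account"
    ROLE = "role"
    GROUP = "group"


# Hypothetical mapping from (provider, provider-native type) to a generic kind.
PROVIDER_MAPPING = {
    ("aws", "iam_user"): IdentityKind.USER,
    ("aws", "iam_role"): IdentityKind.ROLE,
    ("gcp", "managed_user_account"): IdentityKind.USER,
    ("gcp", "service_account"): IdentityKind.SERVICE_ACCOUNT,
    ("systemd", "service_user"): IdentityKind.SERVICE_ACCOUNT,
}


def to_generic(provider: str, native_type: str) -> Optional[IdentityKind]:
    """Return the generic kind, or None when no mapping is known."""
    return PROVIDER_MAPPING.get((provider, native_type))


print(to_generic("gcp", "service_account"))
print(to_generic("azure", "managed_identity"))  # not mapped in this sketch
```

The unmapped case is the interesting one: a generic model needs a documented mapping for every provider, while provider-specific resources avoid the mapping at the cost of a larger set of types.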

What attributes to add to the user registry?

Should we have all attributes for user, or only a select "most-important" set? Active Directory has about 60 attributes for human users, and since there are many other identity providers, the set could get very large if we try to have all attributes for all IAM providers. It might be better to include the IAM ID/reference, so other attributes can be retrieved from it directly, as well as a smaller set of the most-used or "interesting" attributes.
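A minimal sketch of the "reference plus a small set" approach is shown below. The choice of which directory attributes to keep, and which registry-style key each one maps to, is purely an assumption for illustration; the sample record is made up.

```python
# Hypothetical selection: directory attribute name -> registry-style key.
# The first entry acts as the stable reference back into the directory.
SELECTED = {
    "objectGUID": "user.id",
    "sAMAccountName": "user.name",
    "displayName": "user.full_name",
}


def to_telemetry_attributes(directory_record: dict[str, str]) -> dict[str, str]:
    """Keep only the selected attributes and rename them; drop everything else."""
    return {
        otel_key: directory_record[source_key]
        for source_key, otel_key in SELECTED.items()
        if source_key in directory_record
    }


# Made-up directory record with more attributes than we want in telemetry.
record = {
    "objectGUID": "0f8fad5b-d9cb-469f-a165-70867728950e",
    "sAMAccountName": "alovelace",
    "displayName": "Ada Lovelace",
    "department": "Engineering",
    "telephoneNumber": "555-0100",
}

print(to_telemetry_attributes(record))
```

Anything not in the selected set, such as `department` here, stays out of telemetry and would be looked up from the identity provider using the recorded ID.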

What are OS user accounts?

OS user accounts often represent human users, but they can also be services or machine accounts. Should user accounts be classified as human or machine users, or does it matter at all?

OS user accounts and IAM system users might be different enough that it wouldn't make sense to try to unify them. Maybe OS user and IAM user should have different concepts in OTel.
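To show why the human/machine classification is hard for OS accounts, here is a small Python sketch of one common heuristic. It assumes the convention used by many Linux distributions, where regular accounts start at UID 1000 and lower UIDs are reserved for system accounts. The function name, the labels, and the threshold default are assumptions, and the threshold is configurable per system.

```python
def classify_os_account(uid: int, first_regular_uid: int = 1000) -> str:
    """Guess whether a UID belongs to a system account or a regular account.

    Heuristic only: a "regular" account is not necessarily a human user.
    """
    if uid < 0:
        raise ValueError("uid must be non-negative")
    return "system" if uid < first_regular_uid else "regular"


print(classify_os_account(0))      # root
print(classify_os_account(1001))   # typical interactive account
print(classify_os_account(65534))  # commonly "nobody", yet above the threshold
```

The last call returns "regular" even though UID 65534 is commonly the unprivileged `nobody` account, which suggests that any human/machine distinction for OS accounts would need more than the UID.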

Should "Identity" be worked on first?

In IAM systems, "user" is just one part of the data model. It might be better to design the larger "identity" data model, rather than do user first, and try to work identity around it later.