When should someone use user.name/user.id?

lmolkova commented 2 weeks ago

Context

We're considering adding user.id/user.name under the db namespace and are not sure which one (neither/both?) to use. We'd like to stay consistent with the root user namespace to leverage the embedding mechanism (once it's implemented).

When connecting to a database/messaging system/cloud service/etc., it's common to talk about an identity rather than a user:

E.g. Azure has several kinds of identities (human, and multiple kinds of machine identities). AWS also has multiple: users, groups, roles.

Problem

The concept of user in the semconv seems to map to the human identity only. It does not seem to apply to a generic 'identity' used to access a resource.

Even within a 'human identity', it's not clear how to use it:

is name unique within the system?
if so, why is there an id? How are they different?
the login terminology does not seem to apply in many cases (authentication could be a broader term). I believe it's related to https://github.com/open-telemetry/semantic-conventions/pull/1146

So, I think we need to decide and document:

what does the user namespace describe? OS user? Human identity? Any identity?
what's the appropriate namespace based on this definition?
what are the right attributes to describe this thing?

lmolkova commented 2 weeks ago

cc @open-telemetry/semconv-security-approvers

lmolkova commented 1 week ago

Based on the SemConv SIG discussion 6/24

user.name is a login/human-readable name, e.g. root or a user login; user.id is something else, e.g. an identifier the system uses internally.
the user namespace is used for the end user (e.g. a website user) or the OS user.
it does not seem to be able to describe a generic identity. Also, there are multiple identities that act at the same time (the end user, the OS user the service runs as, client identities used to access resources).
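
A minimal sketch of the user.name/user.id distinction summarized above, assuming a POSIX-style OS account where the login name maps to user.name and the numeric uid stands in for the internal identifier. The helper `os_user_attributes` and the sample values are hypothetical; only the attribute keys come from the discussion.

```python
def os_user_attributes(login: str, uid: int) -> dict[str, str]:
    """Build user.* attributes for an OS account (illustrative mapping only)."""
    return {
        "user.name": login,   # login / human-readable name, e.g. "root"
        "user.id": str(uid),  # assumption: the numeric uid is the internal identifier
    }


root_attrs = os_user_attributes("root", 0)
```

Here `root_attrs` carries both keys side by side, which is the case where the two attributes are clearly different things. For an end user of a website, the same two keys would need a different source for each value.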

Action items:

@open-telemetry/semconv-security-approvers will discuss

mjwolf commented 1 week ago

I have some thoughts on what a user should be in semantic-conventions, and how it relates to identities.

Use Cases to Handle

A "user" (or multiple types of user resources) should handle these use cases.

Represent existing well-known usages of 'user' attributes.
- many existing concepts/objects have "user" attributes; OTel should be able to generate telemetry with these attributes.
Have a trackable scope/chain for each representation of "user".
- For example, the human user “John Smith” logs in to machine “laptop123” with user account “jsmith”, connects to a kube pod “podabc” with user “root” inside the container, and runs a command. The log event for this should have the three different users and the context that each user exists within. There shouldn’t be conflicts that prevent writing all the info into the OTel event.
- The Entities WG will work on a way to define relationships between entities. This could be used, if the users are entities.
Accurately handle an IAM system's "user". See the section below.
Handle OS user accounts.
- OS user accounts have a different data model, scope, and functionality than IAM systems, so it might be best to handle them separately from IAM user.
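
To make the "trackable chain" use case concrete, here is a rough sketch of how the three users from the John Smith example could be written into one event without key conflicts. Everything here is an assumption for illustration: the `ScopedUser` type, the scope labels, and the `user_chain.<i>.` key prefix are made up and are not part of any existing convention.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScopedUser:
    name: str                  # user name as known inside its scope
    scope_kind: str            # hypothetical label: "person", "host", "k8s.pod"
    scope_name: Optional[str]  # where the user exists; None if unscoped


def chain_attributes(chain: list[ScopedUser]) -> dict[str, str]:
    """Flatten an ordered chain of users into non-conflicting attribute keys.

    The "user_chain.<i>." prefix is a made-up convention for this sketch.
    """
    attrs: dict[str, str] = {}
    for i, user in enumerate(chain):
        prefix = f"user_chain.{i}"
        attrs[f"{prefix}.user.name"] = user.name
        attrs[f"{prefix}.scope.kind"] = user.scope_kind
        if user.scope_name is not None:
            attrs[f"{prefix}.scope.name"] = user.scope_name
    return attrs


chain = [
    ScopedUser("John Smith", "person", None),
    ScopedUser("jsmith", "host", "laptop123"),
    ScopedUser("root", "k8s.pod", "podabc"),
]
event_attrs = chain_attributes(chain)
```

An index-based prefix keeps the order of the chain, but it is only one option. If users become entities, relationships between entities could carry the same information.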

What is an IAM user?

Within IAM systems, a "user" is a type of identity. There are other types of identities such as Role, service account, or user group.

IAM users or managed user accounts are objects within the IAM system. Federated users are users that have existing, external user identities that are connected to the IAM system.

There are some differences in how users are implemented in different IAM systems. In AWS IAM, there is not a traditional "machine user". Instead, roles are typically attached to machine resources. It is possible to create a user that will be used by a workload. In GCP, there are service accounts, which act as machine users.

There are also Customer Identity and Access Management (CIAM) systems, such as Amazon Cognito. I'm not sure if there's any difference with CIAM that would impact telemetry.

A user is not a role, or the credentials that identify the user, and care should be taken to not confuse them.

Two concepts of 'user'

A 'user' can be considered two different concepts within OTel: user objects, and user attributes on non-user objects.

User Objects

Within IAM providers, "user" is a type of identity.

User objects would probably be best implemented as OTel entities rather than resources since they usually have mutable attributes.

Attributes on non-user objects

There are many existing concepts that are not themselves users, but that have user attributes. Some examples are git commits and OS files/processes. Git commits have a user name, email, and signing key. POSIX files have a user ID and name. A JWT is a token that can carry information on a user, but it's not a user itself.

With embedded attributes, it should be possible to add attributes from the user attribute registry into resources/entities that have the existing concepts of user attributes.

Right now the attribute registry descriptions can be rigid, and might not fit the existing usage in different concepts. For example, git user.name is the full name, while Linux file user.name is the short account name. So that's something that needs to be considered/handled.
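
A small sketch of the mismatch described above, assuming embedding works by prefixing user.* keys with a parent namespace. The `embed_user` helper and the parent namespaces `git.commit.author` and `file.owner` are hypothetical, since the embedding mechanism is not implemented yet, and the sample values are invented.

```python
def embed_user(parent: str, user_attrs: dict[str, str]) -> dict[str, str]:
    """Prefix user.* keys with a parent namespace (hypothetical embedding rule)."""
    return {f"{parent}.{key}": value for key, value in user_attrs.items()}


# Git: user.name holds the author's full name.
commit_attrs = embed_user("git.commit.author", {"user.name": "John Smith"})

# Linux file: user.name holds the short account name, user.id the numeric uid.
file_attrs = embed_user("file.owner", {"user.name": "jsmith", "user.id": "1000"})
```

Both results end in `user.name`, yet one value is a full name and the other is a short account name, so a single registry description would have to cover both meanings.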

Questions

Generic or IAM-specific resources?

The different cloud IAM providers have different implementations but generally follow the same high-level concepts. Should users/identity resources be generic, with a known mapping for each IAM provider, or separate resources for each IAM provider, and non-IAM users?

E.g., is there a single "human user" type, or are there separate "AWS IAM user", "GCP managed user account", "systemd service user", etc. types?

Users in the different IAM systems have different attributes, so I think they might need to be separate resources.

What attributes to add to the user registry?

Should we have all attributes for user, or only a select "most-important" set? Active Directory has about 60 attributes for human users, and since there are many other identity providers, the set could get very large if we try to have all attributes for all IAM providers. It might be better to include the IAM ID/reference, so other attributes can be retrieved from it directly, as well as a smaller set of the most-used or "interesting" attributes.
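
The "ID plus a small set" option could look roughly like the sketch below. It assumes an Active Directory style record and an explicit mapping from provider fields to user.* keys. The `select_user_attributes` helper, the mapping, and all sample values are illustrative assumptions, not a proposal for the registry.

```python
def select_user_attributes(
    record: dict[str, str], id_key: str, mapping: dict[str, str]
) -> dict[str, str]:
    """Keep the provider's ID plus a small, explicitly mapped set of attributes."""
    attrs = {"user.id": record[id_key]}  # reference for looking up the rest later
    for source_key, attr_key in mapping.items():
        if source_key in record:
            attrs[attr_key] = record[source_key]
    return attrs


# Made-up directory record; a real one would have many more fields.
directory_record = {
    "objectGUID": "00000000-0000-0000-0000-000000000001",
    "sAMAccountName": "jsmith",
    "mail": "jsmith@example.com",
    "department": "Engineering",
    "telephoneNumber": "555-0100",
}

selected = select_user_attributes(
    directory_record,
    id_key="objectGUID",
    mapping={"sAMAccountName": "user.name", "mail": "user.email"},
)
```

Fields outside the mapping, such as department, are dropped from telemetry and would be fetched from the provider by ID when needed.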

What are OS user accounts?

OS user accounts often represent human users, but they can also be services or machine accounts. Should user accounts be classified as human or machine users, or does it matter at all?

OS user accounts and IAM system users might be different enough that it wouldn't make sense to try to unify them. Maybe OS user and IAM user should have different concepts in OTel.

Should "Identity" be worked on first?

In IAM systems, "user" is just one part of the data model. It might be better to design the larger "identity" data model, rather than do user first, and try to work identity around it later.
