Open lmolkova opened 2 weeks ago
cc @open-telemetry/semconv-security-approvers
Based on the SemConv SIG discussion 6/24
`user.name` is a login/human-readable name, e.g. `root` or a user login; `user.id` is something else, e.g. an identifier the system uses internally.
Action items:
I have some thoughts on what a user should be in semantic-conventions, and how it relates to identities.
A "user" (or multiple types of user resources) should handle these use cases.
“John Smith” logs in to machine “laptop123” with user account “jsmith”, connects to a kube pod “podabc” with user “root” inside the container, and runs a command. The log event for this should have the three different users and the context that each user exists within. There shouldn’t be conflicts that prevent writing all the info into the OTel event.
Within IAM systems, a "user" is a type of identity. There are other types of identities, such as role, service account, or user group.
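For the “John Smith” use case above, one way to avoid key conflicts is to give each user its own indexed slot alongside the context it lives in. This is only a rough sketch: the keys (`context`, `context.name`, `user.full_name`, the `users.<index>` prefix) are hypothetical placeholders, not registered semantic-convention attributes.

```python
from typing import Dict, List

# Each entry pairs a user with the context it exists in.
# All keys here are illustrative placeholders, not registered attributes.
USER_CHAIN: List[Dict[str, str]] = [
    {"context": "person", "user.full_name": "John Smith"},
    {"context": "host", "context.name": "laptop123", "user.name": "jsmith"},
    {"context": "k8s.pod", "context.name": "podabc", "user.name": "root"},
]


def flatten_users(chain: List[Dict[str, str]]) -> Dict[str, str]:
    """Flatten the chain into unique keys so no user overwrites another."""
    attributes: Dict[str, str] = {}
    for index, user in enumerate(chain):
        for key, value in user.items():
            attributes[f"users.{index}.{key}"] = value
    return attributes


event_attributes = flatten_users(USER_CHAIN)
```

With this shape, `jsmith` and `root` both keep a `user.name` without colliding, because each sits under its own index.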
IAM users or managed user accounts are objects within the IAM system. Federated users are users that have existing, external user identities that are connected to the IAM system.
There are some differences in how users are implemented in different IAM systems. In AWS IAM, there is not a traditional "machine user". Instead, roles are typically attached to machine resources. It is possible to create a user that will be used by a workload. In GCP, there are service accounts, which act as machine users.
There are also Customer Identity and Access Management (CIAM) systems, such as Amazon Cognito. I'm not sure if there's any difference with CIAM that would impact telemetry.
A user is not a role, nor the credentials that identify the user, and care should be taken not to confuse them.
A 'user' can be considered two different concepts within OTel: user objects, and user attributes on non-user objects.
Within IAM providers, "user" is a type of identity.
User objects would probably be best implemented as OTel entities rather than resources since they usually have mutable attributes.
There are many existing concepts that are not themselves users, but that have user attributes. Some examples are git commits and OS files/processes. Git commits have a user name, email, and signing key. POSIX files have a user ID and name. A JWT is a token that can carry information on a user, but it's not a user itself.
With embedded attributes, it should be possible to add attributes from the user attribute registry into resources/entities that have the existing concepts of user attributes.
Right now the attribute registry descriptions can be rigid, and might not fit the existing usage in different concepts. For example, git `user.name` is the full name, while Linux file `user.name` is the short account name. So that's something that needs to be considered/handled.
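To make the mismatch concrete, here is a rough sketch of what embedding could look like. The embedding mechanism is not implemented yet, so the `embed_user` helper and the `vcs.commit` and `file` parent namespaces are assumptions for illustration only.

```python
from typing import Dict


def embed_user(parent_namespace: str, user_attributes: Dict[str, str]) -> Dict[str, str]:
    """Prefix user.* attributes with the namespace of the concept carrying them."""
    return {f"{parent_namespace}.{key}": value for key, value in user_attributes.items()}


# Hypothetical parent namespaces; the values mirror the git vs. Linux file example.
git_commit = embed_user(
    "vcs.commit",
    {"user.name": "John Smith", "user.email": "jsmith@example.com"},
)
posix_file = embed_user("file", {"user.name": "jsmith", "user.id": "1000"})
```

Both results end in `user.name`, yet one holds a full name and the other a short account name, which is exactly the case a single rigid registry description would not cover.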
The different cloud IAM providers have different implementations but generally follow the same high-level concepts. Should users/identity resources be generic, with a known mapping for each IAM provider, or separate resources for each IAM provider, and non-IAM users?
For example, is there a single "human user" type, or separate "AWS IAM user", "GCP managed user account", "systemd service user", etc. types?
Users in the different IAM systems have different attributes, so I think they might need to be separate resources.
Should we have all attributes for user, or only a select "most-important" set? Active Directory has about 60 attributes for human users, and since there are many other identity providers, the set could get very large if we try to have all attributes for all IAM providers. It might be better to include the IAM ID/reference, so other attributes can be retrieved from it directly, as well as a smaller set of the most-used or "interesting" attributes.
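As a sketch of the "ID plus a small set" option, the snippet below keeps the provider's identifier and an allow-list of attributes and drops everything else. The record fields and the `summarize_user` helper are hypothetical; a real provider's schema will differ.

```python
from typing import Dict, Iterable


def summarize_user(record: Dict[str, str], id_key: str, keep: Iterable[str]) -> Dict[str, str]:
    """Keep the provider's ID plus a small allow-list of attributes."""
    summary = {"user.id": record[id_key]}
    for key in keep:
        if key in record:
            summary[f"user.{key}"] = record[key]
    return summary


# Hypothetical directory record; a real provider may expose dozens of fields.
directory_record = {
    "id": "u-12345",
    "name": "jsmith",
    "email": "jsmith@example.com",
    "department": "Engineering",
    "office": "Building 4",
}

summary = summarize_user(directory_record, id_key="id", keep=["name", "email"])
```

Anything not on the allow-list could still be looked up later from the IAM system using the retained ID.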
OS user accounts often represent human users, but they can also be services or machine accounts. Should user accounts be classified as human or machine users, or does it matter at all?
OS user accounts and IAM system users might be different enough that it wouldn't make sense to try to unify them. Maybe OS user and IAM user should have different concepts in OTel.
In IAM systems, "user" is just one part of the data model. It might be better to design the larger "identity" data model, rather than do user first, and try to work identity around it later.
Context
Related to https://github.com/open-telemetry/semantic-conventions/issues/1142
We're considering adding `user.id`/`user.name` under the `db` namespace and are not sure which one (neither/both?) to use. We'd like to stay consistent with the root `user` namespace to leverage the embedding mechanism (once it's implemented).
When connecting to a database/messaging system/cloud service/etc., it's common to talk about the identity rather than the user:
E.g. Azure has several kinds of identities (human, and multiple kinds of machine identities). AWS also has multiple: users, groups, roles.
Problem
The concept of `user` in the semconv seems to map to the human identity only. It does not seem to apply to a generic 'identity' used to access a resource.
Even within a 'human identity', it's not clear how to use it:
So, I think we need to decide and document:
What does the `user` namespace describe? OS user? Human identity? Any identity?