open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

[pkg/ottl] Support for extracting OS attributes from UserAgent #35458

Open rogercoll opened 1 month ago

rogercoll commented 1 month ago

Component(s)

pkg/ottl

Is your feature request related to a problem? Please describe.

UserAgent semantic convention attributes can be extracted using the OTTL UserAgent function: https://github.com/pchila/opentelemetry-collector-contrib/tree/7da12e47eb9cf719aa593f9935bce9ba72844703/pkg/ottl/ottlfuncs#useragent (implemented in https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/34172)

The currently extracted attributes are user_agent.name, user_agent.version and user_agent.original, but more information can be extracted from the user_agent.original string, such as OS-related information.

Semantic conventions proposal: https://github.com/open-telemetry/semantic-conventions/issues/1433

Current Elastic ECS user_agent OS attributes: https://www.elastic.co/guide/en/ecs/current/ecs-user_agent.html#_field_reuse_30

Describe the solution you'd like

Extract additional fields from the user_agent:

Describe alternatives you've considered

No response

Additional context

This functionality would be very helpful for logs/metrics analytics. For example, an Nginx Ingress Controller log record contains the user agent, so this function could be configured in the collector to extract the OS information from all Nginx logs. Dashboards and alerts can then be built on this information: Which OS has the most errors? Which are the most common OS versions? etc.

github-actions[bot] commented 1 month ago

Pinging code owners:

ioandr commented 1 month ago

Hi @rogercoll, I took a quick look at this.

It looks like the UA parser provides a function to parse OS info from a user agent string.

In https://github.com/ua-parser/uap-core/blob/master/tests/test_os.yaml I found various user agent strings; however, all expected test results consist of family, major, minor, patch and patch_minor, not type, name, version, build_id and description. So there is no 1:1 mapping between the two.

As a first iteration we could map:

Maybe we could also set user_agent.os.type by performing a lookup based on Os.family. e.g., Android -> Linux, WatchOS -> iOS, etc.
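A minimal sketch of such a lookup, for illustration only. The table holds just the two example pairs mentioned above; the names `osTypeByFamily` and `osTypeForFamily` are hypothetical, and the case-insensitive matching is an assumption rather than anything the UA parser guarantees.

```go
package main

import "strings"

// osTypeByFamily is a hypothetical, deliberately incomplete lookup table.
// It contains only the two example pairs from this thread; a real table
// would need a trustworthy, exhaustive source.
var osTypeByFamily = map[string]string{
	"android": "Linux",
	"watchos": "iOS",
}

// osTypeForFamily returns the OS type for a parsed OS family.
// The second return value is false when the family is unknown, so the
// caller can skip setting user_agent.os.type instead of guessing.
func osTypeForFamily(family string) (string, bool) {
	// Assumption: match case-insensitively and ignore surrounding spaces.
	key := strings.ToLower(strings.TrimSpace(family))
	osType, ok := osTypeByFamily[key]
	return osType, ok
}
```

Returning a boolean alongside the value keeps unknown families from being mapped to a misleading default.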

What about the rest of the fields you proposed though?

rogercoll commented 1 month ago

@ioandr Thanks for looking into this. Based on your research, there are three attributes which we cannot map 1:1 with the UA package parser function. I would propose the following:

That said, I would not make the above a blocker: if extracting them is not clear or feasible, I would start with the 1:1 mapping with the UA package.

ioandr commented 1 month ago

Thanks for the follow-up @rogercoll, I will take a stab at this and open a PR shortly.

ioandr commented 1 month ago

Hi @rogercoll I opened a PR that adds name and version as discussed above. I also updated existing test cases as needed.
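A rough sketch of what a name/version mapping along these lines could look like. This is not the code from the PR: the `osInfo` struct is a hypothetical stand-in for the parser's OS result (family, major, minor, patch, patch_minor), and joining the version parts with dots, including patch_minor, is an assumption made for illustration.

```go
package main

import "strings"

// osInfo is a hypothetical stand-in for the OS result of a user agent
// parser, mirroring the fields seen in the uap-core test data.
type osInfo struct {
	Family, Major, Minor, Patch, PatchMinor string
}

// osVersion joins the version parts with dots, stopping at the first
// empty part so that "4", "0", "4", "" becomes "4.0.4".
func osVersion(o osInfo) string {
	parts := make([]string, 0, 4)
	for _, p := range []string{o.Major, o.Minor, o.Patch, o.PatchMinor} {
		if p == "" {
			break
		}
		parts = append(parts, p)
	}
	return strings.Join(parts, ".")
}

// osAttributes builds the OS attributes, omitting any that are empty.
// Assumption: family maps to user_agent.os.name and the joined version
// parts map to user_agent.os.version.
func osAttributes(o osInfo) map[string]string {
	attrs := map[string]string{}
	if o.Family != "" {
		attrs["user_agent.os.name"] = o.Family
	}
	if v := osVersion(o); v != "" {
		attrs["user_agent.os.version"] = v
	}
	return attrs
}
```

Omitting empty attributes, rather than emitting empty strings, keeps records clean when the parser cannot identify a version.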

For the time being I didn't add the extra fields for the reasons below:

  1. type: I couldn't find an exhaustive, trustworthy mapping to go from OS family to OS type. Let's tackle this in the next iteration.
  2. build_id: After searching on the internet, mapping this to patch_minor does not look accurate. Build IDs are mostly used for Windows (e.g., 22621) and macOS (e.g., 20B29).
  3. description: it seems that the UA parser does not provide a function to return the "original OS string". This probably requires some regex matching which might be tricky to get right for all user agent strings

Other than these, please let me know if I need to update any OpenTelemetry Collector documentation. I couldn't find any relevant places other than the Semantic Conventions documentation:

https://opentelemetry.io/docs/specs/semconv/attributes-registry/os/