open-telemetry / semantic-conventions

Defines standards for generating consistent, accessible telemetry across a variety of domains
Apache License 2.0

Add Support for CloudFoundry #622

Open KarstenSchnitter opened 11 months ago

KarstenSchnitter commented 11 months ago

CloudFoundry Support

CloudFoundry is an open-source application platform. It is used to deploy and run applications, mainly 12-factor apps. CloudFoundry emits observability data through different channels via its Loggregator subsystem. An entry point to metrics and logs can be found in its documentation here: https://docs.cloudfoundry.org/loggregator/data-sources.html. Recently, experimental support for metrics forwarding via the OpenTelemetry Collector has been added. This is similar to the Cloud Foundry Receiver provided by OpenTelemetry Collector Contrib. In both cases, basic container metrics for all applications are emitted. Currently, there is no standard for describing CloudFoundry resources in OpenTelemetry. At a minimum, a resource attribute convention should be added to the semantic conventions.

CloudFoundry Applications

CloudFoundry provides a hierarchical model for application separation: The largest entities are organizations (orgs), which can contain multiple spaces to which the applications (apps) are deployed. Apps are containers that can run in multiple instances. Each application container instance can contain multiple processes, jobs and sidecars. The full collection necessary to identify a particular application component therefore consists of:

There might be more potential attributes, but these 9 should be sufficient.

CloudFoundry System Components

Besides the application data, CloudFoundry can also emit observability signals from its system components. Those are organised by the CloudFoundry Bosh project. A CloudFoundry installation is divided into different deployments that comprise virtual machine (VM) instance groups to which jobs are deployed. Each job executes a specific process supported by lifecycle actions, e.g. post-stop, pre-start, post-start. This is a very VM-centric organization, to which the existing resource attributes can be applied. Still, a Bosh convention for deployments and jobs can be useful. This is not part of this proposal.

CloudFoundry Loggregator

CloudFoundry uses an internal protobuf-based schema for telemetry signals, documented as loggregator-api. This documentation introduces two generic fields to be used for applications and system components:

Note that the linked documentation does not contain all tags that can be found on CloudFoundry installations. In addition to the 9 application-centric attributes, these 2 attributes should be included in a CloudFoundry convention.

The CloudFoundry system components also emit observability data on behalf of a deployed application. Examples are:

The entire list of these log types can be found in the Streaming Application Logs documentation. Here, the challenge is to correctly describe the combination of app and system component.

OpenTelemetry Use-Cases in CloudFoundry

There are several use-cases that can benefit from a CloudFoundry convention:

  1. Application developers who directly instrument their applications, e.g. using the OpenTelemetry Java Agent. They need to rely on the environment variables provided by CloudFoundry to comply with a CloudFoundry convention. An example of a possible implementation can be found in SAP's logging library using an SAP internal CloudFoundry convention here.
  2. Application developers who stream logs and metrics using CloudFoundry's syslog drain feature. They need to rely on the metadata contained in the messages to comply with a CloudFoundry convention. The 9 fields mentioned above are provided but require a mapping of the syslog data to OpenTelemetry.
  3. CloudFoundry operators who want to analyze the entire logs and metrics stream of a CloudFoundry installation including all deployed apps. They rely on the CloudFoundry Loggregator APIs including the new experimental OpenTelemetry support. Here a convention can help the implementation and integration of OpenTelemetry within CloudFoundry, e.g. the cloudfoundry-receiver could be changed to adhere to a new convention here.
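
To illustrate the first use-case, here is a minimal Python sketch of how an application could derive resource attributes from its environment. It assumes the `VCAP_APPLICATION` JSON document and the `CF_INSTANCE_INDEX` variable that CloudFoundry sets for app containers. The function name `cf_resource_attributes` and all `cloudfoundry.*` attribute names are hypothetical placeholders, since no convention has been agreed on yet.

```python
import json
import os
from typing import Dict, Mapping, Optional


def cf_resource_attributes(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Map CloudFoundry environment variables to resource attributes.

    The attribute names are hypothetical and only illustrate the idea.
    """
    env = os.environ if env is None else env

    # VCAP_APPLICATION holds a JSON object describing the app, space and org.
    try:
        vcap = json.loads(env.get("VCAP_APPLICATION", "{}"))
    except json.JSONDecodeError:
        vcap = {}
    if not isinstance(vcap, dict):
        vcap = {}

    candidates = {
        "cloudfoundry.org.id": vcap.get("organization_id"),
        "cloudfoundry.org.name": vcap.get("organization_name"),
        "cloudfoundry.space.id": vcap.get("space_id"),
        "cloudfoundry.space.name": vcap.get("space_name"),
        "cloudfoundry.app.id": vcap.get("application_id"),
        "cloudfoundry.app.name": vcap.get("application_name"),
        "cloudfoundry.app.instance.id": env.get("CF_INSTANCE_INDEX"),
    }

    # Drop anything CloudFoundry did not provide instead of emitting empty values.
    return {k: str(v) for k, v in candidates.items() if v not in (None, "")}
```

The resulting dictionary could be passed to an SDK resource constructor or serialized into the `OTEL_RESOURCE_ATTRIBUTES` environment variable. Outside of CloudFoundry the function returns an empty dictionary.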

KarstenSchnitter commented 11 months ago

I provided #624 as a proposal for this issue. It is based on an SAP internal convention for the same use-case. SAP is trying to bring the OpenTelemetry and CloudFoundry communities together on this endeavour. Please view this PR as a first proposal for a possible convention.

KarstenSchnitter commented 11 months ago

SAP also has an internal convention for Bosh workloads to address the system components, which I declared to be out of scope. Let me know if and how this can be discussed. For now, I suggest focusing on the proposed scope.