Do we need a spec for console output format?

tigrannajaryan commented 1 year ago

Many (all?) language SDKs have a "console" exporter that allows outputting the produced telemetry to standard output. This is very useful for getting started, debugging, etc.

The spec does not define a format for this exporter and I think language maintainers usually come up with the own formats.

Should we define a format that all SDKs can implement uniformly?

We do define a file exporter format, but console likely has requirements that will make it different (i.e. human readability is likely a top priority).

Related to discussion about this in PHP repo: https://github.com/open-telemetry/opentelemetry-php/issues/1050

@open-telemetry/specs-approvers any thoughts?

yurishkuro commented 1 year ago

There two possible categories of output by console exporter, lossless data dump or a summary format like

exporting span id, trace id, name, attr count

tigrannajaryan commented 1 year ago

There two possible categories of output by console exporter, lossless data dump or a summary format like

exporting span id, trace id, name, attr count

Yes.

In Collector we have a human-readable, detailed (but probably ambiguous) text format: https://github.com/open-telemetry/opentelemetry-collector/tree/main/exporter/loggingexporter

PHP SDK's console exporter which I was reviewing recently uses json, which appears to be non-ambiguous, but also is not OTLP/JSON.

It may be worth checking to see what other SDKs do.

svrnm commented 1 year ago

big +1 to this, as it would also significantly improve the docs, right now the outputs per language look very different:

Node.JS has a JSON-like representation

Node.JS output
**Spans:** ```json { "traceId": "3f1fe6256ea46d19ec3ca97b3409ad6d", "parentId": "f0b7b340dd6e08a7", "name": "middleware - query", "id": "41a27f331c7bfed3", "kind": 0, "timestamp": 1624982589722992, "duration": 417, "attributes": { "http.route": "/", "express.name": "query", "express.type": "middleware" }, "status": { "code": 0 }, "events": [] } { "traceId": "3f1fe6256ea46d19ec3ca97b3409ad6d", "parentId": "f0b7b340dd6e08a7", "name": "middleware - expressInit", "id": "e0ed537a699f652a", "kind": 0, "timestamp": 1624982589725778, "duration": 673, "attributes": { "http.route": "/", "express.name": "expressInit", "express.type": "middleware" }, "status": { code: 0 }, "events": [] } { "traceId": "3f1fe6256ea46d19ec3ca97b3409ad6d", "parentId": "f0b7b340dd6e08a7", "name": "request handler - /", "id": "8614a81e1847b7ef", "kind": 0, "timestamp": 1624982589726941, "duration": 21, "attributes": { "http.route": "/", "express.name": "/", "express.type": "request_handler" }, "status": { code: 0 }, "events": [] } { "traceId": "3f1fe6256ea46d19ec3ca97b3409ad6d", "parentId": undefined, "name": "GET /", "id": "f0b7b340dd6e08a7", "kind": 1, "timestamp": 1624982589720260, "duration": 11380, "attributes": { "http.url": "http://localhost:8080/", "http.host": "localhost:8080", "net.host.name": "localhost", "http.method": "GET", "http.route": "", "http.target": "/", "http.user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36", "http.flavor": "1.1", "net.transport": "ip_tcp", "net.host.ip": "::1", "net.host.port": 8080, "net.peer.ip": "::1", "net.peer.port": 61520, "http.status_code": 304, "http.status_text": "NOT MODIFIED" }, "status": { "code": 1 }, "events": [] } ``` **Metrics**: ```yaml { descriptor: { name: 'http.server.duration', type: 'HISTOGRAM', description: 'measures the duration of the inbound HTTP requests', unit: 'ms', valueType: 1 }, dataPointType: 0, dataPoints: [ { attributes: [Object], startTime: [Array], endTime: [Array], value: [Object] } ] } { descriptor: { name: 'http.client.duration', type: 'HISTOGRAM', description: 'measures the duration of the outbound HTTP requests', unit: 'ms', valueType: 1 }, dataPointType: 0, dataPoints: [] } { descriptor: { name: 'db.client.connections.usage', type: 'UP_DOWN_COUNTER', description: 'The number of connections that are currently in the state referenced by the attribute "state".', unit: '{connections}', valueType: 1 }, dataPointType: 3, dataPoints: [] } { descriptor: { name: 'http.server.duration', type: 'HISTOGRAM', description: 'measures the duration of the inbound HTTP requests', unit: 'ms', valueType: 1 }, dataPointType: 0, dataPoints: [ { attributes: [Object], startTime: [Array], endTime: [Array], value: [Object] } ] } { descriptor: { name: 'http.client.duration', type: 'HISTOGRAM', description: 'measures the duration of the outbound HTTP requests', unit: 'ms', valueType: 1 }, dataPointType: 0, dataPoints: [] } { descriptor: { name: 'db.client.connections.usage', type: 'UP_DOWN_COUNTER', description: 'The number of connections that are currently in the state referenced by the attribute "state".', unit: '{connections}', valueType: 1 }, dataPointType: 3, dataPoints: [] } { descriptor: { name: 'http.server.duration', type: 'HISTOGRAM', description: 'measures the duration of the inbound HTTP requests', unit: 'ms', valueType: 1 }, dataPointType: 0, dataPoints: [ { attributes: [Object], startTime: [Array], endTime: [Array], value: [Object] } ] } { descriptor: { name: 'http.client.duration', type: 'HISTOGRAM', description: 'measures the duration of the outbound HTTP requests', unit: 'ms', valueType: 1 }, dataPointType: 0, dataPoints: [] } { descriptor: { name: 'db.client.connections.usage', type: 'UP_DOWN_COUNTER', description: 'The number of connections that are currently in the state referenced by the attribute "state".', unit: '{connections}', valueType: 1 }, dataPointType: 3, dataPoints: [] } ```
Erlang/Elixir has a record(?) data structure representation

Erlang output
``` *SPANS FOR DEBUG* {span,64480120921600870463539706779905870846,11592009751350035697,[], undefined,<<"/">>,server,-576460731933544855,-576460731890088522, {attributes,128,infinity,0, #{'http.status_code' => 200, 'http.client_ip' => <<"127.0.0.1">>, 'http.flavor' => '1.1','http.method' => <<"GET">>, 'http.scheme' => <<"http">>,'http.target' => <<"/">>, 'http.user_agent' => <<"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36">>, 'net.transport' => 'IP.TCP', 'net.host.name' => <<"localhost">>, 'net.host.port' => 4000,'net.peer.port' => 62839, 'net.sock.host.addr' => <<"127.0.0.1">>, 'net.sock.peer.addr' => <<"127.0.0.1">>, 'http.route' => <<"/">>,'phoenix.action' => home, 'phoenix.plug' => 'Elixir.DiceGameWeb.PageController'}}, {events,128,128,infinity,0,[]}, {links,128,128,infinity,0,[]}, undefined,1,false, {instrumentation_scope,<<"opentelemetry_phoenix">>,<<"1.1.0">>, undefined}} ```
Java has a log-line like representation

Java output
**Span** ``` [otel.javaagent 2023-04-24 17:33:54:567 +0200] [http-nio-8080-exec-1] INFO io.opentelemetry.exporter.logging.LoggingSpanExporter - 'RollController.index' : 70c2f04ec863a956e9af975ba0d983ee 7fd145f5cda13625 INTERNAL [tracer: io.opentelemetry.spring-webmvc-6.0:1.25.0-alpha] AttributesMap{data= {thread.id=39, thread.name=http-nio-8080-exec-1}, capacity=128, totalAddedValues=2} [otel.javaagent 2023-04-24 17:33:54:568 +0200] [http-nio-8080-exec-1] INFO io.opentelemetry.exporter.logging.LoggingSpanExporter - 'GET /rolldice' : 70c2f04ec863a956e9af975ba0d983ee 647ad186ad53eccf SERVER [tracer: io.opentelemetry.tomcat-10.0:1.25.0-alpha] AttributesMap{ data={user_agent.original=curl/7.87.0, net.host.name=localhost, net.transport=ip_tcp, http.target=/rolldice, net.sock.peer.addr=127.0.0.1, thread.name=http-nio-8080-exec-1, net.sock.peer.port=53422, http.route=/rolldice, net.sock.host.addr=127.0.0.1, thread.id=39, net.protocol.name=http, http.status_code=200, http.scheme=http, net.protocol.version=1.1, http.response_content_length=1, net.host.port=8080, http.method=GET}, capacity=128, totalAddedValues=17} ``` **Metric** ``` [otel.javaagent 2023-04-24 17:34:25:347 +0200] [PeriodicMetricReader-1] INFO io.opentelemetry.exporter.logging.LoggingMetricExporter - Received a collection of 19 metrics for export. [otel.javaagent 2023-04-24 17:34:25:347 +0200] [PeriodicMetricReader-1] INFO io.opentelemetry.exporter.logging.LoggingMetricExporter - metric: ImmutableMetricData{resource=Resource{schemaUrl= https://opentelemetry.io/schemas/1.19.0, attributes={host.arch="aarch64", host.name="OPENTELEMETRY", os.description="Mac OS X 13.3.1", os.type="darwin", process.command_args=[/bin/java, -jar, java-simple.jar], process.executable.path="/bin/java", process.pid=64497, process.runtime.description="Homebrew OpenJDK 64-Bit Server VM 20", process.runtime.name="OpenJDK Runtime Environment", process.runtime.version="20", service.name="java-simple", telemetry.auto.version="1.25.0", telemetry.sdk.language="java", telemetry.sdk.name="opentelemetry", telemetry.sdk.version="1.25.0"}}, instrumentationScopeInfo=InstrumentationScopeInfo{name=io.opentelemetry.runtime-metrics, version=1.25.0, schemaUrl=null, attributes={}}, name=process.runtime.jvm.buffer.limit, description=Total capacity of the buffers in this pool, unit=By, type=LONG_SUM, data=ImmutableSumData{points= [ImmutableLongPointData{startEpochNanos=1682350405319221000, epochNanos=1682350465326752000, attributes= {pool="mapped - 'non-volatile memory'"}, value=0, exemplars=[]}, ImmutableLongPointData{startEpochNanos=1682350405319221000, epochNanos=1682350465326752000, attributes={pool="mapped"}, value=0, exemplars=[]}, ImmutableLongPointData{startEpochNanos=1682350405319221000, epochNanos=1682350465326752000, attributes={pool="direct"}, value=8192, exemplars=[]}], monotonic=false, aggregationTemporality=CUMULATIVE}} ... ```
Python has a json like representation (which is different to the JS one)

Python output
**Span** ```json { "name": "/rolldice", "context": { "trace_id": "0xdcd253b9501348b63369d83219da0b14", "span_id": "0x886c05bc23d2250e", "trace_state": "[]" }, "kind": "SpanKind.SERVER", "parent_id": null, "start_time": "2022-04-27T23:53:11.533109Z", "end_time": "2022-04-27T23:53:11.534097Z", "status": { "status_code": "UNSET" }, "attributes": { "http.method": "GET", "http.server_name": "127.0.0.1", "http.scheme": "http", "net.host.port": 5000, "http.host": "localhost:5000", "http.target": "/rolldice", "net.peer.ip": "127.0.0.1", "http.user_agent": "curl/7.68.0", "net.peer.port": 52538, "http.flavor": "1.1", "http.route": "/rolldice", "http.status_code": 200 }, "events": [], "links": [], "resource": { "attributes": { "telemetry.sdk.language": "python", "telemetry.sdk.name": "opentelemetry", "telemetry.sdk.version": "1.14.0", "telemetry.auto.version": "0.35b0", "service.name": "unknown_service" }, "schema_url": "" } } ``` **Metric** ```json { "resource_metrics": [ { "resource": { "attributes": { "service.name": "unknown_service", "telemetry.auto.version": "0.34b0", "telemetry.sdk.language": "python", "telemetry.sdk.name": "opentelemetry", "telemetry.sdk.version": "1.13.0" }, "schema_url": "" }, "schema_url": "", "scope_metrics": [ { "metrics": [ { "data": { "aggregation_temporality": 2, "data_points": [ { "attributes": { "http.flavor": "1.1", "http.host": "localhost:5000", "http.method": "GET", "http.scheme": "http", "http.server_name": "127.0.0.1" }, "start_time_unix_nano": 1666077040061693305, "time_unix_nano": 1666077098181107419, "value": 0 } ], "is_monotonic": false }, "description": "measures the number of concurrent HTTP requests that are currently in-flight", "name": "http.server.active_requests", "unit": "requests" }, { "data": { "aggregation_temporality": 2, "data_points": [ { "attributes": { "http.flavor": "1.1", "http.host": "localhost:5000", "http.method": "GET", "http.scheme": "http", "http.server_name": "127.0.0.1", "http.status_code": 200, "net.host.port": 5000 }, "bucket_counts": [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0], "count": 1, "explicit_bounds": [ 0, 5, 10, 25, 50, 75, 100, 250, 500, 1000 ], "max": 1, "min": 1, "start_time_unix_nano": 1666077040063027610, "sum": 1, "time_unix_nano": 1666077098181107419 } ] }, "description": "measures the duration of the inbound HTTP request", "name": "http.server.duration", "unit": "ms" } ], "schema_url": "", "scope": { "name": "opentelemetry.instrumentation.flask", "schema_url": "", "version": "0.34b0" } } ] } ] } ```
C++ has also a json-like representation

C++ output
**Span** ``` { name : /helloworld trace_id : 05eec7a55d3544434265dad89d7fe96f span_id : 45fb62c58c907f05 tracestate : parent_span_id: 0000000000000000 start : 1665577080650384378 duration : 1640298 description : span kind : Client status : Unset attributes : http.header.Date: Wed, 12 Oct 2022 12:18:00 GMT http.header.Content-Length: 0 http.status_code: 200 http.method: GET .header.Host: localhost http.header.Content-Type: text/plain http.header.Connection: keep-alive .scheme: http http.url: http://localhost:8800/helloworld events : links : resources : service.name: unknown_service telemetry.sdk.version: 1.6.1 telemetry.sdk.name: opentelemetry telemetry.sdk.language: cpp instr-lib : http-client } ```

Having a consistent output would make debugging easier and also allow us (@open-telemetry/docs-approvers) to create some more consistent documentation and with that a better experience for the end-users.

To add this as an additional note, there is also no consistency on the name of that exporter (I see console, stdout, logging, ostream), which I am less worried about, but this is also reflected in the value I need to set console output via an environment variable.

arminru commented 1 year ago

Personally, I think it'd be nice to have but not necessary to consolidate the human-readable log output across all SDKs. They all contain pretty much the same information and it should be relatively straightforward to orient oneself quickly when switching between them.

One thing that should be normalized, however, is the representation of the Trace IDs and Span IDs. One might want to grep through their SDK and Collector logs based on a Trace or Span ID and would therefore need to know about the different representations being used and convert between all possible options to grep them in parallel, which would be very cumbersome.

Seems like the following representations are currently used as per the examples that @svrnm (thank you!) gathered above:

JS (lowercase hex string)

  "traceId": "3f1fe6256ea46d19ec3ca97b3409ad6d",
  "parentId": "f0b7b340dd6e08a7",
  "id": "41a27f331c7bfed3",

Erlang (decimal)

{span,64480120921600870463539706779905870846,11592009751350035697,[],
    undefined,<<"/">>,server,-576460731933544855,-576460731890088522,

Java (lowercase hex string)

70c2f04ec863a956e9af975ba0d983ee 7fd145f5cda13625 INTERNAL

Python (lowercase hex string with a 0x prefix)

    "trace_id": "0xdcd253b9501348b63369d83219da0b14",
    "span_id": "0x886c05bc23d2250e",
  "parent_id": null,

C++ (lowercase hex string)

  trace_id      : 05eec7a55d3544434265dad89d7fe96f
  span_id       : 45fb62c58c907f05
  parent_span_id: 0000000000000000

I think going with a lowercase hex string (either with or without the 0x prefix) would be most practical.

dyladan commented 1 year ago

One big advantage is that the output for language SDKs is typically displayed in a format the language developers are already used to seeing. For example in Ruby the => syntax for maps is quite common but a JS developer would find it quite weird. Another advantage is that it discourages people from trying to programmatically consume the console output.

There are several things that could be specified and we can do all or none of them:

What fields are exported. Do we export everything possible? Do we want to define some verbosity levels?
What format are IDs exported in? In at least erlang/elixir trace and span IDs are exported as integers rather than hex making it difficult to quickly see if the same ID is appearing in your collector output for example.
What is the overall structure of the data? JSON, YAML, tabular, et cetera. Is it meant to be human readable, machine readable, or both?

Personally most of the time I'm using the console logger as a debugging tool just to see that spans/metrics/logs are being collected and exported at all. Usually I don't care about most of the data there. Sometimes, when I am developing instrumentation I care about a specific set of attributes. There are many attributes like trace flags that I almost never care about.

Here's what I would suggest we specify:

A minimum set of data that a console exporter should display. If an implementation wants to display more then fine, possibly configured with a verbosity flag. It MUST be possible to see all data attributes
IDs are displayed as hex in order to make comparison between SDKs and collector more easily

I would not specify the structure of the overall object itself such as JSON/YAML/etc. I would leave a non-normative hint that the target audience is developers of the language debugging their telemetry and that the data should be in a format that users of that programming language are likely to be regularly familiar with.

cartermp commented 1 year ago

Hmmm. I like the idea of language-specific idioms. I think the main challenge isn't one of uniformity, but of verbosity. Some languages (like .NET) use a nice and compact format. Others (like JS and Python) do not, so it's harder to scan for things.

dyladan commented 1 year ago

Hmmm. I like the idea of language-specific idioms. I think the main challenge isn't one of uniformity, but of verbosity. Some languages (like .NET) use a nice and compact format. Others (like JS and Python) do not, so it's harder to scan for things.

I also prefer a compact format most of the time. In JS the reason we decided on the verbose format was mainly because we were using it to develop the SDK itself and were frequently looking at specific data. It was never really meant for end users in the early days. In a lot of ways it still isn't beyond day 0 demos.

cartermp commented 1 year ago

Makes sense. I think I'd be unambiguously in favor of the spec offering general guidance on readability/compactness of outputs. If it's a blessed path + there's some general guidelines to follow, then that's a nice thing for outside contributors to pick up as a pure value-add.

open-telemetry / opentelemetry-specification

Do we need a spec for console output format? #3565