open-telemetry / opentelemetry-specification

Specifications for OpenTelemetry
https://opentelemetry.io
Apache License 2.0

Consideration when using OTel SDK (and maybe collector) with non-OTLP backend #823

Open anuraaga opened 4 years ago

anuraaga commented 4 years ago

OpenTelemetry collects a lot of information by default, and backends that speak OTLP natively will generally be equipped to deal with this. But I worry we don't have a good story right now for connecting to non-OTLP backends like Zipkin. (I will use Zipkin as the example since I'm most familiar with it; I wouldn't be surprised if other systems have similar issues with storage.) Zipkin instrumentation generally defaults to collecting less information, since users run their own backends and pay a significant cost for data. But currently the specification requires exporters to convert almost all of the information in OTLP, for example library info and all the attributes, which can be a lot given the breadth of the semantic conventions. Resource hasn't been specced yet, but with this pattern I expect it to also end up populating a lot of information, because the current stance is to preserve data during conversion:

https://github.com/open-telemetry/opentelemetry-specification/pull/800#discussion_r470857107

I think preserving all data is a nice goal, but it could pose challenges to organizations using the OpenTelemetry SDK with a Zipkin backend. We can imagine a software development team that switches their instrumentation to OpenTelemetry; when they release the new version, their metrics look fine and they go to prod, taking down the entire tracing infrastructure due to an explosion in data that may provide little value, given that the backend doesn't natively recognize it. We can expect tracing infrastructure to have count-based ingestion controls, but I think size-based controls are less common, so this would be very hard to deal with.

One piece we have that can help is the collector's attributes processor, which can be used to filter out attributes. This is a nice start, but we may need to consider improvements.

There are two problems with this approach: it requires running the collector, while many will want to export directly, and it only operates on attributes. A significant piece of the size explosion is library info, and there doesn't seem to be any way currently to filter it out, period. We may be able to extend the attributes processor to support this non-attribute, protocol-level information, but for the non-collector case we probably need a way to filter data, either as opt-in or opt-out, at the SDK level too. Perhaps it only takes the form of code examples, though that is less compelling for auto-instrumentation users, and we may need native support for such filtering.
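To make the opt-in versus opt-out idea concrete, here is a minimal sketch in plain Python. It does not use any OpenTelemetry SDK API: `filter_attributes` is a hypothetical helper, and the attributes are modelled as an ordinary dict, on the assumption that an exporter could apply something like this just before conversion.

```python
from typing import Any, Dict, Iterable, Optional


def filter_attributes(
    attributes: Dict[str, Any],
    allow: Optional[Iterable[str]] = None,
    deny: Iterable[str] = (),
) -> Dict[str, Any]:
    """Return a filtered copy of `attributes` (hypothetical helper).

    Opt-in mode: if `allow` is given, only those keys survive.
    Opt-out mode: otherwise, every key except those in `deny` survives.
    """
    if allow is not None:
        allowed = set(allow)
        return {k: v for k, v in attributes.items() if k in allowed}
    denied = set(deny)
    return {k: v for k, v in attributes.items() if k not in denied}


# Illustrative span attributes; the values are made up.
span_attributes = {
    "http.method": "GET",
    "http.url": "https://example.com/items?id=1",
    "http.user_agent": "Mozilla/5.0",
    "net.peer.ip": "10.0.0.1",
}

# Opt-in: keep only what the backend is known to use.
opt_in = filter_attributes(span_attributes, allow=["http.method", "http.url"])

# Opt-out: keep everything except attributes known to be costly.
opt_out = filter_attributes(span_attributes, deny=["http.user_agent"])
```

Opt-in is the safer default for a cost-sensitive backend, since new semantic conventions would not silently increase payload size; opt-out preserves more data but needs maintenance as conventions grow.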

andrewhsu commented 4 years ago

Talked about this at the spec issue triage meeting today; assigning to @bogdandrutu to decide whether this is a before-GA or after-GA requirement.

andrewhsu commented 4 years ago

From the spec issue triage meeting today: talked with @mtwo and @reyang, and based on their input putting this after GA, but at medium priority.

anuraaga commented 3 years ago

I was looking through some exporters in opentelemetry-collector-contrib and noticed that at least newrelicexporter and datadogexporter seem to have a data model similar to Zipkin's, and therefore seem to add all resource tags to all spans. Just curious, are any strategies being considered to deal with potential information overload? Recommended configurations for dropping attributes in resourceprocessor, for example? I want to understand how people may be dealing with this issue to be able to provide recommendations :) // @tylerbenson @MrAlias (randomly pinging a couple handles I know for those :P)

jkwatson commented 3 years ago

I happen to know a lot about how the New Relic backend works, and it's fine with pretty much any number of attributes on spans...I think the limit is like 256 attributes per span?

trask commented 3 years ago

Azure Monitor pricing is per GB and so customers tend to care a lot about this issue. In our exporters we are mapping over semantic attributes that our backend needs for default experiences and just dropping the rest (looking at prefix, e.g. net., to differentiate between semantic attributes and user attributes). Also planning to let users allowlist semantic attributes that they want to keep.
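A rough sketch of that prefix-based approach, in plain Python rather than any real exporter code. The prefix list, the `BACKEND_NEEDS` set, and the `select_attributes` name are all assumptions for illustration, as is the choice to keep user attributes unchanged.

```python
from typing import Any, Dict, FrozenSet

# Assumption: an illustrative, incomplete list of semantic-convention prefixes.
SEMANTIC_PREFIXES = ("http.", "net.", "db.", "rpc.")

# Assumption: semantic attributes the backend maps to its default experiences.
BACKEND_NEEDS = frozenset({"http.method", "http.status_code"})


def select_attributes(
    attributes: Dict[str, Any],
    user_allowlist: FrozenSet[str] = frozenset(),
) -> Dict[str, Any]:
    """Keep user attributes, needed semantic attributes, and allowlisted ones."""
    kept = {}
    for key, value in attributes.items():
        # str.startswith accepts a tuple of prefixes.
        is_semantic = key.startswith(SEMANTIC_PREFIXES)
        if not is_semantic or key in BACKEND_NEEDS or key in user_allowlist:
            kept[key] = value
    return kept


attrs = {
    "http.method": "GET",
    "net.peer.ip": "10.0.0.1",
    "net.peer.port": 443,
    "customer.tier": "gold",  # user attribute: no semantic prefix
}

default_kept = select_attributes(attrs)
with_allowlist = select_attributes(attrs, frozenset({"net.peer.ip"}))
```

With the defaults, the two `net.` attributes are dropped; allowlisting `net.peer.ip` brings that one back while `net.peer.port` stays dropped.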

anuraaga commented 3 years ago

An idea came up in https://github.com/open-telemetry/opentelemetry-java/issues/3009 to have an option to specify what resource attributes to copy in. Could be a good approach.
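As a sketch of what such an option might look like for a Zipkin-style data model, where resource attributes end up as tags on every span: the function name and the `resource_keys_to_copy` parameter below are hypothetical and not taken from the linked issue, and letting span attributes win on key collisions is an assumption.

```python
from typing import Any, Dict, Iterable


def to_backend_tags(
    span_attributes: Dict[str, Any],
    resource_attributes: Dict[str, Any],
    resource_keys_to_copy: Iterable[str],
) -> Dict[str, Any]:
    """Build per-span tags, copying only the selected resource attributes."""
    tags = {
        key: resource_attributes[key]
        for key in resource_keys_to_copy
        if key in resource_attributes
    }
    # Assumption: span attributes take precedence on key collisions.
    tags.update(span_attributes)
    return tags


# Illustrative values only.
resource = {
    "service.name": "checkout",
    "host.name": "ip-10-0-0-1",
    "process.pid": 1234,
    "telemetry.sdk.version": "1.0.0",
}
span = {"http.method": "GET"}

tags = to_backend_tags(span, resource, resource_keys_to_copy=["service.name"])
```

Here each span carries one resource tag instead of four, and the saving is multiplied by the number of spans exported.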