open-telemetry / opentelemetry-collector

OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

Define official OpenTelemetry Collector support #10561

Open tedsuo opened 2 weeks ago

tedsuo commented 2 weeks ago

Separating this discussion from #8555.

OpenTelemetry adoption is on the rise, which means that support requests are increasing. We need to define what we are willing to support.

I'm opening this issue for discussion, not because I have a specific proposal. To kick off the discussion, here is my understanding of the landscape:

Core Collector components feel like obvious candidates for support. We write this software, so we support it.

Collector Contrib is more complicated. This code lives within the OpenTelemetry organization, but there is a lot of it and it has a wider range of maintainers. Do we want to support a specific subset of contrib, and have the maintainers of other contrib plugins be responsible without our help? How would this work?

Externally Maintained Distros contain some subset of the plugins from above, and may contain additional plugins. This covers distros maintained by vendors and cloud providers, plus distros that end users have built themselves. The support issue for these distros is that they contain a mix of plugins that individually may have different support levels. What do we do in these cases?

I would love it if the people who are currently on the hook for support would chime in with their experiences to date, so we can understand what the actual pain points are.

atoulme commented 2 weeks ago

See https://github.com/open-telemetry/opentelemetry-collector/issues/10004 for an earlier reflection applied to code.

mx-psi commented 2 weeks ago

Some comments to try and clarify the discussion:

> what we are willing to support

We have different sets of Collector audiences (I recently made a change in #10539 to reflect the three sets we have today). Each audience has different needs (e.g. support at the Go API level vs. at the binary UI level), a different required level of technical expertise, and a different ability to deal with e.g. breaking changes. I think we can focus this issue on end-users of Collector binaries, but we'll need to discuss the rest as well eventually.

> Core Collector components feel like obvious candidates for support. We write this software, so we support it.

The definition of 'core' components needs to be clarified. There are two possible meanings: (i) components in the opentelemetry-collector repository or (ii) components in the 'core' distro.

> What do we do in these [Externally maintained distros] cases?

IMO, at a minimum, support requests should be reproducible with a builder-defined distro for us to look at them.

tedsuo commented 1 week ago

Thanks @mx-psi. I agree, for this discussion let's focus on support requests from operators trying to run Collector binaries.

Component-based support tiers

In terms of "core" components (or whatever we want to call them, maybe "support tiers" is better terminology) I would suggest approaching support based on who maintains them. Components maintained by the Collector contributors have to be supported here, because we write that software and there are no other maintainers outside of our organization to redirect people to for support. So "Core support tier" or whatever would be this software.

"Contrib support tier" covers components that contrib maintainers are committed to supporting, which is already defined here. It's a mixed bag. It actually looks like two tiers of support are present here: packages the Collector-contrib contributors are willing to maintain, and packages only maintained by individual contributors who are not on that list.

Then there's "Third party support tier" that consists of components maintained outside of the OpenTelemetry organization.
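As a rough illustration of the maintainer-based tiers described above, the following sketch assigns a tier from who maintains a component and then groups the components of a distro by tier, which makes the mix of support levels visible. Everything here is an assumption for illustration only: the `Tier` names, the maintainer labels ("collector", "contrib"), and the `acmeexporter` component are hypothetical, and nothing in this thread defines such a data model.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    """Hypothetical tier names mirroring the three tiers discussed above."""
    CORE = "core"
    CONTRIB = "contrib"
    THIRD_PARTY = "third-party"


@dataclass(frozen=True)
class Component:
    name: str
    maintainer: str  # hypothetical label: "collector", "contrib", or anything else


def tier_of(component: Component) -> Tier:
    """Assign a tier purely from who maintains the component."""
    if component.maintainer == "collector":
        return Tier.CORE
    if component.maintainer == "contrib":
        return Tier.CONTRIB
    # Anything maintained outside the OpenTelemetry organization.
    return Tier.THIRD_PARTY


def tiers_in_distro(components: list[Component]) -> dict[Tier, list[str]]:
    """Group a distro's component names by tier to show the mix it ships."""
    grouped: dict[Tier, list[str]] = defaultdict(list)
    for component in components:
        grouped[tier_of(component)].append(component.name)
    return dict(grouped)


# Example distro mixing all three tiers; "acmeexporter" is made up.
distro = [
    Component("otlpreceiver", "collector"),
    Component("tailsamplingprocessor", "contrib"),
    Component("acmeexporter", "acme-corp"),
]
mix = tiers_in_distro(distro)
```

The sketch deliberately stops at listing which tiers a distro contains. It does not decide what support a mixed distro should receive, since that is the open question in this discussion.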

Binary/pipeline support

Besides the individual components, there's how they have been packaged together and configured. It seems obvious that components in the same pipeline will affect each other. What about components that are not in the same pipeline? Do configuration and the mix of different support tiers affect what kind of support we want to provide? I'm not familiar with the type of bugs and questions users tend to come to the maintainers with.

Support request procedures

How would we like support requests to be submitted? What information should always be provided? Besides diagnostic information, it seems like a builder command that recreates the binary and a script or Docker container that recreates the problem should be required. We shouldn't be expected to figure out how to build the binary and reproduce the problem ourselves. Are there any types of smaller-scale support requests that would be accepted without these items?

jpkrohling commented 6 days ago

Can we define what "support" means here? I'm certainly not ready to commit to any SLAs for answering questions about the tail-sampling processor, for instance, despite being the code owner for it.