open-telemetry / oteps

OpenTelemetry Enhancement Proposals
https://opentelemetry.io
Apache License 2.0

Proposal: OpenTelemetry Sandbox #231

Closed jpkrohling closed 8 months ago

jpkrohling commented 1 year ago

Over the last few months, I have seen a few situations where people have come to our community proposing interesting ideas to be adopted. I have also seen vendors offering code donations to the project, some of which are now mostly unmaintained.

As a possible solution to this, I would like to propose a new GitHub organization, opentelemetry-sandbox. This organization would host projects until we are confident they have a healthy community behind them. They would also serve as a neutral place for the community to conduct experiments.

The advantage of a sandbox organization is that we can still have governance rules there, making sure it’s an inclusive place for people to collaborate while keeping the reputation of the OpenTelemetry project as a whole untouched, given that it would be clear that OpenTelemetry doesn’t officially support projects within the sandbox.

There is a desire, but not an expectation, that projects will move out of the sandbox, either as an official SIG or by being incorporated into an existing SIG. There’s also no expectation that the OpenTelemetry project will provide resources to the sandbox project, like extra GitHub CI minutes or Zoom meeting rooms, although we might evaluate individual requests.

This OTEP is inspired by CNCF’s sandbox projects, but the process is significantly different.

Examples

Here are a few projects that I see as suitable for the sandbox:

  1. We have previously discussed having or fostering experiments with LLMs related to observability. The sandbox will be the perfect place for this without risking reputational damage to the project if the outcomes aren’t on par with the expectations.
  2. There are a couple of code donation proposals in place that could have been accepted as part of the sandbox, such as:
  3. During a previous Outreachy internship, a command-line interface tool was developed to assist in the bootstrapping of OpenTelemetry Collector components. It was primarily developed in the intern’s GitHub account, with little community visibility and involvement.
  4. I have a few custom distributions for the OpenTelemetry Collector, such as the “sidecar”, that are currently hosted on my employer’s organization. Given that they are not tied to my employer’s backends, they would probably benefit a broader range of users if they were available in the sandbox.

Acceptance criteria

A low barrier to entry would be desired for the sandbox. While the process can be refined based on our experience, my initial proposal for the process is the following:

  1. Proposals should be written following the template below and have one Technical Committee (TC) and/or Governance Committee (GC) sponsor, who will regularly provide the TC and GC with information about the state of the project.
  2. Once a sponsor is found, the TC and GC will vote on accepting this new project on the Slack channel #opentelemetry-gc-tc.
    1. After one week, the voting closes automatically, with the proposal being accepted if it has received at least one 👍 (that of the sponsor, presumably).
    2. If at least one 👎 is given, or a TC/GC member has restrictions about the project but hasn’t given a 👎, the voting continues until a majority is reached or the restrictions are cleared.
    3. The voting closes automatically once a simple majority of the TC/GC electorate has chosen one side.
  3. Proponents should abide by OpenTelemetry’s Code of Conduct (currently the same as CNCF’s).
  4. There’s no expectation that small sandbox projects will have regular calls, but there is an expectation that all decisions will be made in public and transparently.
  5. Sandbox projects do NOT have the right to feature OpenTelemetry’s name on their websites.
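
As an illustration only, the voting rules in step 2 could be sketched as a small decision function. This is not part of the OTEP; the names Tally, Outcome, and evaluate, and the electorate size used in the example, are hypothetical. The proposal does not say what happens when a week passes with no votes at all, so the sketch assumes the vote simply stays open in that case.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    ACCEPTED = "accepted"
    REJECTED = "rejected"
    OPEN = "open"


@dataclass
class Tally:
    electorate: int         # total TC/GC members eligible to vote (hypothetical input)
    up: int                 # number of 👍 votes
    down: int               # number of 👎 votes
    open_restrictions: int  # members with unresolved restrictions who have not voted 👎
    days_elapsed: int       # days since the vote was opened


def evaluate(t: Tally) -> Outcome:
    majority = t.electorate // 2 + 1

    # Rule 2.3: a simple majority on either side closes the vote immediately.
    if t.up >= majority:
        return Outcome.ACCEPTED
    if t.down >= majority:
        return Outcome.REJECTED

    # Rule 2.2: any 👎 or uncleared restriction keeps the vote open
    # until a majority is reached or the restrictions are cleared.
    if t.down > 0 or t.open_restrictions > 0:
        return Outcome.OPEN

    # Rule 2.1: after one week with no objections, a single 👍 is enough.
    if t.days_elapsed >= 7 and t.up >= 1:
        return Outcome.ACCEPTED

    # Assumption: with no votes yet, or before the week is over, the vote stays open.
    return Outcome.OPEN
```

For example, with a hypothetical electorate of 20, evaluate(Tally(20, 1, 0, 0, 7)) yields Outcome.ACCEPTED, while a single 👎 in evaluate(Tally(20, 5, 1, 0, 10)) keeps the vote open until 11 members have chosen one side.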

Template

Project name:

Repository name:

Motivation:

Zoom room requested?

Example

Project name: OpenTelemetry Collector Community Distributions

Repository name: opentelemetry-collector-distributions

Motivation: The OpenTelemetry Collector Builder allows people to create their own distributions, and while the OpenTelemetry Collector project has no intention (yet) of hosting other more specialized distributions, some community members are interested in providing those distributions, along with best practices on building and managing such distributions, especially around the CI/CD requirements.

Zoom room requested? No

Further details

mtwo commented 1 year ago

I like this! My only question is if it would be better / easier to have a sandbox repository or many sandbox repositories within the existing open-telemetry org. I'm curious what people think - I can see advantages and disadvantages to either layout and don't have a strong preference.

kenfinnigan commented 1 year ago

I like the idea a lot. We did something similar with Eclipse MicroProfile as a means of incubating proposals before standardizing them.

On how they're housed, I'd suggest either a single sandbox repo in the existing open-telemetry org, or a separate open-telemetry-sandbox org with a repo for each proposal. I believe a separate repo for each idea in the already large open-telemetry org would add even more possible clutter over time.

austinlparker commented 1 year ago

I would advocate for an open-telemetry-sandbox org.

yurishkuro commented 1 year ago

It's not clear to me what problem this is aiming to solve and for whom. Eg for project author it provides discoverability, but why is it OTEL's job to provide? And even discoverability is unclear - if OTEL is not going to promote these projects in any way, how are they discoverable? Or could we just have a page on the website that lists these external projects?

austinlparker commented 1 year ago

It's not clear to me what problem this is aiming to solve and for whom. Eg for project author it provides discoverability, but why is it OTEL's job to provide? And even discoverability is unclear - if OTEL is not going to promote these projects in any way, how are they discoverable? Or could we just have a page on the website that lists these external projects?

We already have a mechanism for project discovery (registry), but I'd argue that this is more about stuff that we think is important to the project/ecosystem but also expands the scope in some way? Android SDK and desktop viewer are both interesting examples of this; I can see other "products" living here as an interim step.

pyohannes commented 1 year ago

It might be good to have a look at the (historical) Kubernetes incubation process: https://github.com/kubernetes/community/blob/9ce2bdc3bb1a9e5b0acea5a4a2dbe8870041de28/incubator.md

I especially like this section, which clearly defines goals for an incubation project:

Exiting Incubation

An Incubator Project must exit 12 months from the date of acceptance in one of these three ways:

  • Graduate as a new Kubernetes Project
  • Merge with another Kubernetes Project
  • Retirement

However, as the project matured this was superseded by a flexible hierarchy of SIGs, subprojects, and WGs (https://github.com/kubernetes/community/blob/master/governance.md#community-groups).

It's not clear to me what problem this is aiming to solve and for whom. Eg for project author it provides discoverability, but why is it OTEL's job to provide?

In my understanding, the main benefit this would provide to OTEL is that it allows us to clearly define criteria and evaluate candidates for future OTEL projects.

jpkrohling commented 1 year ago

I'm happy to see that the idea is being well-received!

a separate open-telemetry-sandbox org with a repo for each proposal

This would be my preference. Looking at the OpenTelemetry Operator, there was a discussion back then to host it under OpenTelemetry Collector Contrib's repo, and I think the nature of an operator and the tooling around it makes it better to have it on its own repository. Extrapolating this a bit, I would guess that future projects would also benefit from having their own repos (own CI, own review process, own release frequency, ...)

I would advocate for an open-telemetry-sandbox org.

Naming-wise, I would say that it should be opentelemetry-sandbox, for two reasons: we spell it OpenTelemetry as one word, and all repositories within the open-telemetry organization follow the opentelemetry- pattern. The only place I know where OpenTelemetry is two words is in the main GitHub organization 🙂

It's not clear to me what problem this is aiming to solve and for whom.

The one thing this proposal brings that cannot be achieved by just creating a separate org owned by a random person is that we can bring and enforce our governance model (and code of conduct). With an organization under OTel's governance, we have a neutral playing field where people can contribute knowing that this isn't owned by any one of the parties, who might potentially even be a competitor. Everything else, including discoverability, can be solved with existing tooling.

@pyohannes' argument is not something I had in mind when I originally drafted the OTEP, but I can see it as one of the benefits.

Exiting Incubation

The idea of forcing projects to exit incubation after 12 months is a nice one. I had thought about creating mechanisms to retire and graduate projects, but haven't thought about forcing this to happen within a specific timeframe. I'll incorporate this in the OTEP, thank you for the suggestion!

austinlparker commented 1 year ago

Naming-wise, I would say that it should be opentelemetry-sandbox, for two reasons: we spell it OpenTelemetry as one word, and all repositories within the open-telemetry organization follow the opentelemetry- pattern. The only place I know where OpenTelemetry is two words is in the main GitHub organization 🙂

/shrug Strikes me that we should be consistent with the other org name though? There's already a github.com/opentelemetry after all, and it isn't us.

jpkrohling commented 1 year ago

Strikes me that we should be consistent with the other org name though?

I prefer to be consistent with everything else, but I registered the open-telemetry-sandbox organization as well just in case. We can have a vote on the name if we decide to continue with the OTEP.

jpkrohling commented 1 year ago

I'll address the comments soon, but I have just seen an example of this in the wild: https://github.com/open-telemetry/opentelemetry-proto/pull/488#issuecomment-1611692999

The sandbox is exactly what @tigrannajaryan proposed there but on a separate organization.

tedsuo commented 9 months ago

I am partially in support of this idea. But I need to get all of my feelings and concerns out there first, in the form of a rant.

I am incredibly nervous about having an official sandbox. It sounds great as an idea - let people explore and experiment! But in practice it runs counter to the major problems we have had with SIGs over the past three years.

Every time a group within OpenTelemetry has started working on a project in isolation – with an official blessing but without active TC or maintainer involvement – they run into serious problems that end up demoralizing the entire group.

Having large amounts of work rejected

Outside groups that get significantly into their work without mentorship often fail to understand OpenTelemetry's architecture or design philosophy. In other words, they miss a bunch of requirements. This leads to their work being rejected and being told that they have to start over. All of this could have been avoided if they were getting design review and feedback as they went.

Reviewing huge projects is overwhelming, feedback is slow

Asking TC members and maintainers to wade into a large project and "review" it is intimidating. It often is not even clear what the review should entail. The amount of effort involved can make it incredibly difficult to schedule, relative to other projects and other work we are doing. This leads to huge delays where the group has to sit on their hands and wait for months before they can get attention.

De-facto stabilization is a serious issue

With OpenTelemetry, the concept of "beta" or "experimental" often turns out to be bullshit. If something useful sits around for long enough, end users will use it. And due to the far reaching nature of OpenTelemetry, and the issues related to users taking a dependency on a cross-cutting concern, we end up in a situation where users will not stand for breaking changes regardless of the "experimental" label. We don't get to point to a sign on the wall and say "I told you so." We put ourselves in this situation every single time we let something sit in an experimental stage for a long period. We are untangling a huge mess in semconv right now because of it – the timeline for fixing it is measured in years.

I don't want to deal with this mess

Time and time again, I have been the person to wade in to fix these problems, re-moralize groups, act as a de-facto TC member to keep them on track, and negotiate a path forward. How is this proposal anything but carte blanche for other GC members to rubber stamp projects, creating a debt that I will end up having to pay off for the group at my personal expense? I have not seen anyone else putting in the kind of work I have had to do to prevent these groups from dissolving in frustration.

Maybe that is part of why this proposal seems like a good idea to others – they have no intention of dealing with the consequences!

We keep acknowledging that we have a maximum throughput to this project. And we have put in a lot of effort this year towards building systems to help us visualize the amount of bandwidth we have, so that we don't end up back in the place we were a year ago: overcommitted with a general feeling that "things are slow." Why is the GC so intent on trying to circumvent these limits? We can't wish them away by sweeping them into a different GitHub org. We either have to finish every project that we start, or face parts of our community churning and walking away in frustration. Think about the issues and fallout the Go community has faced around community groups getting their work rejected. At the end of the day, it will be us that looks irresponsible.

What is a middle ground?

I think there are good examples of small projects, such as dev tools, which interns like to work on and would benefit from more community and visibility. If it truly doesn't matter if a project is completed, if it truly cannot create a de-facto v1.0 through gaining public use in production – that sounds fine. I would be completely in favor of a sandbox as a place for these kinds of projects.

If we want a place to do wildly experimental work, where the explicit understanding is that the work is research, and cannot be designed to be released to the public – that sounds fine. I would be completely in favor of a sandbox as a place for these kinds of projects as well.

If we want a place for odds and ends, configs and custom distros – that also sounds fine to me.

The Benchmarks SIG is an example of the kind of project that would benefit from a sandbox. We want to start developing benchmarks, but we don't want maintainers to feel beholden to them until we have iterated on them and feel satisfied that they have value.

But I cannot support a proposal that will create a situation where next year I am once again negotiating with the TC to take on six more projects they did not agree to, because a group of industry veterans got the green light from us to move ahead on a project without our ability to manage it with them. We have to respect the fact that we have limits, and that it means that we have to prioritize. The Mainframe SIG is an example of this – it is better to negotiate with that group to create a timeline that we can all agree to than tell them to build a Cobol implementation which will be declared "experimental" while being put into use in major production systems. Or just as bad, a group screaming at us in frustration because we are blocking them from putting it into use after they have built it, because we have mismanaged expectations.

So, what is the middle ground here? Is a place for intern projects and experiments in trace-driven-development what we want? Is this a dumping ground for custom distributions, configurations, helm charts? I can agree to that. Is that what people are actually proposing? My (extremely intense) concern is that I have heard this proposal raised in the context of serious industry-led extensions to OpenTelemetry's production surface area.

Can we accept the limitation that nothing of this sort – new signals, new APIs, new SDKs, new Semantic Conventions – qualifies as a sandbox project?

jpkrohling commented 8 months ago

@tedsuo, thank you for your eye-opening message. It took me a while to read it as I've been AFK for a bit longer than a week and decided to read it a couple of times and let it sink in for a day or two before answering.

Every time a group within OpenTelemetry has started working on a project in isolation – with an official blessing but without active TC or maintainer involvement – they run into serious problems that end up demoralizing the entire group.

I would have appreciated some examples, but I trust you do have enough concrete cases in mind to say that.

Time and time again, I have been the person to wade in to fix these problems, re-moralize groups, act as a de-facto TC member to keep them on track, and negotiate a path forward

That's an unfortunate truth: while we have a lot of help in specific parts, we have only a few members who feel responsible for the success of the project as a whole among the TC and GC.

I will end up having to pay off for the group at my personal expense?

The last thing we need is you burning out. And this is where I decide to withdraw my proposal: it's clear to me that we have bigger problems to solve before we can have a sandbox.