atrauzzi opened this issue 2 years ago
Instead, what we currently have is that applications have to install vendor-specific exporters which means that if you don't control the source code for an application, you're basically left without an option for getting telemetry out.
This is not entirely true. You can send from the OpenTelemetry SDK to the Collector using OTLP, and from the Collector to the vendor using the vendor-specific exporters that we have in the Collector. The typical configuration is an OTel-instrumented application sending via OTLP to a Collector running on localhost, and that is reflected in the default SDK settings.
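To make the "default SDK settings" point concrete: under the OpenTelemetry SDK environment-variable conventions, an OTLP/HTTP exporter sends traces to `OTEL_EXPORTER_OTLP_TRACES_ENDPOINT` as-is if it is set, otherwise to `OTEL_EXPORTER_OTLP_ENDPOINT` with `v1/traces` appended, and otherwise to a local receiver at `http://localhost:4318`. The sketch below re-implements only that precedence in plain Python for illustration. `resolve_traces_endpoint` is a hypothetical helper, not part of any SDK, and it models OTLP/HTTP only (OTLP/gRPC defaults to port 4317).

```python
import os
from typing import Mapping, Optional

# Default OTLP/HTTP target: a Collector (or any OTLP receiver) on localhost.
DEFAULT_OTLP_HTTP_ENDPOINT = "http://localhost:4318"


def resolve_traces_endpoint(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the URL an OTLP/HTTP trace exporter would post to.

    Hypothetical helper illustrating the precedence rules:
    1. OTEL_EXPORTER_OTLP_TRACES_ENDPOINT is used as-is.
    2. Otherwise OTEL_EXPORTER_OTLP_ENDPOINT is treated as a base URL
       and the signal path "v1/traces" is appended.
    3. Otherwise fall back to the localhost default.
    Empty values are treated as unset.
    """
    env = os.environ if env is None else env
    per_signal = env.get("OTEL_EXPORTER_OTLP_TRACES_ENDPOINT")
    if per_signal:
        return per_signal
    base = env.get("OTEL_EXPORTER_OTLP_ENDPOINT") or DEFAULT_OTLP_HTTP_ENDPOINT
    # Avoid a double slash when the base URL already ends with "/".
    return base.rstrip("/") + "/v1/traces"


if __name__ == "__main__":
    print(resolve_traces_endpoint({}))
    print(resolve_traces_endpoint({"OTEL_EXPORTER_OTLP_ENDPOINT": "https://otlp.vendor.example"}))
```

Under these rules, pointing an unmodified application at a vendor's OTLP ingest URL is purely a deployment change (`https://otlp.vendor.example` above is a placeholder), which is the workflow this thread argues vendors should support.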
I do agree with the general sentiment that we can do better in promoting OTLP.
Keep in mind that until recently only the traces portion of OTLP was stable; metrics became stable recently and logs are not stable yet. This has made vendors somewhat reluctant to invest in OTLP support. Nevertheless, metrics are now stable and logs are nearing stability, so we are better positioned to market OTLP more widely.
You can send from OpenTelemetry SDK to the Collector using OTLP and from the Collector to the vendor...
I'm aware of this scenario but didn't specifically mention it because it would be impractical in my Cloud Run scenario (or any other PaaS-like scenario).
I'd also offer, warmly :heart:, that such a suggestion really doesn't do the cause justice, as it allows cloud vendors to push compounding complexity and maintenance burden onto people who simply want to extract the benefit of a standard. We really mustn't make light of "running a collector". That's basically a VM or sidecar in a deployment that may not even be able to support a VM or sidecar.
Otherwise, yeah... I just think that effort needs to hit hard and hit fast, because vendors are entrenching around the worst-case scenario I described, even going so far as having their support channels regurgitate it as convention. That also makes it very difficult to communicate the desire to see the proper approaches taken.
I agree with you. We should recommend OTLP on all hops, with or without Collector, and at the vendors' ingest endpoints.
As an example, I think we need to highlight OTLP in the docs (https://opentelemetry.io/docs/). We don't really do it justice; there are just a couple of scattered references to OTLP exporters on some pages.
Maybe a list on the website of tracing backends known to support native OTLP would help with this issue? I would suggest something similar to the OpenMetrics compliance program, but that might be a lot of work, and a simple list is a quick win. I know at least some vendors already support it, and many would jump at the chance to be listed on the website in some official capacity.
It would be nice to see some proactive outreach as part of whatever else is done, if only because vendors are already running with the ball in the wrong direction, a direction that of course also implies lock-in :cry:
I'd also say it's a must for compliance. Like, this is exactly the thing that people are adopting OTEL for...
If only because vendors are already running with the ball in the wrong direction.
At least the vendors that contribute to OTel seem very willing to enable OTLP ingest. Several vendors already do this. I don't want to list them for fear of missing someone and making them sad, but they're out there.
@atrauzzi Regarding your comment on Google Cloud OTLP support, I can confirm the message is heard.
@jsuereth - That's really awesome to hear. I know timelines are hard to provide, so I'm not looking for one, but is there a good place to track progress or even make myself available for discussions and testing? Basically, I'd just like to keep up and offer input.
Again, we're using .NET 6 and OTEL, and I'd really like to be able to send all that data into monitoring. Right now it seems like Honeycomb is the only vendor that got it right :tm:
Hi @atrauzzi -
It's great to see end-user feedback directly so thank you for your engagement!
When you said 'cloud providers', I assumed you meant the big three cloud platforms, so I didn't think your statement applied to observability-focused vendors like Honeycomb. I know a lot of us are strongly of the opinion that pure/native OTLP ingest is the best path forward for the industry (and for end users!), and to that end there's a growing number of backend vendors that natively accept telemetry via OTLP. I can't vouch for all these experiences, but here's a centralized list of vendors offering native OTLP support, no collector needed :)
I'm curious about what you mean when you say:
Honeycomb is the only vendor that got it right™️
What about your experiences with Honeycomb gave you such a positive impression?
@sharrmander -- No problem, engagement is something I do; I'm sure it's both refreshing to some and frustrating to others. :sweat_smile:
Thank you for the list! I was not aware that it existed. Generally I do mean the big vendors, although my specific experience is with Azure and GCP, both of which managed to get this wrong in exactly the same way.
As for Honeycomb, I've never actually used the product, though I've seen a lot of Charity's advocacy. If a cloud vendor has proper OTEL support on the roadmap, I wouldn't really be able to justify the extra spend on egress and on a service like Honeycomb. That said, it's clear to me that it's a good product for those who can make use of it, particularly because the company gets out in front of everything and seems to have a well-honed and progressive technical instinct. Again, that's all based on its public persona and community presence, not on having used it... ever. :laughing:
On a separate note, I think it's disingenuous to use the phrase "native support" on this page for anything less than "Native OTLP". That is to say, Azure and AWS should not be on that list because they simply cannot be considered as having an offering that's at all desirable.
From a developer's perspective, their OpenTelemetry stories are full of landmines if the support isn't as dead simple as "Configure this endpoint for your OTLP exporter in your application, wipe hands on pants."
Thanks for clarifying @atrauzzi.
I appreciate that OpenTelemetry is many things, so while the data transmission protocol is important, it is only part of the value prop of OTel. So I can get behind 'native support' on that opentelemetry.io page, especially for end users who need, for example, the assurance that the upstream project has been performance tested to AWS's standards as part of their ADOT distribution.
I don't know, that doesn't make sense to me. "ADOT distribution" contradicts what OpenTelemetry was supposed to be in the first place, which was one way to instrument and thus one way to export.
Allowing vendors to shoehorn in caveats and then take credit for only partial execution erases the value that everyone should rightfully assume is implied by the name "OpenTelemetry". Performance testing on the AWS side should mean performance testing a setup that uses pure OTLP. I don't think any vendor should be rewarded for twisting what will most assuredly be broadly assumed when the word "supported" is used.
Letting vendors get away with any less cheapens the OpenTelemetry "brand" (for lack of better terms).
Thanks for creating the issue @atrauzzi! @yurishkuro raised this during today's governance committee call.
We discussed the following:
@mtwo Amazing! Thank you so much.
Is there any chance the vendor list can de-emphasize vendors that aren't "pure" OTEL? Perhaps have two lists, so that any vendor who wants top billing on the page has to go all the way?
Honestly, it would send a really strong message: a bigger list at the top with richer entries, and a smaller list of "aspirants" lower down. That makes it clear that they have work to do and that simply making it onto the page doesn't mean "job done".
I think it's fine to have one list with green checks / red crosses in the columns.
Do you have any reasons behind "I think"? I've listed some good reasons so far that go well beyond just "I think".
Homogenizing the list won't incentivize better support, because the most minimal effort will get the same recognition as a more complete effort. WHO does it serve to systematically reward that? OTEL should show some self-interest here.
We could have something like ⚠️ for "OTLP ingest may require a collector, custom exporter, or custom SDK distribution; please check vendor docs for details"
It's better, but not great. I will continue to emphasize the risk of underestimating the optics.
As someone who has consumed several cloud services from different vendors, I can say that all vendors are in the business of saying "yes", even if it means being disingenuous about it. The most frustrating thing we could do to the people who are the target audience of the list (developers!) is give vendors a way to co-opt OTEL in conveying a false impression, which also undermines the brand and reputation of OTEL itself.
Two lists is best. They can be structurally the same. Just make the second list a little smaller than the first one and put it further down.
Again, remember who these lists are for, who will be consuming them and why.
Hello, I hate to be "that guy" but... the list still has no explanation of what "Native OTLP" means.
In my mind, it means they run an OTLP receiver, and support most of the features, at least for things like traces that are at a good level of maturity.
But that doesn't seem to be the case. Take Datadog, for example. They are listed as "native", but (as far as I can tell) you need an exporter to send them traces, and they don't seem to support relatively established features like span links or span events.
So, what does "native" mean, and how is that list supposed to help anyone choose a vendor?
I believe "native OTLP" means the backend is able to receive OTLP. If Datadog doesn't, then the list needs to be fixed.
@tigrannajaryan I agree with the others here that the current page is pretty unhelpful, since it doesn't even define what the columns mean. At best it says "this vendor has 'something' related to OTEL".
We need a concrete proposal of what additional columns to add to the table, and how to go about populating them. Because the existing "native OTLP" column is already misused, I would suggest resetting all vendors in the list to a question mark and asking them to file a PR that changes the values as needed while providing the evidence. We also need to provide a clear definition of what the ❌ ⚠️ ✅ values mean for each column.
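One way to pin down the ❌ ⚠️ ✅ values is to derive them from a few yes/no facts about a vendor's ingest path, using the definitions floated elsewhere in this thread (✅: the backend accepts OTLP at an endpoint, so the stock SDK exporter works; ⚠️: OTLP ingest requires a Collector, a custom exporter, or a custom SDK distribution). The Python sketch below is hypothetical: `VendorIngest`, its field names, and the rule that a vendor with no OTLP path at all gets ❌ are assumptions for illustration, not an agreed definition.

```python
from dataclasses import dataclass
from enum import Enum


class OtlpSupport(Enum):
    NATIVE = "✅"       # OTLP endpoint accepts data from the stock SDK exporter
    CONDITIONAL = "⚠️"  # OTLP only via Collector, custom exporter, or custom distro
    NONE = "❌"         # no OTLP ingest path at all


@dataclass
class VendorIngest:
    # Hypothetical facts a vendor would have to back with doc links.
    accepts_otlp_endpoint: bool
    requires_collector: bool = False
    requires_custom_exporter: bool = False
    requires_custom_distribution: bool = False


def classify(v: VendorIngest) -> OtlpSupport:
    """Map a vendor's ingest facts to a table symbol (assumed rules)."""
    workaround_needed = (
        v.requires_collector
        or v.requires_custom_exporter
        or v.requires_custom_distribution
    )
    if v.accepts_otlp_endpoint and not workaround_needed:
        return OtlpSupport.NATIVE
    if workaround_needed:
        return OtlpSupport.CONDITIONAL
    return OtlpSupport.NONE


if __name__ == "__main__":
    print(classify(VendorIngest(accepts_otlp_endpoint=True)).value)
    print(classify(VendorIngest(accepts_otlp_endpoint=False, requires_custom_exporter=True)).value)
```

Keeping the inputs as plain booleans makes explicit which claim each piece of vendor-supplied evidence has to support.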
Concrete proposal:
Language is an important dimension for the table, so just having SDK or Distribution is not sufficient.
- 📦 - Distribution
So this is about an SDK distribution for the particular language, right? Vendors can also have a Collector distribution, so perhaps have a separate column for that.
- ✅ - OSS SDK can be used (requires native OTLP)
Maybe label this differently than "OSS". Typically vendor distributions are also open-source.
+1 - Official OpenTelemetry SDK
So this is about an SDK distribution for the particular language, right? Vendors can also have a Collector distribution, so perhaps have a separate column for that.
Perhaps do the same for collector as for SDKs, a single column with
While I agree that https://opentelemetry.io/vendors/ needs to change, I have my issues with running and maintaining such a complex list: we can of course ask vendors to update that list once or from time to time, but eventually the burden of maintaining the list lies with the Comms SIG, which takes away bandwidth from other things we urgently need to do.
So, what does "native" mean, and how is that list supposed to help anyone choose a vendor?
I don't think it is the responsibility of the community to help end users choose which vendor to use.
cc @open-telemetry/docs-approvers
I don't think it is the responsibility of the community to help end users choose which vendor to use.
Fair enough. But in that case, why not just delete that list? I think either it provides useful information, or it's better not to have it at all.
Deleting the list is also a viable solution. But as was argued here earlier, the list not only benefits vendors; the project also receives value from it by showing industry adoption and steering users towards vendors supporting native OTLP. If we focus just on this aspect, we can simplify the table to just the vendor name with a link to their own description of OTEL support, plus the Native OTLP column (but clearly defined). I.e., I would remove the distro column.
the project also receives value from it by showing industry adoption and steering users towards vendors supporting native OTLP.
This is very important for us (for Otel). Precisely for this reason we should not delete the list. I am OK with rethinking it and simplifying maintenance, but I think it needs to stay in some reasonable form.
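Keeping the list simple (native OTLP: yes, distributions: yes, ... etc.) is OK with me; my worry was the all-languages + Collector table, which is an explosion of data that I am not keen to maintain.
Here's my proposal:

1. We ask vendors to revalidate their row by `DATE`, as follows:
2. Bring proof that your backend supports native OTLP, that you have a distribution, or that you require an exporter. Those proofs have to be a link to their docs showing OTLP support and a link to their distribution/exporter. Those links will be included in the table.
3. When we pass the deadline `DATE`, all vendors without an update will be removed until they get back with that data.

Additionally, we will remove the "Learn More" column, since the links brought as proof will have everything the end user needs to know. If we like, we can add more columns eventually, e.g. if there's a vendor-specific Collector, or if there's a fork/blog/doc around using the OTel demo, etc.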
you have a distribution
Collector distribution or SDK distributions?
Keeping the list simple (native OTLP: yes, distributions: yes, ... etc.) is OK with me; my worry was the all-languages + Collector table, which is an explosion of data that I am not keen to maintain.
I second this. Keeping it simple is a good idea. But please add a few lines explaining that "native OTLP" means the vendor supports receiving telemetry via an OTLP endpoint and does not require a custom exporter.
Here's my proposal:
1. We ask vendors to revalidate their row by `DATE`, as follows:
2. Bring proof that your backend supports native OTLP, that you have a distribution, or that you require an exporter. Those proofs have to be a link to their docs showing OTLP support and a link to their distribution/exporter. Those links will be included in the table.
3. When we pass the deadline `DATE`, all vendors without an update will be removed until they get back with that data.
4. If a link is broken, we will set it back to "NO" and let the vendor know that they need to update.
Seems perfect to me.
So, adding your feedback, the table could look something like this:
Name | backend with native OTLP support | vendor-specific exporter | Distribution |
---|---|---|---|
Vendor A | [link to docs] | NO | [link to collector distro] [link to SDK distro] |
Vendor B | NO | [link to collector exporter] | [link to collector distro] |
Vendor C | NO | [link to collector exporter] [link to SDK exporters] | NO |
Vendor D | [link to docs] (only traces) | NO | NO |
(As an alternative, there could be separate Collector/SDK columns for the exporter and the distribution.)
Note that `[link to SDK distro]` will need to be plural, since several vendors have several SDK distributions.
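The revalidation rules (proof links go into the table; a broken link reverts the cell to "NO") are mechanical enough to script, which could ease the maintenance burden raised earlier. A minimal Python sketch, assuming a hypothetical `VendorRow` record and a caller-supplied `link_ok` check: a real check might issue an HTTP request, but it is injected here so the rendering logic stays testable offline. Each cell holds a list, since several vendors have several SDK distributions.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class VendorRow:
    # Hypothetical record for one vendor; every claim is backed by links.
    name: str
    native_otlp_docs: Optional[str] = None
    exporter_links: List[str] = field(default_factory=list)
    distribution_links: List[str] = field(default_factory=list)


def render_cell(links: List[str], link_ok: Callable[[str], bool]) -> str:
    # Broken or missing proof links fall back to "NO".
    live = [f"[link]({url})" for url in links if link_ok(url)]
    return " ".join(live) if live else "NO"


def render_table(rows: List[VendorRow], link_ok: Callable[[str], bool]) -> str:
    lines = [
        "Name | Backend with native OTLP support | Vendor-specific exporter | Distribution",
        "---|---|---|---",
    ]
    for r in rows:
        docs = [r.native_otlp_docs] if r.native_otlp_docs else []
        lines.append(" | ".join([
            r.name,
            render_cell(docs, link_ok),
            render_cell(r.exporter_links, link_ok),
            render_cell(r.distribution_links, link_ok),
        ]))
    return "\n".join(lines)


if __name__ == "__main__":
    rows = [
        VendorRow("Vendor A", native_otlp_docs="https://a.example/otlp"),
        VendorRow("Vendor B", exporter_links=["https://b.example/exporter"]),
    ]
    # Pretend Vendor B's exporter link is broken.
    print(render_table(rows, link_ok=lambda url: "b.example" not in url))
```

Injecting `link_ok` keeps the "broken link means NO" policy in one place while leaving the choice of link checker open.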
Isn't "Native OTLP endpoint" less ambiguous and shorter than "backend with native OTLP support"? Other than that, it seems good.
Agreed, we need to be super explicit about whether users can just use a community library in their processes and send telemetry to a well-known endpoint.
Vendors are going to dance around with these concepts and it's important for this list to help people identify which vendors are playing nicely.
I'm not sure if this is the best place to at least offer a starting point for my concern, but at least wherever things end up, it'll be tracked here so that it's searchable...
I'm in the process of getting my company established on Google Cloud. One sticking point which I also encountered while on Azure is that all the major cloud providers seem to be under the impression that in order to support OpenTelemetry, they all have to provide a vendor-specific exporter library.
My understanding is that part of the whole point of OpenTelemetry is not just to offer a consistent and agnostic API surface area, but to also offer the OTLP wire protocol for traces to be exported.
I feel like it's a glaring failure of advocacy and communication from OpenTelemetry (as an overall initiative) that all vendors seem to be requiring application developers to modify their source code to get support for tracing.
Is there no way that OpenTelemetry can reach out to all its senior engineering contacts at the various cloud providers which I know participate in the project and help guide them to establish well-known OTLP endpoint conventions on all their compute resources for applications to export to?
For example: if I'm running a C# application on Google Cloud Run, I should be able to instrument my application using the standard vendor-agnostic OpenTelemetry libraries, while also configuring my application to use the vendor-agnostic OTLP exporter. When configuring my application to run on Google Cloud Run, I should be able to simply update the endpoint URI that my application exports its telemetry data to, at which point all of Google's cloud monitoring infrastructure would begin receiving trace data from my application.
In this scenario, Google would not be responsible for providing ecosystem-specific exporters, but instead able to focus on integrating at an infrastructure level by exposing well known telemetry endpoints on their various offerings.
Instead, what we currently have is that applications have to install vendor-specific exporters which means that if you don't control the source code for an application, you're basically left without an option for getting telemetry out. If your platform isn't directly supported yet by the vendor, then you also have no recourse and simply cannot adopt OpenTelemetry.
Overall, this ticket is to raise the concern that OpenTelemetry also has some obligation to offer guidance not just on how developers can work with its own deliverables, but also on how service providers can capitalize on the standard. That would not only save developers time and frustration, but also save providers the effort of maintaining libraries for every possible combination of services and ecosystems!