open-telemetry / community

OpenTelemetry community content
https://opentelemetry.io
Apache License 2.0

Request for a Lambda SIG #591

Closed alolita closed 3 years ago

alolita commented 3 years ago

I’d like to formally request the formation of a SIG focused on enhancing Lambda support in OpenTelemetry components, including an extension that embeds an OpenTelemetry Collector to scrape telemetry data from Lambda functions. Several collaborators from AWS and Lightstep are interested in enhancing this functionality. We also welcome other interested contributors to join in.

We request a weekly meeting be scheduled every Wednesday 16:00-17:00 to support APAC contributors too. Nizar @ntyrewalla and Sandra @awssandra are interested in facilitating this weekly discussion.

Related issues: https://github.com/open-telemetry/community/issues/577

cc: @tedsuo @codeboten @anuraaga

anuraaga commented 3 years ago

/cc @kubawach

anuraaga commented 3 years ago

I'm sort of wondering - since lambda is AWS-specific, any reason not to "host the SIG" (whatever all that entails) in aws-observability? I assume we can still collaborate with Lightstep and others there too :)

alolita commented 3 years ago

Hi @anuraaga It would be great to keep the discussions in the OpenTelemetry project space since it enables everyone to participate in a neutral environment. General serverless support specs for OpenTelemetry could be derived based on what this SIG helps define and implement for Lambda support.

anuraaga commented 3 years ago

@alolita If it is a Serverless SIG, that does make sense, but we can also imagine the Lambda working group in aws-observability circling back with findings as things stabilize; it would still be a transparent place. One key point that comes to mind is ownership of potential published artifacts, for example Lambda layers. AWS should definitely host these instead of OpenTelemetry since it's free for us :) This is probably still possible with an OpenTelemetry project, so I think either is fine, but I wonder if it's not just less confusing to keep it a bit closer to the metal.

dyladan commented 3 years ago

I agree with @alolita that it's important to maintain a neutral environment. Any repository hosted outside of OpenTelemetry is outside of OpenTelemetry control, which means the decision making power of the TC and the spec is limited and anyone from aws-observability would necessarily have more power to shape the project.

As an example, the spec could make a given semantic convention "required," but if it is not important to x-ray then the aws-observability team may choose not to include it in order to reduce payload size. Even if the disagreement isn't that strong, it may just not be a priority to implement.

I don't think it is necessarily a problem to host the project in aws-observability, but I don't think you should call it an "official" OpenTelemetry project unless it is hosted and controlled by OpenTelemetry maintainers.

Additionally, I'm not an expert in this area but I'm sure there are CLA considerations. One company may decide to sign the OpenTelemetry CLA but not the Amazon CLA for one reason or another.

anuraaga commented 3 years ago

@dyladan That's a good example with the TC. With visibility comes ownership - does OpenTelemetry TC really want ownership of this? Maybe but it feels a bit out of scope to me. Making proper technical decisions related to lambda can require knowing about features on its roadmap - I suspect the TC should not be required to have NDAs to help out with community projects.

How an SDK integrates with a language is an instrumentation concern and will live there anyways I expect, similar to the wrappers @kubawach wrote for Java. This includes concerns of telemetry and semantic conventions. So these conversations will definitely still live here. If having a SIG to coordinate the language work helps, it could make sense but I suspect it needs to be called a FaaS SIG, not a Lambda one, to fit better with OTel's scope.

bhs commented 3 years ago

If the work of the Lambda+OTel group does not require changes to core OTel repos (e.g., the spec), then I don't think there needs to be an "official" OTel SIG at all. Not to say that this shouldn't or can't be a significant effort, but just that we don't need to establish a precedent that all work that depends on OTel needs an OTel SIG, since over time there are going to be a lot of projects integrating with OTel and we don't have the mechanisms in place to track and/or manage those.

That said, if the Lambda+OTel group does anticipate needing to make meaningful changes to the spec (which I can certainly imagine), IMO it would be preferable to make it a FaaS SIG as @anuraaga suggests, taking some care not to create a tight coupling with any particular FaaS implementation (like Lambda).

These opinions ^^^ are loosely held, FYI – just trying to help us set reasonable precedents.

rakyll commented 3 years ago

I see some potential areas where the lifecycle of the collector and the configuration management of the collector on Lambda may require some discussion with the larger community. The Lambda work could become a reference point for other FaaS providers, and a SIG might help us keep the discussion consistently public and visible to current and prospective OTel contributors.

maxgolov commented 3 years ago

If the goal is to create a reference point for other FaaS providers, should this be a generic SIG that includes folks working on Azure Functions, etc.?

anuraaga commented 3 years ago

Some discussion's happening on the scope of this project

https://github.com/open-telemetry/opentelemetry-lambda/pull/14#discussion_r595603941

There is tension between the user experience and hosting parts in open-telemetry in a vendor-neutral way. Lambda users are using AWS, and it seems to make sense for AWS to make decisions on aspects of the user experience in some respects, which will mean at the least using CloudWatch, given it's the integrated logs / metrics solution for Lambda. I suspect that many other vendors base their experiences on CloudWatch, e.g. by ingesting its logs, rather than replacing it. But it's still an observability vendor, and I'm not comfortable with the open-telemetry org directing users to it.

So I guess this brings me to double-down on my original suggestion - that this should probably live in the AWS org and not on open-telemetry since it appears to overlap too much with the user experience on AWS, not just OpenTelemetry. So I'd move the repo out TBH.

tedsuo commented 3 years ago

@anuraaga this is starting to smell strongly like an attempt at vendor lock in.

If you attempt to make this work private, or otherwise restrict it to only working with the AWS build of the collector, then you are on the road to dictating who can and cannot receive telemetry from Lambda from "OpenTelemetry."

The way AWS is going about marketing their distro is already beginning to strain relationships and create PR trouble for the OpenTelemetry project, which I personally have to go and deal with. I am completely unimpressed by the insistence that something calling itself OpenTelemetry should only be viable with a fork of the project, which is what this looks like the beginnings of.

This is my first warning. Stop trying to privatize OpenTelemetry. Rethink how you are approaching this.

Oberon00 commented 3 years ago

@tedsuo Related: https://github.com/open-telemetry/opentelemetry-specification/pull/1442#discussion_r596199692

Oberon00 commented 3 years ago

Lambda users are using AWS, and it seems to make sense for AWS to make decisions on aspects of the user experience in some respects

I think it's fine to set defaults if that's sensible to implement, but the goal should be to decouple X-Ray/CloudWatch and OpenTelemetry. If I already use another APM vendor but then I deploy one of my apps to AWS Lambda, CloudWatch/X-Ray will not be helpful at all to me if I want to see a full trace that starts from my fat Java on Windows client, goes through AWS Lambda, to a Java web application hosted in my own datacenter and maybe further to some mainframe.

tedsuo commented 3 years ago

Trace headers are hard, and I understand if it will take AWS time to support headers other than x-amzn for services like SQS. But that is different from baking in assumptions.

It is hard to make Lambda repo to be vendor agnostic since Lambda is aws, take the example, we will use ADOT collector in Collector extension because there is no better choice.

I am using strong language here, so I want to be very clear: The moment a specific distro is required is the moment this has crossed a line. AWS is putting work into making a distro that performs well in a restricted environment. That is great, I appreciate the attention to detail and the effort involved there. However, it is also straightforward to make this extension build with other Collectors. If the AWS distro only contains OSS/upstream components, there is no reason I could not choose to include a build with additional plugins, or remove plugins currently embedded in the AWS distro to further reduce the footprint. This is a critical requirement for the AWS work to not begin looking like a fork.

wangzlei commented 3 years ago

There are some basic principles we need to keep in mind for the health of the OTel community, such as being neutral and vendor agnostic. The goal of these principles is to protect the future of OTel, but they should not become a stumbling block to development. I hope we can make the decision by analyzing the real case rather than from principles without foundation. One real example is the OTel Collector contrib repository: if we insisted on nothing vendor-specific and did not allow any third-party exporter to join, OTel would not be successful. When we included these third-party components in the Collector, we broke the vendor-agnostic principle. But this break is neutral because it is fair and open to configuration, as long as the new component does not impact the OTel core. So, I would like to introduce the story of the OpenTelemetry Lambda project:

OTel Lambda got started at the end of last year, and we have changed our thinking three times. At the very beginning we thought it was like the AWS distro Collector: AWS should maintain it in an AWS-owned downstream repo, and everything was simple. At the beginning of this year, with the great help of @alolita and @codeboten, we started thinking about upstreaming the Lambda project to the OTel community. We believed it was good for both OTel and AWS; the only concern was whether the OTel community would accept it. At that time we still wanted to split it into upstream and downstream: OTel upstream would maintain the code, and AWS downstream would maintain the samples and CI/CD. But things changed in February: we found that keeping it strictly non-vendor-specific actually sacrifices the user experience and the project itself. We hope the OTel Lambda public layers have better visibility and users have a better onboarding experience, so I raised the proposal in the SIG meeting:

These items might scare people because they sound very vendor specific, but they are not. I want to explain a little bit here:

  1. In the Lambda SIG meeting we found it would not be fair if AWS distributed the public Lambda layer from its own downstream repo, since that would drive collaborators to move to the AWS repo rather than a neutral OTel repo.
  2. We use the AWS distro Collector rather than Collector-contrib because that is the only technical choice in Lambda, not because we intend to avoid collector-contrib.
  3. OTel Lambda is config-open. Any backend needs to be configured through the Collector config file and environment variables. AWS native backends are not privileged.
  4. We have to add a sample using AWS native backends (X-Ray and CloudWatch), since that is the best way to do user demos and CI in Lambda. In the meantime I hope every partner can add their sample to this repo, so it is not limited to AWS native services.
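
To illustrate what "config-open" in point 3 could look like, here is a minimal sketch of a Collector config file, not taken from the opentelemetry-lambda repo. It assumes an OTLP-capable backend, and the environment variable name BACKEND_OTLP_ENDPOINT is hypothetical. The `${env:...}` substitution syntax is the one used by recent Collector releases; older releases used a plain `${NAME}` form.

```yaml
# Sketch only: receive OTLP from the function's SDK and forward traces
# to whichever backend the operator points the layer at.
receivers:
  otlp:
    protocols:
      grpc:
      http:

exporters:
  otlp:
    # Hypothetical variable name, set in the Lambda function's environment.
    endpoint: ${env:BACKEND_OTLP_ENDPOINT}

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp]
```

Swapping the backend then means changing the exporter section and the environment variable, not the layer itself, provided the Collector binary in the layer includes the exporter in question.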

Generally, OpenTelemetry Lambda is not a vendor-specific problem; the question is how we can make the right decision for users while making sure we stay neutral. Lambda is naturally an AWS service. That is not a sin; it is just hard to get rid of the intimacy with AWS :) Please refer to the design of OpenTelemetry Lambda and then help decide whether we should remove the opentelemetry-lambda repo from the OTel community. https://docs.google.com/document/d/1-rVvaXckulIZFHRmGwjs_QfbWQbCOmY3HkLLQTNIA2c/edit#

Furthermore, personally I don't like the AWS distro Collector: it forcibly separates the Collector into upstream and AWS downstream, introduces additional work on both sides, and asks users to use a custom version of the collector in the AWS environment. Even so, as far as I know the motivation for the AWS distro Collector is not to be vendor-specific; rather, AWS does not want to endorse the original Collector due to security concerns. Hence the AWS distro Collector is not a positive example for the OTel Lambda project.

anuraaga commented 3 years ago

This is my first warning. Stop trying to privatize OpenTelemetry. Rethink how you are approaching this.

@tedsuo Definitely not trying to privatize anything, so brought this up to get advice on how best to make sure Lambda users get good tracing without losing the vendor neutrality of this org. Happy for any advice :)

We discussed at the SIG about what steps we can take and want to confirm if this would be ok in terms of vendor neutrality.

These points mean that we don't use any AWS distros in the lambda components, only official OTel artifacts. The vendor-specific points would only be the sample apps, but we hope to be vendor neutral by having sample apps for all vendors. This allows Lambda users to find all the information they would need in one place.

Does this fit in with the goals of OTel?

tedsuo commented 3 years ago

Thank you for your responses. To be clear, I don't think there is any foul play happening, I just want to emphasize there is an important line here. We are still in the process of creating OpenTelemetry, and the shape it takes now will influence both future work and perception of the project in the eyes of the general public.

@anuraaga what you are proposing looks fine, as long as it is hosted upstream and includes the build tools and instructions needed for publishing alternative language and collector layers, should a user wish to do so. 👍

There are some basic principles we need to keep in mind for the health of the OTel community, such as being neutral and vendor agnostic. The goal of these principles is to protect the future of OTel, but they should not become a stumbling block to development. I hope we can make the decision by analyzing the real case rather than from principles without foundation. One real example is the OTel Collector contrib repository: if we insisted on nothing vendor-specific and did not allow any third-party exporter to join, OTel would not be successful. When we included these third-party components in the Collector, we broke the vendor-agnostic principle. But this break is neutral because it is fair and open to configuration, as long as the new component does not impact the OTel core.

This is not quite right, and it is relevant to this discussion, so I want to add some clarity here. OTel is vendor agnostic because we allow anyone to add a plugin. Saying "no Xray, no Stackdriver allowed" is not how we intend to make this project vendor neutral. The point of OpenTelemetry is that you can plug in what you want. We do not force users to conform to our standards, such as OTLP, just because we think they are good. The real-world problem we solve is to allow operators choice. Saying "you have to use OTLP" is not agnostic. Being able to convert data from any format, to any other format, is an important part of the value we provide. We ensure a clean separation of concerns between every component in order to ensure that lock-in is minimal. Use the API with the SDK, or with any implementation if the SDK is an issue. Use the collector if it is helpful, or run without it if you don't want the overhead. Generate OTLP through your own mechanism if you don't want to use any of our code.

That doesn't mean an operator has to use the "kitchen sink" distro of the Collector. Actually the opposite. A user should be able to make a build of the collector containing any plugins they like, including plugins that they write themselves.

If we start gluing these pieces together, so that now running OpenTelemetry on Lambda means running with a specific distro of the Collector, we limit the ability for the operator to make their own choice. If, instead, we ensure that users can build a collector layer with a collector binary of their choice, then we give them the ability to move around any roadblock we may have inadvertently put in their path.

The image size issue is a real world example of where a user would want choice. Lightstep uses OTLP; we are covered by the stripped down distro you intend to make. But what about Datadog users? Zipkin users? The list of requested exporters will grow and grow. Likewise, processors will be added as metrics and other data types are added. If you stick to one distro, you will either end up back at the kitchen sink distro, or telling some users they are not as well supported as others.

The correct collector distro for lambda is the one that contains exactly the plugins you need. This will be different for different users. Using zipkin? You don't need OTLP, Xray, or the rest. Just the Zipkin exporter. @anuraaga providing a stripped down collector as the default is definitely useful and important, we just need to be able to accommodate users who need these other plugins.

This is the shift in thinking I am requesting. Obviously, there is more work to be done around supplying users with build tools for the Collector. We will also need a copyright scheme to differentiate "sanctioned" Collector distros built out of certified and trusted plugins, vs "mystery meat" distros which are running who knows what. But we don't need to be blocked on building those tools first. We just need to expect that the ecosystem will eventually look like this, and keep that in mind when we approach projects like Lambda and Kubernetes.

wangzlei commented 3 years ago

We had a discussion in today's SIG meeting, and now we are all on the same page. The route is not very different from our earlier proposal. Thanks to @anuraaga for pointing out things we need to be aware of; please refer to his latest summary in this thread.

Regarding @tedsuo's comment, that is one of the decisions we made in the previous SIG meeting: by default, we provide users a stripped-down Collector in the OTel public Lambda layer, and any component can be added to it if the maintainers think it is valuable and would not inflate the layer size. Meanwhile, we provide tools for building a custom OTel Lambda layer (using the Collector builder and the OTel Lambda CD script).
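
As a rough sketch of the component selection this implies, a Collector builder manifest for the Zipkin-only case raised earlier in the thread might look like the following. This is an assumption-laden illustration, not the project's actual build file: the distribution name and output path are hypothetical, and the version numbers are placeholders to be replaced with whichever release you build against. It also covers only which components go into the binary, not the Lambda extension wiring or the layer packaging.

```yaml
# Sketch only: a Collector binary with one receiver and one exporter.
dist:
  name: otelcol-lambda-zipkin        # hypothetical distribution name
  description: Minimal Collector for a Zipkin backend
  output_path: ./build               # hypothetical output directory

receivers:
  # Placeholder version; pin the release you actually target.
  - gomod: go.opentelemetry.io/collector/receiver/otlpreceiver v0.100.0

exporters:
  # Placeholder version; keep it aligned with the receiver's release.
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/exporter/zipkinexporter v0.100.0
```

A user who needs a different backend would swap or add exporter entries, which keeps the binary limited to the plugins they actually use.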

It is good to see many opinions from different perspectives here. As a reference point for other FaaS providers, the OTel Lambda project is young; we have few developers and will hit a wall many times. Hopefully we can get more helpers involved in discussion and PR review.

FYI: Lambda SIG meeting notes and Design. I will also prepare the CI/CD proposal in the near future.

alolita commented 3 years ago

Hi @tedsuo We don't plan to "privatize" anything. AWS Distro for OpenTelemetry is a completely downstream distribution of OpenTelemetry. As proposed earlier, we plan to work with and in the community, on the project for all Lambda extensions. Please check with me to get details before making assumptions.

@anuraaga - please make sure to indicate these are your personal opinions and not representing AWS in this case.

anuraaga commented 3 years ago

please make sure to indicate these are your personal opinions and not representing AWS in this case.

These are my opinions based on my open-telemetry hat, not AWS one. Though I do hope both hats align in this case.

tedsuo commented 3 years ago

@alolita rereading my comments, I want to apologize for the overly hard language and assumptions. Throwing accusations around is not appropriate, and I could have made my point without that. There is a public perception issue which I feel needs to be addressed, but that is not the same as intent. I trust you and I don't believe that there is a genuine attempt to back a fork.

Apologies.

alolita commented 3 years ago

Thanks @tedsuo Let's work together on ensuring there is no confusion in public perception.

alolita commented 3 years ago

This SIG has been created. Closing issue.

Oberon00 commented 3 years ago

This SIG has been created.

@alolita Really? It is not listed at https://github.com/open-telemetry/community/#special-interest-groups

anuraaga commented 3 years ago

Phantom SIG :)