Open cartersocha opened 1 year ago
👍
I would love to see this get some traction. There are plenty of tutorials out there that describe how to do serverless (AWS, Azure, GCP) with OpenTelemetry, and each of them has its own specific way of doing it, so it's super confusing for end users what the right way of doing it is.
"There is clear community guidance on how to monitor Lambdas"
We have started on some initial documentation on serverless for JS on AWS in the official docs (https://opentelemetry.io/docs/instrumentation/js/serverless/), and plan to have more (e.g. here's a PR for GCP, https://github.com/open-telemetry/opentelemetry.io/pull/2091, and a list where we track the effort, https://github.com/open-telemetry/opentelemetry.io/issues/2021).
We (Cisco, formerly Epsagon) are also interested in improving this area and are willing to participate in the engineering efforts in JS and Python.
We have already started contributing documentation (https://github.com/open-telemetry/opentelemetry.io/pull/1974).
@codeboten, @tylerbenson - please add any known Lambda deficiencies as issues and link them here for the final issue section, or send them to me over Slack.
Are this proposal and the https://github.com/open-telemetry/opentelemetry-lambda repo related in any way?
@tigrannajaryan yep! The restarted SIG would take over the repo, but I'm not sure about the approver / maintainer list. @Aneurysm9 will be the overall lead.
We want to ensure:
- Consistent lambda layer behavior by language and uniform conformance to the spec
- There is clear community guidance on how to monitor Lambdas
- End to end context propagation using community protocols / propagation across typical Lambda architectures (e.g. async)
- Cold start data capture and submission
- The new TelemetryAPI is properly integrated and utilized by OpenTelemetry
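To make the context propagation goal above a bit more concrete, here is a minimal sketch of pulling a W3C Trace Context `traceparent` header out of an incoming event. It assumes an API Gateway style proxy event with a `headers` mapping, which is only one of the architectures in scope (async sources such as queues carry context elsewhere). The function name `extract_traceparent` is hypothetical, the sketch only accepts the four-field layout used by version `00`, and it is not how any existing Lambda layer does it.

```python
import re
from typing import Optional

# W3C Trace Context "traceparent" layout: version-traceid-parentid-flags,
# all lowercase hex. Only the four-field layout is accepted in this sketch.
_TRACEPARENT_RE = re.compile(
    r"^(?P<version>[0-9a-f]{2})-(?P<trace_id>[0-9a-f]{32})-"
    r"(?P<parent_id>[0-9a-f]{16})-(?P<flags>[0-9a-f]{2})$"
)


def extract_traceparent(event: dict) -> Optional[dict]:
    """Return the parsed traceparent fields from an event, or None.

    Assumption: the event has a "headers" mapping, as in an API Gateway
    proxy integration. Other event sources would need their own carrier.
    """
    headers = event.get("headers") or {}
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {str(key).lower(): value for key, value in headers.items()}
    raw = lowered.get("traceparent")
    if not isinstance(raw, str):
        return None

    match = _TRACEPARENT_RE.match(raw.strip())
    if match is None:
        return None

    fields = match.groupdict()
    # The spec treats version "ff" and all-zero IDs as invalid.
    if (
        fields["version"] == "ff"
        or set(fields["trace_id"]) == {"0"}
        or set(fields["parent_id"]) == {"0"}
    ):
        return None

    # Bit 0 of the flags byte is the "sampled" flag.
    fields["sampled"] = bool(int(fields["flags"], 16) & 0x01)
    return fields
```

A handler could call `extract_traceparent(event)` first and fall back to starting a new trace when it returns `None`.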
I like that, @cartersocha. One question: could the registry also be a point of improvement? When I search for AWS in the context of Lambda in the registry, I find it confusing to select the right instrumentation.
E.g.: registry query by keyword
@cartersocha I think it would be good to rename this to "serverless" instead of "lambda" where possible.
@svrnm agreed there is a need to address stateless more broadly.
However with the current state of lambda instrumentation, customer experience, and previous community support I’d like to keep this effort focused on lambdas. I’m not sure if the lambda repo even has an owner right now, besides Anthony occasionally looking at it, but customers are trying to use the packages in production.
A subsequent or parallel effort would be needed with function representatives from multiple vendors for the full stateless scope.
@cartersocha I created an issue describing the biggest issue I see: #3060
"However with the current state of lambda instrumentation, customer experience, and previous community support I’d like to keep this effort focused on lambdas"
Although I agree with putting the focus on Lambda first, I think naming the WG / effort "serverless" is more inclusive, so it also invites people who want to pick up other kinds of FaaS and serverless into the work. It's similar to "Client Side Telemetry", which is not named "Browser RUM" or "Mobile RUM", although it has a strong focus on those.
I agree with this, mostly. I think the WG should definitely be "FaaS" but may still encompass activities that are directly and solely targeted at Lambda. For instance, the current https://github.com/open-telemetry/opentelemetry-lambda repo provides a Lambda Extension and Layers that, to the best of my knowledge, do not have analogues in other FaaS offerings and that, should any such analogous offering exist, may still require independent implementations. We should seek to make specification as broad as possible but retain the flexibility to provide more focused implementations as necessary.
I believe FaaS is the appropriate scope as "serverless" is in the eye of the beholder.
Updated the item and issue title based on the feedback, @Aneurysm9, @svrnm. Initial SIG focus will be on fixing the current Lambda state, but the SIG will broadly cover the FAAS space.
Agreed on the spec being broadly applicable. I'm unsure if more spec work is needed for FAAS. Our initial findings have mostly been implementation inconsistencies.
I suggest making request/response hooks a standard capability for serverless integrations (already implemented in Node, I've created a PR in Python - link). In several languages, request/response hooks are provided for HTTP clients and servers, allowing the extraction of data that isn't collected by default. A typical use case for that would be extracting a customer id from a header/body parameter, and adding it as an attribute to the span, to be later on used for filtering spans by customer. For that matter, serverless functions aren't much different than HTTP endpoints - and in some cases, like connecting AWS API Gateway to a Lambda, they are logically acting as HTTP servers.
Generally speaking, these hooks are an important enabler for adapting OTEL to each engineering organization's specific needs - and also help downstream distros innovate in various domains based on the core OTEL SDK.
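As a rough illustration of the request/response hook idea, here is a minimal, self-contained sketch. It does not use the real OpenTelemetry API: `SketchSpan`, `instrument_handler`, the `x-customer-id` header, and the `app.customer_id` attribute key are all hypothetical names chosen for the example, and the hook signatures are an assumption rather than what the Node or Python instrumentations actually expose.

```python
from typing import Any, Callable, Dict, Optional


class SketchSpan:
    """Hypothetical stand-in for a tracing span; not an OpenTelemetry class."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.attributes: Dict[str, Any] = {}

    def set_attribute(self, key: str, value: Any) -> None:
        self.attributes[key] = value


Handler = Callable[[dict, Any], Any]
RequestHook = Callable[[SketchSpan, dict, Any], None]
ResponseHook = Callable[[SketchSpan, dict, Any, Any], None]


def instrument_handler(
    handler: Handler,
    request_hook: Optional[RequestHook] = None,
    response_hook: Optional[ResponseHook] = None,
    on_span_end: Optional[Callable[[SketchSpan], None]] = None,
) -> Handler:
    """Wrap a function handler so user hooks can enrich its span."""

    def wrapper(event: dict, context: Any) -> Any:
        span = SketchSpan(getattr(handler, "__name__", "handler"))
        try:
            if request_hook is not None:
                request_hook(span, event, context)  # runs before the handler
            result = handler(event, context)
            if response_hook is not None:
                response_hook(span, event, context, result)  # runs after it
            return result
        finally:
            # Hand the span over even when the handler or a hook raises.
            if on_span_end is not None:
                on_span_end(span)

    return wrapper


def add_customer_id(span: SketchSpan, event: dict, context: Any) -> None:
    """Request hook: copy a (hypothetical) customer header onto the span."""
    headers = event.get("headers") or {}
    customer_id = headers.get("x-customer-id")
    if customer_id is not None:
        span.set_attribute("app.customer_id", customer_id)


def add_status_code(span: SketchSpan, event: dict, context: Any, result: Any) -> None:
    """Response hook: record the status code returned by the handler."""
    if isinstance(result, dict) and "statusCode" in result:
        span.set_attribute("http.response.status_code", result["statusCode"])


def handler(event: dict, context: Any) -> dict:
    return {"statusCode": 200, "body": "ok"}


finished_spans: list = []
traced_handler = instrument_handler(
    handler, add_customer_id, add_status_code, finished_spans.append
)
```

Calling `traced_handler({"headers": {"x-customer-id": "c-42"}}, None)` leaves one span in `finished_spans` carrying the customer id, which is the kind of attribute that could later be used to filter spans by customer.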
Description
The initial goal of this project is to put Lambda monitoring into a consistently good state. Across vendors, customers consistently struggle to instrument their Lambdas and to identify best practices for monitoring them. Lambda layer behavior differs by language, context propagation is frequently broken, and cold starts are a known issue.
This SIG will also look beyond Lambdas to Functions as a Service more broadly.
We want to ensure:
This SIG will also serve as a working group for all FAAS topics going forward.
Deliverables
The first two language implementations will be Node.js and Python.
Staffing / Help Wanted
The following vendors are interested in improving this area:
While Lambdas are the focus of this effort, we need other Functions As A Service (FAAS) experts to ensure we're building conventions that make sense for the stateless function space in general. GCP or Azure participation would be welcome.
Required staffing
Project Lead: @Aneurysm9
Sponsoring TC Members:
Implementation Engineers:
Implementation Maintainers or Approvers:
Lambda SME(s): @Aneurysm9 to add
Meeting Times
To deliver the improvements promptly, we propose meeting at least 2 days a week for the 6-week planning cycle, as specified in the new Semantic Conventions Process Doc.
Meeting Times:
PST Option: Tuesday @ 12 pm PST
CET Option: Wednesday @ 7 am PST
Timeline
Labels
The tracking issue should be properly labeled to indicate what parts of the specification it is focused on.
Linked Issues and PRs
All PRs, Issues, and OTEPs related to the project should link back to the tracking issue, so that they can be easily found.
https://github.com/open-telemetry/community/issues/685
https://github.com/open-telemetry/opentelemetry-specification/issues/3060
Repo
https://github.com/open-telemetry/opentelemetry-lambda