Project Tracking: FAAS SIG (Lambdas)

cartersocha commented 1 year ago

Description

The initial goal of this project is to put Lambda monitoring into a consistently good state. Across vendors, customers consistently struggle to instrument their Lambdas and to identify best practices for monitoring them. Lambda layer behavior differs by language, context propagation is frequently broken, and cold starts are a known issue.

This SIG will also look beyond Lambdas to Functions as a Service more broadly.

We want to ensure:

The FAAS spec has been reviewed, assessed to apply generically, and stabilized
Consistent lambda layer behavior by language and uniform conformance to the spec
There is clear community guidance on how to monitor Lambdas
End-to-end context propagation using community protocols / propagation across typical Lambda architectures (e.g., async)
Cold start data capture and submission
The new TelemetryAPI is properly integrated and utilized by OpenTelemetry
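To make the async propagation goal above concrete, here is a minimal sketch of carrying trace context through message attributes between a producer and a consumer function. It assumes W3C Trace Context (`traceparent`) is the community protocol in question, which the list above does not state. The function names and the `attributes` event field are hypothetical, and the sketch uses only the standard library rather than any OpenTelemetry API.

```python
import re
import secrets

# W3C Trace Context header: version-traceid-parentid-flags, all lowercase hex.
_TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def inject_traceparent(attributes, trace_id, span_id, sampled=True):
    """Producer side: copy the current trace context into message attributes."""
    flags = "01" if sampled else "00"
    attributes["traceparent"] = f"00-{trace_id}-{span_id}-{flags}"
    return attributes


def extract_traceparent(attributes):
    """Consumer side: return the parent context, or None if absent or malformed."""
    match = _TRACEPARENT_RE.match(attributes.get("traceparent", ""))
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # Version ff and all-zero IDs are invalid per the W3C spec.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return {
        "trace_id": trace_id,
        "parent_id": parent_id,
        "sampled": bool(int(flags, 16) & 1),
    }


def consumer_handler(event, context=None):
    """Hypothetical consumer function: continue the trace or start a new one."""
    parent = extract_traceparent(event.get("attributes", {}))
    trace_id = parent["trace_id"] if parent else secrets.token_hex(16)
    return {
        "trace_id": trace_id,
        "parent_id": parent["parent_id"] if parent else None,
    }
```

When the attribute is missing or malformed, the consumer starts a fresh trace instead of failing, which is one plausible way to keep broken propagation from breaking the function itself.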

This SIG will also serve as a working group for all FAAS topics going forward.

Deliverables

Stabilized FAAS semantic conventions
Updated Lambda Layer extensions that follow consistent trace propagation and have consistent behavior across languages.
Cold start processor for the collector
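As a rough illustration of what cold start capture could look like inside a function (this is not the collector processor itself), the sketch below flags the first invocation in an execution environment. The `faas.coldstart` name follows the FaaS semantic conventions as I understand them. The decorator, the `example.init_to_invoke_seconds` attribute, and the returned dict shape are hypothetical.

```python
import functools
import time

# Module-level state lives for the lifetime of the execution environment,
# so it is reset only when the platform creates a new one (a cold start).
_INIT_TIME = time.monotonic()
_invocation_count = 0


def with_cold_start(handler):
    """Wrap a handler and attach cold start attributes to its result."""

    @functools.wraps(handler)
    def wrapper(event, context=None):
        global _invocation_count
        _invocation_count += 1
        attributes = {"faas.coldstart": _invocation_count == 1}
        if attributes["faas.coldstart"]:
            # Hypothetical attribute name, not part of the semantic conventions.
            attributes["example.init_to_invoke_seconds"] = (
                time.monotonic() - _INIT_TIME
            )
        result = handler(event, context)
        return {"result": result, "attributes": attributes}

    return wrapper


@with_cold_start
def handler(event, context=None):
    return event.get("name", "world")
```

A real implementation would set these attributes on the invocation span rather than return them. The dict is used here only to keep the example self-contained.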

The first two language implementations will be Node.js and Python.

Staffing / Help Wanted

The following vendors are interested in improving this area:

Lightstep
AWS
Splunk (@tsloughter)
Cisco (@arbiv)
Honeycomb (@cartermp)

While Lambdas are the focus of this effort we need other Functions As A Service (FAAS) experts to ensure we're building conventions that make sense for the stateless function space in general. GCP or Azure participation would be welcome.

Required staffing

Project Lead: @Aneurysm9

Sponsoring TC Members:

@carlosalberto
TBA

Implementation Engineers:

@tylerbenson
@codeboten
Cisco contributors (@arbiv)
Honeycomb contributors (@cartermp)
@tsloughter
@xoscar

Implementation Maintainers or Approvers:

JavaScript - @mwear
Python - @ocelotl

Lambda SME(s): @Aneurysm9 to add

Meeting Times

To deliver the improvements promptly, we propose meeting at least 2 days a week for the 6-week planning cycle, as specified in the new Semantic Conventions Process Doc.

Meeting Times:

PST Option: Tuesday @ 12 pm PST
CET Option: Wednesday @ 7 am PST

Timeline

New working group will be kicked off in January
The WG has 6 weeks to propose improvements to the specification and solutions - Beginning of March
OTEPs and the first implementation in JavaScript will be reviewed by the community - All of March
Implementation - we want to start with JavaScript and Python as our first target implementation languages - Beginning of April

Labels

The tracking issue should be properly labeled to indicate what parts of the specification it is focused on.

Linked Issues and PRs

All PRs, Issues, and OTEPs related to the project should link back to the tracking issue, so that they can be easily found.

https://github.com/open-telemetry/community/issues/685
https://github.com/open-telemetry/opentelemetry-specification/issues/3060

Repo

https://github.com/open-telemetry/opentelemetry-lambda

svrnm commented 1 year ago

👍

I would love to see this get some traction. There are plenty of tutorials out there that describe how to do serverless (AWS, Azure, GCP) with OpenTelemetry, and each of them has its own specific way of doing it, so it's super confusing for end users what the right way is.

There is clear community guidance on how to monitor Lambdas

We have started to add some initial documentation on serverless for JS on AWS to the official docs (https://opentelemetry.io/docs/instrumentation/js/serverless/), and plan to have more (e.g. here's a PR for GCP: https://github.com/open-telemetry/opentelemetry.io/pull/2091, and a list where we track the effort: https://github.com/open-telemetry/opentelemetry.io/issues/2021).

arbiv commented 1 year ago

We (Cisco, formerly Epsagon) are also interested in improving this area and are willing to participate in the engineering efforts in JS and Python.

We have already started to contribute to the documentation (https://github.com/open-telemetry/opentelemetry.io/pull/1974).

cartersocha commented 1 year ago

@codeboten, @tylerbenson - please add any known lambda deficiencies into issues and link here for the final issue section or send to me over Slack.

cartersocha commented 1 year ago

@lmolkova if there's an interest in Azure Functions

tigrannajaryan commented 1 year ago

Is this proposal and https://github.com/open-telemetry/opentelemetry-lambda repo related in any way?

cartersocha commented 1 year ago

@tigrannajaryan yep! The restarted SIG would take over the repo, but I'm not sure about the approver / maintainer list. @Aneurysm9 will be the overall lead.

niko-achilles commented 1 year ago

We want to ensure:

Consistent lambda layer behavior by language and uniform conformance to the spec
There is clear community guidance on how to monitor Lambdas
End to end context propagation using community protocols / propagation across typical Lambda architectures (e.g async)
Cold start data capture and submission
The new TelemetryAPI is properly integrated and utilized by OpenTelemetry

I like that, @cartersocha. One question: could the registry also be a point of improvement? When I search the registry for AWS in the context of Lambda, it is confusing to pick the right instrumentation.
E.g.: registry query by keyword

svrnm commented 1 year ago

@cartersocha I think it would be good to rename this to "serverless" instead of "lambda" where possible.

cartersocha commented 1 year ago

@svrnm agreed there is a need to address stateless more broadly.

However, with the current state of lambda instrumentation, customer experience, and previous community support, I’d like to keep this effort focused on lambdas. I’m not sure if the lambda repo even has an owner right now, besides Anthony occasionally looking at it, but customers are trying to use the packages in production.

A subsequent or parallel effort would be needed with function representatives from multiple vendors for the full stateless scope.

tylerbenson commented 1 year ago

@cartersocha I created an issue describing the biggest issue I see: #3060

svrnm commented 1 year ago

However with the current state of lambda instrumentation, customer experience, and previous community support I’d like to keep this effort focused on lambdas

Although I agree with putting the focus on lambda first, I think naming the WG / effort "serverless" is more inclusive, so it also invites ppl who want to pick up other kinds of FaaS & serverless into the work. It's similar to "Client Side Telemetry", which is not named "Browser RUM" or "Mobile RUM", although they have a strong focus on that matter.

Aneurysm9 commented 1 year ago

I agree with this, mostly. I think the WG should definitely be "FaaS" but may still encompass activities that are directly and solely targeted at Lambda. For instance, the current https://github.com/open-telemetry/opentelemetry-lambda repo provides a Lambda Extension and Layers that, to the best of my knowledge, do not have analogues in other FaaS offerings and that, should any such analogous offering exist, may still require independent implementations. We should seek to make the specification as broad as possible but retain the flexibility to provide more focused implementations as necessary.

I believe FaaS is the appropriate scope as "serverless" is in the eye of the beholder.

cartersocha commented 1 year ago

Updated the item and issue title based on the feedback @Aneurysm9, @svrnm. Initial SIG focus will be on fixing the current lambda state, but the SIG will broadly cover the FAAS space.

Agreed on the spec being broadly applicable. I'm unsure if more spec work is needed for FAAS. Our initial findings have mostly been implementation inconsistencies.

nozik commented 1 year ago

I suggest making request/response hooks a standard capability for serverless integrations (already implemented in Node; I've created a PR in Python - link). In several languages, request/response hooks are provided for HTTP clients and servers, allowing the extraction of data that isn't collected by default. A typical use case would be extracting a customer ID from a header/body parameter and adding it as an attribute to the span, to be used later for filtering spans by customer. In that respect, serverless functions aren't much different from HTTP endpoints, and in some cases, like connecting AWS API Gateway to a Lambda, they logically act as HTTP servers.

Generally speaking, these hooks are an important enabler for adapting OTEL to each engineering organization's specific needs, and they also help downstream distros innovate in various domains based on the core OTEL SDK.
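To illustrate the shape such hooks could take, here is a minimal sketch of the customer ID use case. Everything in it is hypothetical: `Span` is a stand-in rather than the OpenTelemetry span class, `instrument` is not the API of the Node implementation or the Python PR, and the `x-customer-id` header and `app.*` attribute names are made up for the example.

```python
import functools


class Span:
    """Stand-in for a tracing span; only records attributes."""

    def __init__(self, name):
        self.name = name
        self.attributes = {}

    def set_attribute(self, key, value):
        self.attributes[key] = value


def instrument(handler, request_hook=None, response_hook=None):
    """Hypothetical wrapper: run optional hooks around a FaaS handler."""

    @functools.wraps(handler)
    def wrapper(event, context=None):
        span = Span(handler.__name__)
        if request_hook is not None:
            request_hook(span, event, context)
        response = handler(event, context)
        if response_hook is not None:
            response_hook(span, response)
        wrapper.last_span = span  # exposed for inspection in this sketch only
        return response

    return wrapper


def customer_id_hook(span, event, context):
    """Copy a customer ID header (hypothetical name) onto the span."""
    headers = {k.lower(): v for k, v in (event.get("headers") or {}).items()}
    if "x-customer-id" in headers:
        span.set_attribute("app.customer_id", headers["x-customer-id"])


def status_hook(span, response):
    """Record the response status under a hypothetical attribute name."""
    span.set_attribute("app.response_status", response.get("statusCode"))


def api_handler(event, context=None):
    return {"statusCode": 200, "body": "ok"}


traced_handler = instrument(api_handler, customer_id_hook, status_hook)
```

Calling `traced_handler({"headers": {"X-Customer-Id": "c-42"}})` leaves `app.customer_id` set on the recorded span, which is the kind of attribute that could later be used to filter spans by customer.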
