open-telemetry / opentelemetry-demo

This repository contains the OpenTelemetry Astronomy Shop, a microservice-based distributed system intended to illustrate the implementation of OpenTelemetry in a near real-world environment.
https://opentelemetry.io/docs/demo/
Apache License 2.0

Serverless support for OTEL demonstrations #1185

Open ithompson-gp opened 11 months ago

ithompson-gp commented 11 months ago

Feature Request

Throughout the OTEL landscape and documentation, there are few use cases, working examples, or best practices for leveraging OTEL tooling and libraries in a serverless EDA (Event-Driven Architecture), or in serverless workloads as a whole.

It would be really cool to have a best-practice setup given the performance requirements for serverless workloads; performance is king when running these workloads. There are considerable performance hits when using the AWS distribution for OTEL in Lambda, for instance, versus having a centralised, load-balanced Collector (where each serverless function has a thin layer for export and instrumentation). Is the OTEL demo the right place for this? The lift to convert this container/k8s setup to serverless would be non-trivial.
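To make the "thin layer" idea concrete, here is a minimal sketch in Python. It is not the AWS distribution's mechanism and does not speak OTLP: `ThinExporter`, `traced`, and the JSON payload shape are all hypothetical names invented for illustration. The assumption is that the function only records timings in memory and hands one small payload to a sender (which would point at the central Collector) at the end of each invocation.

```python
import functools
import json
import time
from typing import Any, Callable, Dict, List


class ThinExporter:
    """Hypothetical in-function buffer; the real work happens in a remote Collector."""

    def __init__(self, send: Callable[[bytes], None]) -> None:
        # `send` is an assumption: any callable that ships bytes to the Collector.
        self._send = send
        self._buffer: List[Dict[str, Any]] = []

    def record(self, name: str, start_ns: int, end_ns: int) -> None:
        self._buffer.append({"name": name, "start_ns": start_ns, "end_ns": end_ns})

    def flush(self) -> int:
        """Send everything buffered in one payload; return the number of spans sent."""
        if not self._buffer:
            return 0
        body = json.dumps({"spans": self._buffer}).encode("utf-8")
        self._send(body)
        count = len(self._buffer)
        self._buffer = []
        return count


def traced(exporter: ThinExporter, name: str):
    """Wrap a FaaS-style handler(event, context) so it is timed and flushed once."""

    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(event, context):
            start = time.time_ns()
            try:
                return handler(event, context)
            finally:
                # Flush before the runtime can freeze the execution environment.
                exporter.record(name, start, time.time_ns())
                exporter.flush()

        return wrapper

    return decorator
```

Injecting `send` keeps the function itself free of any Collector process; whether a synchronous flush per invocation is acceptable depends on the latency budget, which is exactly the trade-off raised above.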

Perhaps we could have both a server and a serverless OTEL demo?

For teams and organisations jumping on the OTEL bandwagon, the lack of demonstrations and best-practice examples is a general hindrance (the only path is internal PoCs to 'show off' features, yet these perhaps do not adhere to best practice).

Additional Context

There are many gaps in serverless OTEL - as we all can see/imagine - so perhaps a spin-off discussion could cover the direction of OTEL through a serverless lens more generally.

puckpuck commented 11 months ago

Do you think carving out a portion of the demo into serverless makes sense? In particular, the accountingservice and frauddetectionservice would be good candidates for event-driven serverless pieces. I can see a feature flag being used to direct traffic to the demo's Kafka service or publish it to a cloud provider queue (e.g. SQS), which then gets consumed via serverless functions.
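A rough sketch of that routing, in Python and with in-memory stand-ins instead of real Kafka or SQS clients, might look like the following. Everything here is hypothetical: the flag name `serverlessOrderProcessing`, the `OrderRouter` class, and the `InMemoryPublisher` stub are illustrative assumptions and are not taken from the demo's code.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Protocol, Tuple


class OrderPublisher(Protocol):
    """Anything that can publish an order message with headers."""

    def publish(self, payload: bytes, headers: Dict[str, str]) -> None: ...


@dataclass
class InMemoryPublisher:
    """Stand-in for a real Kafka producer or cloud queue client."""

    sent: List[Tuple[bytes, Dict[str, str]]] = field(default_factory=list)

    def publish(self, payload: bytes, headers: Dict[str, str]) -> None:
        self.sent.append((payload, dict(headers)))


@dataclass
class OrderRouter:
    """Chooses the transport per message based on a feature flag lookup."""

    flag_enabled: Callable[[str], bool]  # assumption: wraps the flag service
    kafka: OrderPublisher
    cloud_queue: OrderPublisher
    flag_name: str = "serverlessOrderProcessing"  # hypothetical flag name

    def publish(self, payload: bytes, headers: Dict[str, str]) -> str:
        # Evaluate the flag on every publish so it can be flipped at runtime.
        if self.flag_enabled(self.flag_name):
            self.cloud_queue.publish(payload, headers)
            return "cloud_queue"
        self.kafka.publish(payload, headers)
        return "kafka"
```

Passing the headers through unchanged matters here: whichever transport is chosen, trace context carried in those headers would need to reach the consumer for the serverless path to appear in the same trace.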

Configuring the demo in this mode would certainly add complexity. Still, for those looking to understand how to fit OpenTelemetry into their serverless stack, it would be significantly less effort than an internal proof of concept.

ithompson-gp commented 11 months ago

@puckpuck I agree on perhaps carving out a path with the existing demo, in that it would be best not to diverge too much (as this might exacerbate the situation currently with server vs. serverless and OTEL: serverless being an afterthought). The concern would be the level of complexity here... we'd need to consider a 'typical' serverless flow, and perhaps also consider that many PaaS vendors have their own flavour of 'serverless'.

Again, it is non-trivial to keep this 'vanilla' yet useful. I'm hesitant to just say Lambda, SQS, etc. as there are vendors other than AWS in the space. There are also gaps within serverless and AWS and OTEL in regard to passing context (X-Ray is the hard-coded context).
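One vendor-neutral option, sketched here as an assumption and not as an established practice, is to carry the W3C Trace Context `traceparent` value explicitly in the message attributes so the consuming function can continue the trace regardless of the platform's own header. The Python below hand-rolls the version `00` format (`00-<32 hex trace id>-<16 hex span id>-<2 hex flags>`) purely for illustration; in a real service the OpenTelemetry SDK's propagators would normally handle this. `SpanContext`, `inject`, `extract`, and `child_of` are hypothetical helper names.

```python
import re
import secrets
from typing import Dict, NamedTuple, Optional

# Only version "00" of the traceparent header is accepted in this sketch.
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


class SpanContext(NamedTuple):
    trace_id: str  # 32 lowercase hex characters
    span_id: str   # 16 lowercase hex characters
    sampled: bool


def inject(ctx: SpanContext, attributes: Dict[str, str]) -> Dict[str, str]:
    """Producer side: write the context into the message attributes."""
    flags = "01" if ctx.sampled else "00"
    attributes["traceparent"] = f"00-{ctx.trace_id}-{ctx.span_id}-{flags}"
    return attributes


def extract(attributes: Dict[str, str]) -> Optional[SpanContext]:
    """Consumer side: read the context back, or None if absent or malformed."""
    match = _TRACEPARENT.match(attributes.get("traceparent", ""))
    if match is None:
        return None
    trace_id, span_id, flags = match.groups()
    # All-zero IDs are invalid per the W3C Trace Context specification.
    if trace_id == "0" * 32 or span_id == "0" * 16:
        return None
    return SpanContext(trace_id, span_id, bool(int(flags, 16) & 1))


def child_of(parent: SpanContext) -> SpanContext:
    """Start a new span in the same trace inside the serverless function."""
    return SpanContext(parent.trace_id, secrets.token_hex(8), parent.sampled)
```

The producer would call `inject` before publishing and the function would call `extract` followed by `child_of` on receipt; how those attributes map onto a given vendor's queue or event format is left open.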

As I write this, I am beginning to think there is an argument for PaaS vendors to supply some type of contribution. Perhaps AWS can provide a serverless flavour, Google, etc... and the overall OTEL community serverless leaders can provide the 'good practice'. It's important, though, that we consider the performance element more closely in serverless; there is a need to ship lightweight components. Including a Collector with the serverless function runtime - for instance - is a non-starter (due to the hit on invocation times).

At a very basic level, invoking a FaaS (serverless) component from the existing messaging element in the demo (Kafka) would be useful and would aid demonstration of the tech. How this FaaS component is represented is another matter (as alluded to above).

austinlparker commented 7 months ago

I've opened a discussion involving this issue: https://github.com/open-telemetry/opentelemetry-demo/discussions/1389