matrix-org / waterfall

A cascading stream forwarding unit for scalable, distributed voice and video conferencing over Matrix
Apache License 2.0

Add OpenTelemetry instrumentation #141

Closed daniel-abramov closed 1 year ago

daniel-abramov commented 1 year ago

Relates to https://github.com/matrix-org/waterfall/issues/50

UPD. We decided not to replace logging with telemetry. The two can co-exist, and using both is encouraged. Logs have a couple of advantages: they can carry more information, and they can be processed separately with log-level filtering applied. That is, turning on debug logs and getting lots of output is not a problem as long as the logging framework is efficient, whereas feeding debug-level logs into telemetry is not optimal and may produce large, unreadable traces.

The idea is to introduce telemetry while preserving logs as a first step. But since many telemetry events duplicate info-level logs, the next step would be to adapt our logging framework so that, for example, info-level log entries also produce telemetry span events.

Thus, we would use telemetry to create and manage spans and their context, while relying upon logs for logging and for producing telemetry span events.