temporalio / sdk-python

Temporal Python SDK
MIT License

[Feature Request] Support / provide guidance on using OpenTelemetry logging + metrics SDKs with process-pool workers #669

Open gregbrowndev opened 1 month ago

gregbrowndev commented 1 month ago

Is your feature request related to a problem? Please describe.

Slack discussion

Hi,

I've been rolling out an OpenTelemetry-based observability solution for my Temporal app. I chose OTel partly because the Temporal Python SDK already uses it for traces and metrics, so I want to adopt the same SDKs for custom metrics and logging.

Everything works great in async activities. I've been able to use OTel tooling for logging and tracing (using the TracingInterceptor as seen in this example), and I can see traces with their correlated logs for each activity in my backend (Grafana, Loki, Tempo). The Temporal SDK also provides activity.metric_meter(), which I've used to add custom metrics to async activities.

However, I'm having several issues with sync activities running on process-pool-based workers (I'm happy to split them into separate issues):

The trace_id and span_id injected into the logs are incorrect for every activity except the first one that runs on that worker. It seems that the first activity's IDs are injected into all activities that follow it.
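To make the symptom concrete, here is a minimal, stdlib-only sketch of the behaviour one would expect: the IDs are looked up each time a record is emitted, not once when logging is configured. It does not use the OTel SDK or Temporal at all. The `current_ids` ContextVar, `TraceContextFilter`, `build_logger` and `run_activity` are hypothetical names standing in for the real span context and activity execution, so treat this as an illustration of the idea, not a diagnosis of where the OTel logging integration goes wrong.

```python
import contextvars
import io
import logging

# Hypothetical stand-in for the active span context. With the OTel SDK the
# IDs would come from the current span, not from a ContextVar defined here.
current_ids = contextvars.ContextVar(
    "current_ids", default=("0" * 32, "0" * 16)
)


class TraceContextFilter(logging.Filter):
    """Attach trace_id/span_id to each record at the moment it is emitted."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Look the IDs up on every call. Caching them at setup time would
        # reproduce the "first activity's IDs everywhere" symptom.
        record.trace_id, record.span_id = current_ids.get()
        return True


def build_logger(stream: io.StringIO) -> logging.Logger:
    logger = logging.getLogger("activity_demo")
    logger.handlers.clear()
    logger.propagate = False
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(trace_id)s %(span_id)s %(message)s")
    )
    handler.addFilter(TraceContextFilter())
    logger.addHandler(handler)
    return logger


def run_activity(logger: logging.Logger, trace_id: str, span_id: str, message: str) -> None:
    # Simulate one activity: set the IDs for its duration, then restore them.
    token = current_ids.set((trace_id, span_id))
    try:
        logger.info(message)
    finally:
        current_ids.reset(token)


stream = io.StringIO()
demo_logger = build_logger(stream)
run_activity(demo_logger, "a" * 32, "1" * 16, "first")
run_activity(demo_logger, "b" * 32, "2" * 16, "second")
lines = stream.getvalue().splitlines()
```

In this sketch each log line carries the IDs of the activity that produced it. The reported behaviour corresponds to the second line still carrying the first activity's IDs.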

Note: I suspect initialising the MeterProvider for each process/activity will be much simpler because it isn't attached to a global root logger.

While these issues are likely inherent to the OTel SDKs, the same issues are also known to affect the TracerProvider (ref), yet Temporal managed to get that to work.

Please provide guidance on setting up OTel logging and custom metrics in process-pool-based workers, or support them natively as you do for OTel tracing.

Describe the solution you'd like

Native support for the remaining OpenTelemetry SDKs (metrics and logging) in async, thread-pool, and process-pool workers.