Open hdost opened 8 months ago
Looking into this, I'd love some clarification on how the scenario would work. My current approach has been to set up an integration test based on https://github.com/open-telemetry/opentelemetry-rust/blob/main/examples/logs-basic/src/main.rs and extend it such that we have multiple background tasks with set up with references to the LoggerProvider
instance through an Arc
.
The shutdown I wanted to test was to create a forced shutdown through an interrupt. I initially tried this with setting up multiple exporters with a single LoggerProvider
and got a panic from this line https://github.com/open-telemetry/opentelemetry-rust/blob/36c2a8e0a3c7077b824af78119bfa72ce31a6cd6/opentelemetry-sdk/src/logs/log_emitter.rs#L86 which I think is expected behavior.
Would taking the same approach with multiple LoggerProvider
references in separate tasks be a similar scenario?
The shutdown I wanted to test was to create a forced shutdown through an interrupt. I initially tried this with setting up multiple exporters with a single
LoggerProvider
and got a panic from this line https://github.com/open-telemetry/opentelemetry-rust/blob/36c2a8e0a3c7077b824af78119bfa72ce31a6cd6/opentelemetry-sdk/src/logs/log_emitter.rs#L86 which I think is expected behavior.Would taking the same approach with multiple
LoggerProvider
references in separate tasks be a similar scenario?
Looking at our API and the spec I actually think we might need a change. https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/logs/sdk.md#shutdown
Right now i think this panic isn't actually doing what I would want/expect. There's already a result being returned. In this case it's a vec of results but I feel like we should be passing back a default of Vec
error result.
Shutdown SHOULD provide a way to let the caller know whether it succeeded, failed or timed out.
Panics don't really "let the caller know".
Tests are added now and we should be good to close this. https://github.com/open-telemetry/opentelemetry-rust/issues/1042#issuecomment-2109009422 is still required to be resolved, but that is separate (though shutdown related!)
Reopening, as few topics were discussed here, but not really resolved.. (Will check how to expand the tests to cover everything.)
Thinking about this scenario again, it seems that the potential issue of leaking providers might not be as significant if we can document it clearly to explain the risk. Specifically, providers could be leaked if any of the tasks indefinitely retain their reference, which can result in a shutdown not triggering.
Originally posted by @lalitb in https://github.com/open-telemetry/opentelemetry-rust/issues/1455#issuecomment-1877690239