open-telemetry / opentelemetry-rust

The Rust OpenTelemetry implementation
https://opentelemetry.io
Apache License 2.0
1.82k stars 424 forks source link

Add Testing for Shutdown Scenarios. #1514

Open hdost opened 8 months ago

hdost commented 8 months ago
          > > And will calling shutdown on the global loggerProvider ensure that the logs through these loggers are not exported ?

It would be good to validate this use-case. It seems shutdown_logger_provider will invoke shutdown on the LogProcessors when the inner instance is dropped. And now if any of the loggers have it's reference in separate thread, the shutdown will not be invoked. One option can be to add a new LoggerProvider::Shutdown() method, and explicitly invoke it from within shutdown_logger_provider . This new method will invoke shutdown on all processors, and so ensure that all the existing logs are flushed. And no new logs can be exported with processors in shutdown state.

Thinking about this scenario again, it seems that the potential issue of leaking providers might not be as significant if we can document it clearly to explain the risk. Specifically, providers could be leaked if any of the tasks indefinitely retain their reference, which can result in a shutdown not triggering.

Originally posted by @lalitb in https://github.com/open-telemetry/opentelemetry-rust/issues/1455#issuecomment-1877690239

bIgBV commented 7 months ago

Looking into this, I'd love some clarification on how the scenario would work. My current approach has been to set up an integration test based on https://github.com/open-telemetry/opentelemetry-rust/blob/main/examples/logs-basic/src/main.rs and extend it such that we have multiple background tasks with set up with references to the LoggerProvider instance through an Arc.

The shutdown I wanted to test was to create a forced shutdown through an interrupt. I initially tried this with setting up multiple exporters with a single LoggerProvider and got a panic from this line https://github.com/open-telemetry/opentelemetry-rust/blob/36c2a8e0a3c7077b824af78119bfa72ce31a6cd6/opentelemetry-sdk/src/logs/log_emitter.rs#L86 which I think is expected behavior.

Would taking the same approach with multiple LoggerProvider references in separate tasks be a similar scenario?

hdost commented 7 months ago

The shutdown I wanted to test was to create a forced shutdown through an interrupt. I initially tried this with setting up multiple exporters with a single LoggerProvider and got a panic from this line https://github.com/open-telemetry/opentelemetry-rust/blob/36c2a8e0a3c7077b824af78119bfa72ce31a6cd6/opentelemetry-sdk/src/logs/log_emitter.rs#L86 which I think is expected behavior.

Would taking the same approach with multiple LoggerProvider references in separate tasks be a similar scenario?

Looking at our API and the spec I actually think we might need a change. https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/logs/sdk.md#shutdown

Right now i think this panic isn't actually doing what I would want/expect. There's already a result being returned. In this case it's a vec of results but I feel like we should be passing back a default of Vec error result.

Shutdown SHOULD provide a way to let the caller know whether it succeeded, failed or timed out.

Panics don't really "let the caller know".

cijothomas commented 4 months ago

Tests are added now and we should be good to close this. https://github.com/open-telemetry/opentelemetry-rust/issues/1042#issuecomment-2109009422 is still required to be resolved, but that is separate (though shutdown related!)

cijothomas commented 4 months ago

Reopening, as few topics were discussed here, but not really resolved.. (Will check how to expand the tests to cover everything.)