Open mukund-ananthu opened 1 month ago
To start off with, your response mentions the following details:
Environment Description
- Operating System: Ubuntu 20.04 LTS
- Python Version: Python 3.8.10
- SDK Version: OpenTelemetry Python SDK 1.9.0
- API Version: OpenTelemetry API 1.9.0
which were never mentioned in my issue description.
@mukund-ananthu let me know if this solves your problem
Environment Description
- Operating System: Ubuntu 20.04 LTS
- Python Version: Python 3.8.10
- SDK Version: OpenTelemetry Python SDK 1.9.0
- API Version: OpenTelemetry API 1.9.0
Issue Overview
From the description provided, the issue you encountered relates to the non-deterministic behavior of the spans exported while the unit tests run, particularly around asynchronous function calls. This can lead to inconsistencies between local test runs and CI (Continuous Integration) presubmits on GitHub.
What Happened?
You are dealing with an asynchronous scenario where:
- Function a() starts and ends spans 1 and 2 as expected.
- It asynchronously calls function b(), which may not complete in the expected order.
- In b(), span 3 is ended and span 4 is started and ended.
The expectation is that spans 1 and 2 are finished and appear in the exported spans, while spans 3 and 4 should not. However, in the presubmit runs, you're observing that sometimes spans 3 and 4 appear, and the order of spans might change.
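Sketched as code, the shape looks roughly like the following (the span names and the way b() is scheduled are illustrative assumptions, not the actual python-pubsub code):

```python
import asyncio

from opentelemetry import trace

tracer = trace.get_tracer(__name__)


async def b(span_3):
    # Runs on a separately scheduled task: span 3 is ended here, and span 4
    # is started and ended here.
    span_3.end()
    with tracer.start_as_current_span("span 4"):
        pass


async def a():
    with tracer.start_as_current_span("span 1"):
        pass
    with tracer.start_as_current_span("span 2"):
        pass
    span_3 = tracer.start_span("span 3")
    # b() is scheduled but not awaited, so a() can return before span 3 has
    # ended or span 4 exists, and the test's assertions race with this task.
    asyncio.create_task(b(span_3))
```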
Why This Could Be Happening
- Race Conditions: Asynchronous execution can lead to race conditions, especially when there are timing discrepancies in how the spans are started and ended. The relative order in which a() and b() run is not guaranteed, particularly in an environment with concurrent executions.
- Event Loop Behavior: The Python event loop may schedule asynchronous tasks in a non-deterministic manner, particularly under different loads or environments (e.g., local vs. CI environments). This can lead to differences in the order and timing of span completions.
- Test Isolation: If the tests are not properly isolated, shared state or resources may cause interference, leading to inconsistent results. This is particularly relevant in CI systems where multiple tests might run concurrently.
Recommended Ways to Handle/Test This Scenario
Synchronization:
- Use synchronization mechanisms (like asyncio.Event or asyncio.Lock) to control the execution order of asynchronous tasks. Ensure that b() completes before asserting the spans in a().
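A minimal sketch of that ordering with asyncio.Event (the function bodies are placeholders rather than the real code) could look like:

```python
import asyncio


async def b(done: asyncio.Event):
    # ... the work that ends span 3 and starts/ends span 4 would go here ...
    done.set()  # signal that all of b()'s span work is finished


async def a():
    done = asyncio.Event()
    # ... spans 1 and 2 are started and ended here ...
    asyncio.create_task(b(done))
    # Wait for b() to signal completion so that, by the time a() returns,
    # every span it (indirectly) created has been ended.
    await done.wait()
```

Creating the Event inside the coroutine keeps it tied to the running loop, which avoids surprises on older Python versions where asyncio primitives bind to a loop at construction time.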
Explicit Waits:
- Incorporate explicit waits or checks to ensure that spans are finished before asserting. This can be achieved by awaiting the completion of b() in a().
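For instance, if a() hands back the task it schedules (a hypothetical return value used here purely for illustration), the test can await it before asserting:

```python
import asyncio


async def b():
    pass  # placeholder for the work that ends span 3 and creates span 4


async def a():
    # ... spans 1 and 2 are started and ended here ...
    return asyncio.create_task(b())


async def test_spans_from_a():
    task = await a()
    # Explicitly wait for b() to finish before asserting on exported spans,
    # so the set of finished spans is the same on every run.
    await task
    # ... assertions on the exported spans go here ...
```

With pytest-asyncio (or a plain asyncio.run() wrapper) this test runs the same way locally and in CI.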
Deterministic Testing:
- Refactor the tests to ensure that all asynchronous operations are completed before making assertions. Utilize asyncio.gather() to wait for multiple coroutines to finish if applicable.
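A small, self-contained example of the gather() pattern, with worker() standing in for span-producing coroutines:

```python
import asyncio


async def worker(name: str) -> str:
    # Stand-in for a coroutine that starts and ends spans.
    return name


async def main():
    # gather() only returns once every coroutine passed to it has completed,
    # so any spans they created are finished before the assertions run.
    results = await asyncio.gather(worker("a"), worker("b"))
    assert results == ["a", "b"]


asyncio.run(main())
```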
Mocking:
- Consider mocking the span creation and exporting process to isolate the behavior of your functions from the actual OpenTelemetry SDK. This can help ensure that your tests focus solely on the logic without the variability of the SDK's behavior.
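One way to do this is to hand the code under test a mock tracer and assert on the spans it requests; do_work() below is a placeholder for the real span-producing code:

```python
from unittest import mock


def do_work(tracer):
    # Placeholder for code under test that creates spans via the tracer.
    with tracer.start_as_current_span("span 1"):
        pass


def test_requested_spans():
    tracer = mock.MagicMock()
    do_work(tracer)
    # The assertion checks which span the code asked for, without depending
    # on the SDK's export pipeline or its timing.
    tracer.start_as_current_span.assert_called_once_with("span 1")
```

In real code the tracer is usually a module-level attribute, so a test would typically mock.patch that attribute rather than passing the tracer in.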
CI Configuration:
- Investigate the CI environment configuration for differences in resource availability or execution settings that might affect timing behaviors.
Steps to Reproduce
- Clone the repository and check out the specific branch.
- Run the pytest suite locally multiple times to observe consistent results.
- Observe the results in the presubmit checks on GitHub to identify the non-deterministic behavior.
Expected Result
You expect the spans exported to be consistent across all runs. Specifically, spans 1 and 2 should always be present and finished, while spans 3 and 4 should not appear in the assertions.
Actual Result
The spans exported show non-deterministic behavior, with spans 3 and 4 occasionally appearing and the order of spans varying between runs.
By following these recommendations and adjustments, you should be able to achieve deterministic results in your tests, enhancing the reliability of the span assertions.
Describe your environment
- OS: (e.g., Ubuntu)
- Python version: (e.g., Python 3.8.10)
- SDK version: (e.g., 1.25.0)
- API version: (e.g., 1.25.0)
What happened?
https://github.com/googleapis/python-pubsub/pull/1254/files -> This was an interim fix I put in for the time being. But the issue observed was that both the number and the order of spans exported to the pytest unit tests would differ across runs. This was not reproducible locally even across several hundred runs, but would occasionally show up in the presubmits. The exact code can be seen in the linked PR above, but at a high level:
What I expected in the pytest unit tests for a() is that spans 1 and 2 show up as finished spans, and spans 3 and 4 don't show up. This was the case in every local execution of the test. But when the presubmits run on GitHub, I see that sometimes the asserted spans also include spans 3 and 4, and the ordering of spans 1, 2, 3, 4 also varies.
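A minimal, self-contained sketch of the shape of that assertion, using the SDK's in-memory exporter (the span names and structure are illustrative, not the actual python-pubsub test):

```python
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
from opentelemetry.sdk.trace.export.in_memory_span_exporter import (
    InMemorySpanExporter,
)


def test_finished_spans():
    # Route finished spans into an in-memory exporter the test can inspect.
    exporter = InMemorySpanExporter()
    provider = TracerProvider()
    provider.add_span_processor(SimpleSpanProcessor(exporter))
    tracer = provider.get_tracer(__name__)

    # Stand-ins for the spans a() creates; in the real test a() would also
    # schedule b(), which is where spans 3 and 4 come from.
    with tracer.start_as_current_span("span 1"):
        pass
    with tracer.start_as_current_span("span 2"):
        pass

    names = [span.name for span in exporter.get_finished_spans()]
    # Only spans that have actually ended appear here; whether spans 3 and 4
    # from an un-awaited b() have ended yet is exactly what varies across runs.
    assert names == ["span 1", "span 2"]
```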
Steps to Reproduce
Mentioned above
Expected Result
The spans exported, and hence assertable in the pytest, should be deterministic.
Actual Result
Spans exported are non-deterministic.
Additional context
No response
Would you like to implement a fix?
None