open-telemetry / opentelemetry-python

OpenTelemetry Python API and SDK
https://opentelemetry.io
Apache License 2.0

OpenTelemetry Spans exported during async code execution yield varying number of spans when asserting in pytest #4210

Open mukund-ananthu opened 1 month ago

mukund-ananthu commented 1 month ago

Describe your environment

OS: (e.g., Ubuntu) Python version: (e.g., Python 3.8.10) SDK version: (e.g., 1.25.0) API version: (e.g., 1.25.0)

What happened?

https://github.com/googleapis/python-pubsub/pull/1254/files -> This was an interim fix I put in for the time being. But the issue observed was that both the number and order of spans exported to the pytests differed across runs. It was not reproducible locally, even across several hundred runs, but would show up occasionally in the presubmits. The exact code can be seen in the PR linked above, but at a high level:

class Foo:

    def a(self):
        # start and end spans 1 and 2 - so these are expected to be
        # exported to the pytests (and this is the case when the
        # pytests are run locally)
        # start span 3, but don't end it
        ...  # asynchronously invoke b()

    # runs async, but is invoked by a()
    def b(self):
        # end span 3
        # start and end span 4
        ...

What I expected in the pytest unit tests for a() is that spans 1 and 2 show up as finished spans, and that spans 3 and 4 don't. This was the case in every local execution of the test. But when the presubmits run on GitHub, I see that sometimes the asserted spans also include spans 3 and 4, and that the ordering of spans 1, 2, 3, 4 also varies.

  1. Could you please let me know why this could be happening?
  2. What's the recommended way to handle / test this scenario?

Steps to Reproduce

Mentioned above

Expected Result

The spans exported, and hence assertable in the pytest, should be deterministic.

Actual Result

Spans exported are non-deterministic.

Additional context

No response

Would you like to implement a fix?

None

olabodeIdowu commented 2 weeks ago

@mukund-ananthu let me know if this solves your problem

Environment Description

  • Operating System: Ubuntu 20.04 LTS
  • Python Version: Python 3.8.10
  • SDK Version: OpenTelemetry Python SDK 1.9.0
  • API Version: OpenTelemetry API 1.9.0

Issue Overview

From the description provided, the issue you encountered relates to the non-deterministic behavior of spans exported during the execution of the unit tests, particularly in the context of asynchronous function calls. This can lead to inconsistencies between local test runs and CI (Continuous Integration) presubmits on GitHub.

What Happened?

You are dealing with an asynchronous scenario where:

  • Function a() starts and ends spans 1 and 2 as expected.
  • It starts span 3 and then asynchronously calls function b(), which may not complete in the expected order.
  • In b(), span 3 is ended and span 4 is started and ended.

The expectation is that spans 1 and 2 are finished and appear in the exported spans, while spans 3 and 4 should not. However, in the presubmit runs, you're observing that sometimes spans 3 and 4 appear, and the order of spans might change.

Why This Could Be Happening

  1. Race Conditions: Asynchronous executions can lead to race conditions, especially if there are timing discrepancies in how the spans are started and ended. The execution order of a() and b() may not be guaranteed, particularly in an environment with concurrent executions.

  2. Event Loop Behavior: The Python event loop may schedule asynchronous tasks in a non-deterministic manner, particularly under different loads or environments (e.g., local vs. CI environments). This can lead to differences in the order and timing of span completions.

  3. Test Isolation: If the tests are not properly isolated, shared state or resources may cause interference, leading to inconsistent results. This is particularly relevant in CI systems where multiple tests might run concurrently.
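The fire-and-forget structure described above can be reproduced in a small, self-contained sketch using plain asyncio, with a list standing in for the span exporter (the names a, b, and exported mirror the pseudocode, not the real pubsub code). In this single-threaded toy the unawaited task simply never runs before the snapshot is taken; in the real code, where b() may be driven by a background thread or callback, the same structure makes the outcome timing-dependent:

```python
import asyncio

# "exported" stands in for the span exporter's list of finished spans.
exported = []

async def b():
    exported.append("span3")   # b() ends span 3 ...
    exported.append("span4")   # ... and starts/ends span 4

async def a():
    exported.append("span1")
    exported.append("span2")
    # Fire-and-forget: b() is scheduled on the loop, but a() does not wait.
    return asyncio.ensure_future(b())

async def racy():
    await a()
    # Snapshot taken immediately: in this toy b() has not had loop time yet,
    # but nothing *guarantees* that in more complex, multi-threaded code.
    return list(exported)

async def deterministic():
    task = await a()
    await task                 # explicitly wait for b() to finish
    return list(exported)

exported.clear()
racy_result = asyncio.run(racy())
exported.clear()
stable_result = asyncio.run(deterministic())
```

The only difference between the two test coroutines is the `await task` line: without it, what the assertion sees depends entirely on when the scheduler happens to run b().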

Recommended Ways to Handle/Test This Scenario

  1. Synchronization:

    • Use synchronization mechanisms (like asyncio.Event or asyncio.Lock) to control the execution order of asynchronous tasks. Ensure that b() completes before asserting the spans in a().
  2. Explicit Waits:

    • Incorporate explicit waits or checks to ensure that spans are finished before asserting. This can be achieved by awaiting the completion of b() in a().
  3. Deterministic Testing:

    • Refactor the tests to ensure that all asynchronous operations are completed before making assertions. Utilize asyncio.gather() to wait for multiple coroutines to finish if applicable.
  4. Mocking:

    • Consider mocking the span creation and exporting process to isolate the behavior of your functions from the actual OpenTelemetry SDK. This can help ensure that your tests focus solely on the logic without the variability of the SDK's behavior.
  5. CI Configuration:

    • Investigate the CI environment configuration for differences in resource availability or execution settings that might affect timing behaviors.
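Recommendations 1–3 boil down to the same idea: give the test a deterministic point after which all spans are known to be ended, and only assert after that point. A minimal sketch using asyncio.Event, again with a plain list standing in for the exporter (the helper names are illustrative, not taken from the pubsub code):

```python
import asyncio

finished_spans = []   # stand-in for the exporter's finished-span list

async def b(done: asyncio.Event):
    finished_spans.append("span3")
    finished_spans.append("span4")
    done.set()                        # signal: all of b()'s spans are ended

async def a(done: asyncio.Event):
    finished_spans.append("span1")
    finished_spans.append("span2")
    asyncio.ensure_future(b(done))    # fire-and-forget, as in the real code

async def run_test():
    done = asyncio.Event()
    await a(done)
    # Explicit wait: do not assert until b() has signalled completion.
    await asyncio.wait_for(done.wait(), timeout=5.0)
    assert finished_spans == ["span1", "span2", "span3", "span4"]
    return finished_spans

result = asyncio.run(run_test())
```

With the real SDK the same pattern applies: gate on the completion of b(), then assert on InMemorySpanExporter.get_finished_spans() rather than sampling it at an arbitrary moment.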

Steps to Reproduce

  1. Clone the repository and check out the specific branch.
  2. Run the pytest suite locally multiple times to observe consistent results.
  3. Observe the results in the presubmit checks on GitHub to identify the non-deterministic behavior.

Expected Result

You expect the spans exported to be consistent across all runs. Specifically, spans 1 and 2 should always be present and finished, while spans 3 and 4 should not appear in the assertions.

Actual Result

The spans exported show non-deterministic behavior, with spans 3 and 4 occasionally appearing and the order of spans varying between runs.

By following these recommendations and adjustments, you should be able to achieve deterministic results in your tests, enhancing the reliability of the span assertions.

mukund-ananthu commented 2 weeks ago

To start off with, your response mentions the following details:

Environment Description
Operating System: Ubuntu 20.04 LTS
Python Version: Python 3.8.10
SDK Version: OpenTelemetry Python SDK 1.9.0
API Version: OpenTelemetry API 1.9.0

which were never mentioned in my issue description.

@mukund-ananthu let me know if this solves your problem

Environment Description

  • Operating System: Ubuntu 20.04 LTS
  • Python Version: Python 3.8.10
  • SDK Version: OpenTelemetry Python SDK 1.9.0
  • API Version: OpenTelemetry API 1.9.0
