open-telemetry / opentelemetry-python

OpenTelemetry Python API and SDK
https://opentelemetry.io
Apache License 2.0
1.67k stars 569 forks source link

sdk: make test_batch_span_processor_scheduled_delay a bit more robust #3938

Closed xrmx closed 2 weeks ago

xrmx commented 1 month ago

Description

It happened that tests failed because the delay was fired some microseconds earlier:

>       self.assertGreaterEqual((export_time - start_time) * 1e3, 500)
E       AssertionError: 499.9737739562988 not greater than or equal to 500

We should probably revise all these skip on Windows Pypy once we have a Python 3.9 baseline and Pypy >= 7.3.12.

Fix #3911

Type of change

Please delete options that are not relevant.

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration

Does This PR Require a Contrib Repo Change?

Checklist:

aabmass commented 3 weeks ago

Ah I see you mentioned this in the issue @xrmx. That would be my preferred fix

xrmx commented 2 weeks ago

It happened that tests failed because the delay was fired some microseconds earlier:

>       self.assertGreaterEqual((export_time - start_time) * 1e3, 500)
E       AssertionError: 499.9737739562988 not greater than or equal to 500

For the purposes of this test, I think some microseconds early or late should be perfectly acceptable. Can we instead just update L489 to use assertAlmostEqual() with a reasonable delta?

The first version was:

self.assertGreaterEqual(round(export_time - start_time) * 1e3), 500)

but was asked to change it

aabmass commented 2 weeks ago

Thanks. @ocelotl I see your comment above. I imagine this test can be flaky on any platform since it depends on real world timing. IMO the PyPy behavior is working as intended. https://github.com/open-telemetry/opentelemetry-python/blob/ba22b165471bde2037620f2c850ab648a849fbc0/opentelemetry-sdk/tests/trace/export/test_export.py#L485

The other issue, if the actual time is way larger 500 the test passes. IMO assertAlmostEqual would be an improvement across the board. Wdyt?

xrmx commented 2 weeks ago

Updated following @aabmass suggestion

ocelotl commented 2 weeks ago

The other issue, if the actual time is way larger 500 the test passes.

I was thinking the same thing, approved.

ocelotl commented 2 weeks ago

Well, still not enough:

>       self.assertAlmostEqual((export_time - start_time) * 1e3, 500, places=0)
E       AssertionError: 500.6556510925293 != 500 within 0 places (0.6556510925292969 difference)
ocelotl commented 2 weeks ago

Ok, 2 things, in my opinion:

  1. This test has probably been failing randomly since forever. And since forever we have probably just been running it again until it passes.
  2. We now are ok with this test having some margin for error.

So:

  1. Let's just automate our manual rerunning of this test case.
  2. Maybe we are ok with even a greater margin for error for this test case. Let's use the greatest margin of error we are ok with. We probably would want to use the delta argument instead of places.
xrmx commented 2 weeks ago

me: Bumped the delta to 16ms as seen on CI

pypy: hold my beer: E AssertionError: 2253.103017807007 != 500 within 16 delta (1753.1030178070068 difference)

xrmx commented 2 weeks ago

Opened this for revising skipping tests on pypy / windows: https://github.com/open-telemetry/opentelemetry-python/issues/3967

Today I've added another commit in this PR noticing that in some failing pypy tests the MetricsTimeoutError was raised for the last collection. Since we are catching it in the loop I think we should catch it there too. You can find the stacktrace here:


opentelemetry-sdk/tests/metrics/test_periodic_exporting_metric_reader.py::TestPeriodicExportingMetricReader::test_metric_timeout_does_not_kill_worker_thread
  \_pytest\threadexception.py:73: PytestUnhandledThreadExceptionWarning: Exception in thread OtelPeriodicExportingMetricReader

  Traceback (most recent call last):
    File "C:\hostedtoolcache\windows\Python\3.11.9\x64\Lib\threading.py", line 1045, in _bootstrap_inner
      self.run()
    File "C:\hostedtoolcache\windows\Python\3.11.9\x64\Lib\threading.py", line 982, in run
      self._target(*self._args, **self._kwargs)
    File "D:\a\opentelemetry-python\opentelemetry-python\opentelemetry-sdk\src\opentelemetry\sdk\metrics\_internal\export\__init__.py", line 522, in _ticker
      self.collect(timeout_millis=self._export_interval_millis)
    File "D:\a\opentelemetry-python\opentelemetry-python\opentelemetry-sdk\tests\metrics\test_periodic_exporting_metric_reader.py", line 87, in collect
      raise self._collect_exception
  opentelemetry.sdk.metrics._internal.exceptions.MetricsTimeoutError: test timeout