Test flakes due to reusuable metric GC requirement

jack-berg commented 9 months ago

@asafm we've had a couple of test flakes around the the reusuable metric GC test. WDYT think about relaxing the GC reduction requirement?

InstrumentGarbageCollectionBenchmarkTest > normalizedAllocationRateTest() FAILED java.lang.AssertionError: [Aggregation temporality = DELTA, testInstrumentType = DOUBLE_LAST_VALUE] Expecting actual: 94.63418122969631 to be close to: 97.30000305175781 by less than 2.0 but difference was 2.6658218220615. (a difference of exactly 2.0 being considered valid) at io.opentelemetry.sdk.metrics.internal.state.InstrumentGarbageCollectionBenchmarkTest.lambda$normalizedAllocationRateTest$2(InstrumentGarbageCollectionBenchmarkTest.java:123) at java.base/java.util.HashMap.forEach(HashMap.java:1337) at io.opentelemetry.sdk.metrics.internal.state.InstrumentGarbageCollectionBenchmarkTest.lambda$normalizedAllocationRateTest$3(InstrumentGarbageCollectionBenchmarkTest.java:101) at java.base/java.util.HashMap.forEach(HashMap.java:1337) at io.opentelemetry.sdk.metrics.internal.state.InstrumentGarbageCollectionBenchmarkTest.normalizedAllocationRateTest(InstrumentGarbageCollectionBenchmarkTest.java:93)

jack-berg commented 9 months ago

I've had more trouble with these test flakes today. The last value aggregation seems to be particularly prone to non-deterministic behavior. Will need to look into the failures closer to determine if there's anything on the metrics SDK side we can to do to make the GC allocation more reliable, or whether we should relax the test constraints.

I also think we don't need to run these tests on every entry of the test matrix. Similar to how we only run code coverage on the ubuntu / java LTS entry, we should be able to only run these expensive perf tests on a single entry as well.

asafm commented 9 months ago

I'm starting to work on this now

asafm commented 9 months ago

I couldn't recreate the issue locally after inspecting it and trying to run it with RepeatedTest. I'm okay with increasing the offset for those 4 types of tests since the ability to refactor them by mistake is very constrained; hence, any change there will likely cause the allocation rate to go even lower than 93.7%.

I created https://github.com/open-telemetry/opentelemetry-java/pull/6221 to fix both issues mentioned: lower rate and make the test run only in one matrix option.

open-telemetry / opentelemetry-java

Test flakes due to reusuable metric GC requirement #6211