The test test_update_envs_env_update is flaky. It sometimes fails more than twice. The assertion on line 78 assert np.var(mean_rewards) > 0 fails. This PR addresses this issue.
To find a solution, I collected samples of np.var(mean_rewards) from several test executions and computed the tail distribution just to check how often can the value be 0.
Based on the collected samples, it seems there is ~12% chance that the test will fail. I suggest to run this test 3 times, then the probability of failure will be <1%.
I think refactoring this test using the statistical evaluation might be a good way to reduce the flakiness of this test.
Do you guys think this makes sense? Please let me know if this looks good or if you have any other suggestions. Also, here I assume there are no bugs in the code under test.
Also, I am curious to know why do you check if var > 0? Statistically, zero variance can happen sometimes. Is it better to have something like assert np.mean(mean_rewards) > 0?
The test
test_update_envs_env_update
is flaky. It sometimes fails more than twice. The assertion on line 78assert np.var(mean_rewards) > 0
fails. This PR addresses this issue.To find a solution, I collected samples of
np.var(mean_rewards)
from several test executions and computed the tail distribution just to check how often can the value be 0.Based on the collected samples, it seems there is ~12% chance that the test will fail. I suggest to run this test 3 times, then the probability of failure will be <1%.
I think refactoring this test using the statistical evaluation might be a good way to reduce the flakiness of this test.
Do you guys think this makes sense? Please let me know if this looks good or if you have any other suggestions. Also, here I assume there are no bugs in the code under test.
Also, I am curious to know why do you check if
var > 0
? Statistically, zero variance can happen sometimes. Is it better to have something likeassert np.mean(mean_rewards) > 0
?