rlworkgroup / garage

A toolkit for reproducible reinforcement learning research.
MIT License
1.86k stars 309 forks source link

Fixing test flakiness #2242

Closed sleepy-owl closed 3 years ago

sleepy-owl commented 3 years ago

The test test_update_envs_env_update is flaky. It sometimes fails more than twice. The assertion on line 78 assert np.var(mean_rewards) > 0 fails. This PR addresses this issue.

To find a solution, I collected samples of np.var(mean_rewards) from several test executions and computed the tail distribution just to check how often can the value be 0.

Based on the collected samples, it seems there is ~12% chance that the test will fail. I suggest to run this test 3 times, then the probability of failure will be <1%.

I think refactoring this test using the statistical evaluation might be a good way to reduce the flakiness of this test.

Do you guys think this makes sense? Please let me know if this looks good or if you have any other suggestions. Also, here I assume there are no bugs in the code under test.

Also, I am curious to know why do you check if var > 0? Statistically, zero variance can happen sometimes. Is it better to have something like assert np.mean(mean_rewards) > 0?

ryanjulian commented 3 years ago

/ok-to-test

rlworkgroupbot commented 3 years ago

Command run output for https://github.com/rlworkgroup/garage/pull/2242/commits/35fd206546637c6d608c9a0234773429f4c3c883

rlworkgroupbot commented 3 years ago

Command run output for https://github.com/rlworkgroup/garage/pull/2242/commits/35fd206546637c6d608c9a0234773429f4c3c883

sleepy-owl commented 3 years ago

hi, it seems the checks failed due to some config issue?

ryanjulian commented 3 years ago

@ziyiwu9494 can you PTAL?

krzentner commented 3 years ago

/ok-to-test

rlworkgroupbot commented 3 years ago

Command run output for https://github.com/rlworkgroup/garage/pull/2242/commits/35fd206546637c6d608c9a0234773429f4c3c883

sleepy-owl commented 3 years ago

Hi all, thanks for approving the PR. Is there something blocking for merge?

avnishn commented 3 years ago

/ok-to-test

rlworkgroupbot commented 3 years ago

Command run output for https://github.com/rlworkgroup/garage/pull/2242/commits/e0f1501b7e3d8d7556aaf0e591b1724d9c629586