ryanjulian opened this issue 6 years ago
@ryanjulian for case 2, what are we considering the "expected final reward with the expected rise time" ?
@CatherineSue can you help explain?
Are there other "high-quality implementations" besides rlkit, baselines, and tf_agents? These do not have implementations of ERWR, REPS, TNPG, or VPG.
I think that list is basically my universe of "high-quality implementations". You might want to check out stable-baselines.
The list you made is of more "classic" RL algorithms, which explains the lack of other implementations. We should be happy to replicate others' published results for those.
For case 2, if there are no high-quality implementations to compare against, we can compare to the performance reported in the papers for those algorithms. Usually the papers include experiments with final-reward metrics.
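A comparison against a published number could be as simple as a relative-tolerance check on the final average return. This is only a sketch: the function name and the 10% tolerance are illustrative assumptions, not a project standard.

```python
def check_against_paper(observed_return, paper_return, rel_tol=0.1):
    """Pass if the observed final return is within rel_tol of the published value.

    The 10% relative tolerance is an illustrative assumption; a real
    benchmark gate would likely pick a per-algorithm threshold.
    """
    gap = abs(observed_return - paper_return) / abs(paper_return)
    return gap <= rel_tol
```

A one-sided check (observed >= threshold) may be more appropriate when higher return is strictly better.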
So I think the clear path to closing this is similar to our strategy for examples -- we can't run each example in full on the CI path, but we can ensure that each algorithm has a relevant benchmark script, and that the script can be executed (for perhaps a single epoch).
This ensures that those benchmark scripts can always be run out-of-band from the CI, even as the code changes.
We need a long-running benchmark test for every algorithm. It should run on the hardest problems that algorithm can solve in <1M timesteps. In most cases, the Mujoco1M suite from OpenAI gym is appropriate.
There are 2 cases:
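One way to make the "<1M timesteps" budget concrete is to encode the suite as data that both the benchmark scripts and the CI smoke test can read. The environment list below is modeled on OpenAI baselines' Mujoco1M registry, but the exact names and budgets here are assumptions, not a spec.

```python
# Assumed encoding of the Mujoco1M benchmark suite: environment id -> step
# budget. Modeled on OpenAI baselines' Mujoco1M registry; illustrative only.
MUJOCO_1M = {
    "HalfCheetah-v2": 1_000_000,
    "Hopper-v2": 1_000_000,
    "Walker2d-v2": 1_000_000,
    "Swimmer-v2": 1_000_000,
    "Reacher-v2": 1_000_000,
    "InvertedPendulum-v2": 1_000_000,
    "InvertedDoublePendulum-v2": 1_000_000,
}


def total_timesteps(suite):
    """Total environment steps a full run of the suite would consume."""
    return sum(suite.values())
```

Keeping the budgets in one place makes it easy to enforce the <1M-per-environment rule for every algorithm's long-running test.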