Benchmark results calculation

takuseno / d3rlpy-benchmarks

Benchmark data for d3rlpy

MIT License

Benchmark results calculation #6

Closed lxqpku closed 4 months ago

lxqpku commented 1 year ago

What do final_return, final_std, best_return, and best_std mean? How are these results calculated from the metrics written to the log file, e.g. with evaluators={"environment": d3rlpy.metrics.EnvironmentEvaluator(env)}? Does final_return mean the return obtained in the last epoch of a single experiment, or the average final return across several seeds?

takuseno commented 1 year ago

@lxqpku Hi, thanks for the issue. Each row was computed from experiments with 10 random seeds. Based on that, each metric means:

final_return: the average return of the last-epoch policy, averaged over 10 random seeds.
best_return: the best average return, averaged over 10 random seeds.

final_std and best_std are the standard deviations over the 10 random seeds.
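
As a rough illustration of the definitions above, here is a minimal sketch that is not taken from the benchmark scripts. It assumes you have already collected the per-epoch average evaluation returns for each seed into a 2-D array of shape (n_seeds, n_epochs). The function name summarize_returns and the numbers in the example are hypothetical. It also makes two assumptions the thread does not confirm: "best" is taken as each seed's best epoch before averaging over seeds, and the standard deviation is the population one (NumPy's default, ddof=0).

```python
import numpy as np


def summarize_returns(returns):
    """Summarize per-seed learning curves.

    `returns` has shape (n_seeds, n_epochs); each entry is the average
    evaluation return of one seed at one epoch.
    """
    returns = np.asarray(returns, dtype=float)
    if returns.ndim != 2:
        raise ValueError("expected an array of shape (n_seeds, n_epochs)")

    # Return of the last-epoch policy, one value per seed.
    final_per_seed = returns[:, -1]
    # Assumption: "best" means the best epoch of each individual seed.
    best_per_seed = returns.max(axis=1)

    return {
        "final_return": float(final_per_seed.mean()),
        # Assumption: population standard deviation (ddof=0).
        "final_std": float(final_per_seed.std()),
        "best_return": float(best_per_seed.mean()),
        "best_std": float(best_per_seed.std()),
    }


# Made-up numbers for 3 seeds and 4 epochs, only to show the call.
example = [
    [10.0, 20.0, 30.0, 25.0],
    [12.0, 22.0, 28.0, 27.0],
    [8.0, 18.0, 35.0, 26.0],
]
summary = summarize_returns(example)
print(summary)
```

If the benchmark instead picks the single best epoch of the seed-averaged curve, replace `returns.max(axis=1)` with the column of `returns` at `returns.mean(axis=0).argmax()`.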