gauravg11 opened 4 years ago
This might be because some episodes aren't finished when `on_postprocess_traj` is called, so you are computing rewards on partial episodes. Try setting `"batch_mode": "complete_episodes"` in the config, which forces complete trajectories to be generated.
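For reference, that setting is just a key in the trainer config dict; a minimal sketch (the fragment length shown is a placeholder, not a value from this issue):

```python
# Hedged sketch of an RLlib trainer config enabling complete episodes.
# With "complete_episodes", rollouts are only cut at episode boundaries,
# so postprocessing callbacks always see full trajectories.
config = {
    "batch_mode": "complete_episodes",
    "rollout_fragment_length": 1000,  # placeholder value for illustration
}

print(config["batch_mode"])  # → complete_episodes
```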
While this definitely helped bring the metrics closer together, there's still an order-of-magnitude discrepancy.
```
Result for A3C_cleanup_env_0:
  custom_metrics:
    totals_max: -155
    totals_mean: -948.6603773584906
    totals_min: -1787
  date: 2020-03-12_17-45-43
  done: false
  episode_len_mean: 1000.0
  episode_reward_max: -1882.0
  episode_reward_mean: -5024.641509433963
  episode_reward_min: -8642.0
  episodes_this_iter: 106
  episodes_total: 106
  ...
  timesteps_since_restore: 100000
  timesteps_this_iter: 100000
  timesteps_total: 100000
  training_iteration: 1
```
I have a custom environment where the total reward is the sum of an intrinsic reward and an environmental reward. I've configured the environment to emit the reward breakdown as:

```python
info = {'agent0': {'intrinsic': X, 'environmental': Y}, ...}
```

and then defined a custom callback as below. However, I'm struggling to get my custom metrics to match built-in metrics such as `episode_reward_mean`. What would be the right way to record reward breakdowns here?