Closed keshaviyengar closed 5 years ago
Hello, could you elaborate a bit more?
I have forked stable-baselines and made the changes here
Ah ok, this is a bug then. Please fill in the issue template next time ;)
Code to reproduce the error:
import gym
from stable_baselines import DDPG
eval_env = gym.make("Pendulum-v0")
model = DDPG("MlpPolicy", "Pendulum-v0", eval_env=eval_env, verbose=1)
model.learn(50000)
Feel free to submit a PR then ;) (and don't forget to look at the CONTRIBUTING.md guide before)
EDIT: traceback
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-6-68aae70e24b4> in <module>()
4 eval_env = gym.make("Pendulum-v0")
5 model = DDPG("MlpPolicy", "Pendulum-v0", eval_env=eval_env, verbose=1)
----> 6 model.learn(50000)
~/Documents/stable-baselines/stable_baselines/ddpg/ddpg.py in learn(self, total_timesteps, callback, seed, log_interval, tb_log_name, reset_num_timesteps)
941
942 combined_stats_sums = MPI.COMM_WORLD.allreduce(
--> 943 np.array([as_scalar(x) for x in combined_stats.values()]))
944 combined_stats = {k: v / mpi_size for (k, v) in zip(combined_stats.keys(), combined_stats_sums)}
945
~/Documents/stable-baselines/stable_baselines/ddpg/ddpg.py in <listcomp>(.0)
941
942 combined_stats_sums = MPI.COMM_WORLD.allreduce(
--> 943 np.array([as_scalar(x) for x in combined_stats.values()]))
944 combined_stats = {k: v / mpi_size for (k, v) in zip(combined_stats.keys(), combined_stats_sums)}
945
~/Documents/stable-baselines/stable_baselines/ddpg/ddpg.py in as_scalar(scalar)
938 return scalar
939 else:
--> 940 raise ValueError('expected scalar, got %s' % scalar)
941
942 combined_stats_sums = MPI.COMM_WORLD.allreduce(
ValueError: expected scalar, got [...]
Ah sorry, I just saw the issue template; I'll take a look at CONTRIBUTING.md and submit one correctly.
In stable_baselines/ddpg/ddpg.py, should eval/return (line 916) and eval/Q (line 918) be wrapped in np.mean to make them scalars?
# Evaluation statistics.
if self.eval_env is not None:
    combined_stats['eval/return'] = eval_episode_rewards
    combined_stats['eval/return_history'] = np.mean(eval_episode_rewards_history)
    combined_stats['eval/Q'] = eval_qs
    combined_stats['eval/episodes'] = len(eval_episode_rewards)
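The suggested change above can be sketched like this, reducing the two list-valued entries to scalars before they reach the MPI allreduce (the sample reward/Q values are made up for illustration):

```python
import numpy as np

# Hypothetical per-episode data, standing in for what the eval loop collects.
eval_episode_rewards = [10.0, 12.0, 8.0]
eval_qs = [0.5, 0.7]

combined_stats = {}
# Wrapping in np.mean yields scalars, so as_scalar no longer raises.
combined_stats['eval/return'] = np.mean(eval_episode_rewards)
combined_stats['eval/Q'] = np.mean(eval_qs)
combined_stats['eval/episodes'] = len(eval_episode_rewards)
```

This mirrors what `eval/return_history` already does with `np.mean`, so it keeps the stats dict uniform.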