Closed keshaviyengar closed 5 years ago
Hello, could you elaborate a bit more?
I have forked stable-baselines and made the changes here
Ah ok, this is a bug then. Please fill in the issue template next time ;)
Code to reproduce the error:
import gym
from stable_baselines import DDPG
eval_env = gym.make("Pendulum-v0")
model = DDPG("MlpPolicy", "Pendulum-v0", eval_env=eval_env, verbose=1)
model.learn(50000)
Feel free to submit a PR then ;) (and don't forget to look at the CONTRIBUTING.md guide before)
EDIT: traceback
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-6-68aae70e24b4> in <module>()
4 eval_env = gym.make("Pendulum-v0")
5 model = DDPG("MlpPolicy", "Pendulum-v0", eval_env=eval_env, verbose=1)
----> 6 model.learn(50000)
~/Documents/stable-baselines/stable_baselines/ddpg/ddpg.py in learn(self, total_timesteps, callback, seed, log_interval, tb_log_name, reset_num_timesteps)
941
942 combined_stats_sums = MPI.COMM_WORLD.allreduce(
--> 943 np.array([as_scalar(x) for x in combined_stats.values()]))
944 combined_stats = {k: v / mpi_size for (k, v) in zip(combined_stats.keys(), combined_stats_sums)}
945
~/Documents/stable-baselines/stable_baselines/ddpg/ddpg.py in <listcomp>(.0)
941
942 combined_stats_sums = MPI.COMM_WORLD.allreduce(
--> 943 np.array([as_scalar(x) for x in combined_stats.values()]))
944 combined_stats = {k: v / mpi_size for (k, v) in zip(combined_stats.keys(), combined_stats_sums)}
945
~/Documents/stable-baselines/stable_baselines/ddpg/ddpg.py in as_scalar(scalar)
938 return scalar
939 else:
--> 940 raise ValueError('expected scalar, got %s' % scalar)
941
942 combined_stats_sums = MPI.COMM_WORLD.allreduce(
ValueError: expected scalar, got [...]
Ah sorry, I just saw the issue template; I'll take a look at CONTRIBUTING.md and submit one correctly.
In stable_baselines/ddpg/ddpg.py, should eval/return (line 916) and eval/Q (line 918) be wrapped in np.mean to make them scalars?
# Evaluation statistics.
if self.eval_env is not None:
    combined_stats['eval/return'] = eval_episode_rewards
    combined_stats['eval/return_history'] = np.mean(eval_episode_rewards_history)
    combined_stats['eval/Q'] = eval_qs
    combined_stats['eval/episodes'] = len(eval_episode_rewards)
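The suggested change above can be sketched like this, reducing the two list-valued entries to scalars before they reach the MPI allreduce (the sample reward/Q values are made up for illustration):

```python
import numpy as np

# Hypothetical per-episode data, standing in for what the eval loop collects.
eval_episode_rewards = [10.0, 12.0, 8.0]
eval_qs = [0.5, 0.7]

combined_stats = {}
# Wrapping in np.mean yields scalars, so as_scalar no longer raises.
combined_stats['eval/return'] = np.mean(eval_episode_rewards)
combined_stats['eval/Q'] = np.mean(eval_qs)
combined_stats['eval/episodes'] = len(eval_episode_rewards)
```

This mirrors what `eval/return_history` already does with `np.mean`, so it keeps the stats dict uniform.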