hill-a / stable-baselines

A fork of OpenAI Baselines, implementations of reinforcement learning algorithms
http://stable-baselines.readthedocs.io/
MIT License

[question] SAC and VecFrameStack #926

Closed kosmylo closed 4 years ago

kosmylo commented 4 years ago

I am trying different RL agents in a custom environment to check their behavior. First I tried PPO2 together with VecFrameStack; everything worked fine and I got a very reasonable policy. Then I wanted to try SAC, but I can only run it if I do not use VecFrameStack, because otherwise I get an error.

The code used to initiate the training is the following:

import os
import read_params
import matplotlib.pyplot as plt
import tensorflow as tf
import numpy as np

from environment import ChargingStation

from stable_baselines.sac.policies import MlpPolicy, LnMlpPolicy
from stable_baselines.common.vec_env import DummyVecEnv, VecNormalize, VecFrameStack
from stable_baselines.common.noise import NormalActionNoise, OrnsteinUhlenbeckActionNoise
from stable_baselines import SAC
from stable_baselines.bench import Monitor
from stable_baselines import results_plotter
from stable_baselines.common.schedules import LinearSchedule

params, profiles = read_params.Charging_Station_Params()

# Create unique log dir
log_dir = "/tmp/sac/"
os.makedirs(log_dir, exist_ok = True)

env = ChargingStation()
env = Monitor(env, log_dir, allow_early_resets = True)
env = DummyVecEnv([lambda: env])

# Automatically normalize the input features and rewards and stack the previous observations
env = VecNormalize(env, norm_obs = True, norm_reward = True, clip_obs = 10.)
env = VecFrameStack(env, n_stack = params.number_frames)

# the noise objects for SAC
n_actions = env.action_space.shape[-1]
action_noise = OrnsteinUhlenbeckActionNoise(mean=np.zeros(n_actions), sigma=float(0.5) * np.ones(n_actions))

# Custom MLP policy 
policy_kwargs = dict(act_fun = tf.nn.leaky_relu, layers = [256, 256, 256])
buffer_size = 100000
gamma = 0.999

model = SAC(MlpPolicy, env, gamma = gamma, policy_kwargs = policy_kwargs, buffer_size = buffer_size, verbose = 1, action_noise = action_noise, tensorboard_log= log_dir + "/sac_ev_charging_tensorboard/")

model.learn(total_timesteps = params.time_steps)

# Don't forget to save the VecNormalize statistics when saving the agent
model.save(log_dir + "sac_ev_charging")
env.save(os.path.join(log_dir, "vec_normalize.pkl"))

# Plot learning curve
results_plotter.plot_results([log_dir], params.time_steps, results_plotter.X_TIMESTEPS, "SAC ChargingStation")
plt.show()
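
(Editor's aside on the "Don't forget to save the VecNormalize statistics" step above: a minimal sketch of loading the agent and the statistics back for evaluation, following the usual stable-baselines save/load pattern. The n_stack = 5 value, the rebuilt ChargingStation env and the rollout length are assumptions here, not part of the original script; the wrappers are applied in the same order as during training so that the observation shape matches what the model was trained on, and a stable-baselines version that provides VecNormalize.load is assumed, consistent with the env.save call above.)

import os

from environment import ChargingStation

from stable_baselines import SAC
from stable_baselines.common.vec_env import DummyVecEnv, VecNormalize, VecFrameStack

log_dir = "/tmp/sac/"

# Rebuild the vectorized environment the same way as for training
env = DummyVecEnv([lambda: ChargingStation()])

# Load the saved normalization statistics onto it
env = VecNormalize.load(os.path.join(log_dir, "vec_normalize.pkl"), env)
env.training = False     # do not update the running statistics at test time
env.norm_reward = False  # reward normalization is not needed at test time

# Re-apply the frame stacking so the observation shape matches the trained model
env = VecFrameStack(env, n_stack = 5)  # assumed equal to params.number_frames used for training

# Load the trained agent and run a deterministic rollout
model = SAC.load(log_dir + "sac_ev_charging", env = env)
obs = env.reset()
for _ in range(1000):
    action, _states = model.predict(obs, deterministic = True)
    obs, reward, done, info = env.step(action)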

The error that I am getting is related to tensor dimensions:

Traceback (most recent call last):

  File "C:\Users\train_sac.py", line 43, in <module>
    model.learn(total_timesteps = params.time_steps)

  File "C:\Users\stable_baselines\sac\sac.py", line 462, in learn
    mb_infos_vals.append(self._train_step(step, writer, current_lr))

  File "C:\Users\stable_baselines\sac\sac.py", line 337, in _train_step
    out = self.sess.run([self.summary] + self.step_ops, feed_dict)

  File "C:\Users\AppData\Local\Continuum\anaconda3\envs\tf_gpu\lib\site-packages\tensorflow\python\client\session.py", line 950, in run
    run_metadata_ptr)

  File "C:\Users\AppData\Local\Continuum\anaconda3\envs\tf_gpu\lib\site-packages\tensorflow\python\client\session.py", line 1149, in _run
    str(subfeed_t.get_shape())))

ValueError: Cannot feed value of shape (64, 42) for Tensor 'input/input/Ob:0', which has shape '(?, 210)'


If I train it without VecFrameStack, it provides a reasonable policy. Could you maybe explain a bit what I should do to be able to train it together with VecFrameStack?
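
(Editor's note on the two shapes in the traceback: the SAC network's input placeholder is built from the outermost wrapper's observation space, i.e. the frame-stacked one (210 = 42 × 5, which suggests five stacked frames), while the batch fed during the gradient step carries un-stacked 42-dimensional observations, most likely because the replay buffer stores observations retrieved from the inner VecNormalize wrapper. A quick way to inspect the two spaces, sketched here with Pendulum-v0 and n_stack = 4 as stand-ins for the custom ChargingStation setup:)

from stable_baselines.common.cmd_util import make_vec_env
from stable_baselines.common.vec_env import VecNormalize, VecFrameStack

env = make_vec_env('Pendulum-v0', n_envs=1)
env = VecNormalize(env, norm_obs = True, norm_reward = True, clip_obs = 10.)
env = VecFrameStack(env, n_stack = 4)

# Outermost (frame-stacked) space: this is what the SAC network is built for
print(env.observation_space.shape)       # (12,) = 3 features * 4 stacked frames

# Space of the inner VecNormalize wrapper: un-stacked observations
print(env.venv.observation_space.shape)  # (3,)

(Keeping both levels consistent is presumably why the wrapper reordering suggested below resolves the error.)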

araffin commented 4 years ago

Hello,

Quick question: did you try without tensorboard logging?

EDIT: please also fill the issue template completely

kosmylo commented 4 years ago

> Hello,
>
> Quick question: did you try without tensorboard logging?
>
> EDIT: please also fill the issue template completely

Yes, I tried without tensorboard and I still get the same error. I also updated the information.

araffin commented 4 years ago

I could reproduce the error with:

from stable_baselines import SAC
from stable_baselines.common.cmd_util import make_vec_env
from stable_baselines.common.vec_env import VecNormalize, VecFrameStack

env = make_vec_env('Pendulum-v0', n_envs=1)
# The following does not work:
# env = VecNormalize(env)
# env = VecFrameStack(env, 4)

# But this works:
env = VecFrameStack(env, 4)
env = VecNormalize(env)

model = SAC('MlpPolicy', env, verbose=1)
model.learn(10000)

but it works if you wrap first with VecFrameStack and then VecNormalize.

Anyway, with SAC, you normally don't have to normalize and you don't need external action noise.
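
(Editor's note: a minimal sketch of that last point, again using Pendulum-v0 as a placeholder for the custom ChargingStation env. SAC explores through its stochastic policy and entropy bonus, so the OrnsteinUhlenbeckActionNoise object and the VecNormalize wrapper can simply be dropped, while frame stacking can still be kept if the task needs a short history:)

from stable_baselines import SAC
from stable_baselines.common.cmd_util import make_vec_env
from stable_baselines.common.vec_env import VecFrameStack

env = make_vec_env('Pendulum-v0', n_envs=1)
# Keep the frame stacking if needed; no VecNormalize, no external action noise
env = VecFrameStack(env, n_stack = 4)

model = SAC('MlpPolicy', env, verbose=1)  # no action_noise argument
model.learn(total_timesteps = 10000)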

kosmylo commented 4 years ago

Yes, I confirm that if you wrap first with VecFrameStack, there is no error.