hill-a / stable-baselines

A fork of OpenAI Baselines, implementations of reinforcement learning algorithms
http://stable-baselines.readthedocs.io/
MIT License

[question] SAC and VecFrameStack #926

Closed kosmylo closed 4 years ago

kosmylo commented 4 years ago

I am trying different RL agents in a custom environment to check their behavior. First I tried PPO2 together with VecFrameStack; everything worked fine and I got a very reasonable policy. Then I wanted to try SAC, but I can only run it if I do not use VecFrameStack, because otherwise I get an error.

The code used to initiate the training is the following:

import os
import read_params
import matplotlib.pyplot as plt
import tensorflow as tf
import numpy as np

from environment import ChargingStation

from stable_baselines.sac.policies import MlpPolicy, LnMlpPolicy
from stable_baselines.common.vec_env import DummyVecEnv, VecNormalize, VecFrameStack
from stable_baselines.common.noise import NormalActionNoise, OrnsteinUhlenbeckActionNoise
from stable_baselines import SAC
from stable_baselines.bench import Monitor
from stable_baselines import results_plotter
from stable_baselines.common.schedules import LinearSchedule

params, profiles = read_params.Charging_Station_Params()

# Create unique log dir
log_dir = "/tmp/sac/"
os.makedirs(log_dir, exist_ok = True)

env = ChargingStation()
env = Monitor(env, log_dir, allow_early_resets = True)
env = DummyVecEnv([lambda: env])

# Automatically normalize the input features and rewards and stack the previous observations
env = VecNormalize(env, norm_obs = True, norm_reward = True, clip_obs = 10.)
env = VecFrameStack(env, n_stack = params.number_frames)

# the noise objects for SAC
n_actions = env.action_space.shape[-1]
action_noise = OrnsteinUhlenbeckActionNoise(mean=np.zeros(n_actions), sigma=float(0.5) * np.ones(n_actions))

# Custom MLP policy 
policy_kwargs = dict(act_fun = tf.nn.leaky_relu, layers = [256, 256, 256])
buffer_size = 100000
gamma = 0.999

model = SAC(MlpPolicy, env, gamma = gamma, policy_kwargs = policy_kwargs, buffer_size = buffer_size, verbose = 1, action_noise = action_noise, tensorboard_log= log_dir + "/sac_ev_charging_tensorboard/")

model.learn(total_timesteps = params.time_steps)

# Don't forget to save the VecNormalize statistics when saving the agent
model.save(log_dir + "sac_ev_charging")
env.save(os.path.join(log_dir, "vec_normalize.pkl"))

# Plot learning curve
results_plotter.plot_results([log_dir], params.time_steps, results_plotter.X_TIMESTEPS, "SAC ChargingStation")
plt.show()
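
(Editor's aside on the "Don't forget to save the VecNormalize statistics" step above: a minimal sketch of loading the agent and the statistics back for evaluation, following the usual stable-baselines save/load pattern. The n_stack = 5 value, the rebuilt ChargingStation env and the rollout length are assumptions here, not part of the original script; the wrappers are applied in the same order as during training so that the observation shape matches what the model was trained on, and a stable-baselines version that provides VecNormalize.load is assumed, consistent with the env.save call above.)

import os

from environment import ChargingStation

from stable_baselines import SAC
from stable_baselines.common.vec_env import DummyVecEnv, VecNormalize, VecFrameStack

log_dir = "/tmp/sac/"

# Rebuild the vectorized environment the same way as for training
env = DummyVecEnv([lambda: ChargingStation()])

# Load the saved normalization statistics onto it
env = VecNormalize.load(os.path.join(log_dir, "vec_normalize.pkl"), env)
env.training = False     # do not update the running statistics at test time
env.norm_reward = False  # reward normalization is not needed at test time

# Re-apply the frame stacking so the observation shape matches the trained model
env = VecFrameStack(env, n_stack = 5)  # assumed equal to params.number_frames used for training

# Load the trained agent and run a deterministic rollout
model = SAC.load(log_dir + "sac_ev_charging", env = env)
obs = env.reset()
for _ in range(1000):
    action, _states = model.predict(obs, deterministic = True)
    obs, reward, done, info = env.step(action)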

The error that I am getting is related to tensor dimensions:

Traceback (most recent call last):

  File "C:\Users\train_sac.py", line 43, in <module>
    model.learn(total_timesteps = params.time_steps)

  File "C:\Users\stable_baselines\sac\sac.py", line 462, in learn
    mb_infos_vals.append(self._train_step(step, writer, current_lr))

  File "C:\Users\stable_baselines\sac\sac.py", line 337, in _train_step
    out = self.sess.run([self.summary] + self.step_ops, feed_dict)

  File "C:\Users\AppData\Local\Continuum\anaconda3\envs\tf_gpu\lib\site-packages\tensorflow\python\client\session.py", line 950, in run
    run_metadata_ptr)

  File "C:\Users\AppData\Local\Continuum\anaconda3\envs\tf_gpu\lib\site-packages\tensorflow\python\client\session.py", line 1149, in _run
    str(subfeed_t.get_shape())))

ValueError: Cannot feed value of shape (64, 42) for Tensor 'input/input/Ob:0', which has shape '(?, 210)'


If I train it without VecFrameStack, it provides a reasonable policy. Could you maybe explain a bit what I should do to be able to train it together with VecFrameStack?
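
(Editor's note on the two shapes in the traceback: the SAC network's input placeholder is built from the outermost wrapper's observation space, i.e. the frame-stacked one (210 = 42 × 5, which suggests five stacked frames), while the batch fed during the gradient step carries un-stacked 42-dimensional observations, most likely because the replay buffer stores observations retrieved from the inner VecNormalize wrapper. A quick way to inspect the two spaces, sketched here with Pendulum-v0 and n_stack = 4 as stand-ins for the custom ChargingStation setup:)

from stable_baselines.common.cmd_util import make_vec_env
from stable_baselines.common.vec_env import VecNormalize, VecFrameStack

env = make_vec_env('Pendulum-v0', n_envs=1)
env = VecNormalize(env, norm_obs = True, norm_reward = True, clip_obs = 10.)
env = VecFrameStack(env, n_stack = 4)

# Outermost (frame-stacked) space: this is what the SAC network is built for
print(env.observation_space.shape)       # (12,) = 3 features * 4 stacked frames

# Space of the inner VecNormalize wrapper: un-stacked observations
print(env.venv.observation_space.shape)  # (3,)

(Keeping both levels consistent is presumably why the wrapper reordering suggested below resolves the error.)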

araffin commented 4 years ago

Hello,

Quick question: did you try without tensorboard logging?

EDIT: please also fill the issue template completely

kosmylo commented 4 years ago

> Hello,
>
> Quick question: did you try without tensorboard logging?
>
> EDIT: please also fill the issue template completely

Yes, I tried without tensorboard and I still get the same error. I also updated the information.

araffin commented 4 years ago

I could reproduce the error with:

from stable_baselines import SAC
from stable_baselines.common.cmd_util import make_vec_env
from stable_baselines.common.vec_env import VecNormalize, VecFrameStack

env = make_vec_env('Pendulum-v0', n_envs=1)
# The following does not work:
# env = VecNormalize(env)
# env = VecFrameStack(env, 4)

# But this works:
env = VecFrameStack(env, 4)
env = VecNormalize(env)

model = SAC('MlpPolicy', env, verbose=1)
model.learn(10000)

but it works if you wrap first with VecFrameStack and then VecNormalize.

Anyway, with SAC, you normally don't have to normalize and you don't need external action noise.
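
(Editor's note: a minimal sketch of that last point, again using Pendulum-v0 as a placeholder for the custom ChargingStation env. SAC explores through its stochastic policy and entropy bonus, so the OrnsteinUhlenbeckActionNoise object and the VecNormalize wrapper can simply be dropped, while frame stacking can still be kept if the task needs a short history:)

from stable_baselines import SAC
from stable_baselines.common.cmd_util import make_vec_env
from stable_baselines.common.vec_env import VecFrameStack

env = make_vec_env('Pendulum-v0', n_envs=1)
# Keep the frame stacking if needed; no VecNormalize, no external action noise
env = VecFrameStack(env, n_stack = 4)

model = SAC('MlpPolicy', env, verbose=1)  # no action_noise argument
model.learn(total_timesteps = 10000)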

kosmylo commented 4 years ago

Yes, I confirm that if you wrap first with VecFrameStack, there is no error.