hill-a / stable-baselines

A fork of OpenAI Baselines, implementations of reinforcement learning algorithms
http://stable-baselines.readthedocs.io/
MIT License

[question] linear learning rate schedule in SAC #976

Closed kosmylo closed 4 years ago

kosmylo commented 4 years ago

Describe the bug
I want to use a learning rate schedule for training a SAC agent, but I cannot find the proper way to inform the algorithm about it. I am doing exactly the same as in PPO2, as follows:

Code example

import os
import read_params
import matplotlib.pyplot as plt
import tensorflow as tf
import numpy as np

from environment import ChargingStation

from stable_baselines.sac.policies import MlpPolicy, LnMlpPolicy
from stable_baselines.common.vec_env import DummyVecEnv, VecNormalize
from stable_baselines.common.noise import NormalActionNoise, OrnsteinUhlenbeckActionNoise
from stable_baselines import SAC
from stable_baselines.bench import Monitor
from stable_baselines import results_plotter
from stable_baselines.common.schedules import LinearSchedule

params, profiles = read_params.Charging_Station_Params()

# Create unique log dir
log_dir = "/tmp/sac/"
os.makedirs(log_dir, exist_ok = True)

env = ChargingStation()
env = Monitor(env, log_dir, allow_early_resets = True)
env = DummyVecEnv([lambda: env])

# Automatically normalize the input features and rewards
env = VecNormalize(env, norm_obs = True, norm_reward = True, clip_obs = 10.)

# the noise objects for SAC
n_actions = env.action_space.shape[-1]
action_noise = None 

# Custom MLP policy 
policy_kwargs = dict(act_fun = tf.nn.relu, layers = [128, 128])
buffer_size = 1000000
gamma = 0.99
sched_LR = LinearSchedule(params.time_steps, 0.005, 0.0001) # learning_rate = sched_LR.value

model = SAC(MlpPolicy, env, gamma = gamma, learning_rate = sched_LR.value,
            policy_kwargs = policy_kwargs, buffer_size = buffer_size,
            verbose = 1, action_noise = action_noise,
            tensorboard_log = log_dir + "/sac_ev_charging_tensorboard/")

model.learn(total_timesteps = params.time_steps)

# Don't forget to save the VecNormalize statistics when saving the agent
model.save(log_dir + "sac_ev_charging")
env.save(os.path.join(log_dir, "vec_normalize.pkl"))

# Plot learning curve
results_plotter.plot_results([log_dir], params.time_steps, results_plotter.X_TIMESTEPS, "SAC ChargingStation")
plt.show()

The problem is that the training starts with the lowest value for the learning rate, namely 0.0001 in this case.

What am I doing wrong?
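For reference, the likely cause (sketched from memory of the SB2 source, so treat it as an assumption rather than verbatim code): SAC calls a callable learning_rate with the remaining-progress fraction, which decays from 1.0 to 0.0 over training, not with the current timestep:

# Rough sketch of how SB2's SAC consumes a callable learning rate (paraphrased, not verbatim source):
frac = 1.0 - step / total_timesteps    # remaining progress, decays 1.0 -> 0.0
current_lr = self.learning_rate(frac)  # here self.learning_rate is sched_LR.value

LinearSchedule.value, on the other hand, expects a timestep count, so feeding it a fraction in [0, 1] keeps it at the very start of its schedule.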

araffin commented 4 years ago

Duplicate of #791 and #509

kosmylo commented 4 years ago

To sum up, I did what is mentioned in #509, but the learning rate does not take the values that I want. Specifically, I use the following two lines:

1) sched_LR = LinearSchedule(params.time_steps, 0.005, 0.00025) to set up a linear schedule from 0.005 to 0.00025
2) learning_rate = sched_LR.value as the argument to SAC (the same way it is done for PPO2 in #509)

But what I am getting as a learning rate schedule according to TensorBoard is the following:

[TensorBoard screenshot: learning-rate curve over training steps]

The plot shows that the learning rate starts from 0.00025 and stays at this value.
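If I recall stable_baselines.common.schedules correctly (an assumption worth checking against the source), the flat line follows directly from the unit mismatch sketched above, plus the fact that initial_p is the third constructor argument:

from stable_baselines.common.schedules import LinearSchedule

# Rough reconstruction of LinearSchedule.value (from memory, not verbatim):
#   fraction = min(step / schedule_timesteps, 1.0)
#   value    = initial_p + fraction * (final_p - initial_p)
# SAC passes a progress fraction in [0, 1] as `step`, so fraction is ~0 and the
# result stays pinned at initial_p, i.e. the third argument (0.00025 here).
sched_LR = LinearSchedule(100000, 0.005, 0.00025)  # 100000 is a hypothetical horizon
print(sched_LR.value(1.0))    # ~0.00025: what a progress-style call yields
print(sched_LR.value(50000))  # 0.002625: what a timestep-based caller would get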

araffin commented 4 years ago

Please look at how it is done in the rl zoo; as mentioned in the doc, we recommend using the rl zoo for best practices ;)

I also recommend giving Stable-Baselines3 a try (as SB2 is in maintenance mode now); it also has an rl zoo: https://github.com/DLR-RM/rl-baselines3-zoo
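For completeness, a sketch of the zoo-style fix under the assumptions above: pass a plain function mapping the remaining progress (1.0 -> 0.0) to a learning rate. The final_value parameter below is an addition to cover the 0.005 -> 0.00025 case from this issue, not part of the zoo helper itself:

from stable_baselines import SAC
from stable_baselines.sac.policies import MlpPolicy

def linear_schedule(initial_value, final_value=0.0):
    """Map the remaining training progress (1.0 -> 0.0) to a learning rate."""
    def schedule(progress):
        # progress is 1.0 at the start of training and decays to 0.0 at the end
        return final_value + progress * (initial_value - final_value)
    return schedule

# Reusing env, policy_kwargs, buffer_size and params from the snippet above
model = SAC(MlpPolicy, env, gamma=0.99,
            learning_rate=linear_schedule(0.005, 0.00025),
            policy_kwargs=policy_kwargs, buffer_size=buffer_size, verbose=1)
model.learn(total_timesteps=params.time_steps)

With this, TensorBoard should show the learning rate decaying linearly from 0.005 to 0.00025 over total_timesteps.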