hill-a / stable-baselines

A fork of OpenAI Baselines, implementations of reinforcement learning algorithms
http://stable-baselines.readthedocs.io/
MIT License

Problem retraining PPO1 model and using Tensorflow with Stable Baselines 2 #1154

Open · durantagre opened this issue 2 years ago

durantagre commented 2 years ago

Dear altruists, I am new to Stable Baselines and RL. I am trying to retrain my previously trained PPO1 model so that it resumes learning from where the previous training left off. What I am trying to do is:

  1. Loading my previously trained model from my computer and then re-training it from the point where its last training ended. For that, I am loading my previously saved model inside `policy_fn()` and passing `policy_fn` as a parameter to `pposgd_simple.learn()`. This raises the error `ValueError: At least two variables have the same name: pi/obfilter/count`.

Also, I am unsure whether the training resumes from the previous ending point or starts again from the very beginning (when it trains correctly, in a different setting). Can anyone please point me to a way to verify this? One option may be printing the model parameters, but I am not sure how to do it properly.
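For instance, a rough sketch of what I have in mind (it assumes TF1-style checkpoint utilities; `/models/model1` is the save path from my script below, and `pi/obfilter/count` is the variable named in the error):

```python
import tensorflow as tf

# Rough sketch: list what is actually stored in the checkpoint
# ("/models/model1" is the save path used in the training script).
for name, shape in tf.train.list_variables("/models/model1"):
    print(name, shape)

# Read one stored tensor ("pi/obfilter/count" is the variable from the error
# message); comparing it with sess.run() of the same variable after the
# restore would show whether the checkpoint values were actually loaded.
print(tf.train.load_variable("/models/model1", "pi/obfilter/count"))
```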

  2. I am also trying to use TensorBoard to monitor my training. But when I run the training, it fails at `tensorboard_log=logger_path` with `TypeError: learn() got an unexpected keyword argument 'tensorboard_log'`. My stable-baselines version is 2.10.2. I am attaching my entire training code below. I would appreciate any suggestions. Thanks in advance.
    
```python
import os

import tensorflow as tf
from mpi4py import MPI

from baselines import logger
from baselines.bench import Monitor
from baselines.common import set_global_seeds
import baselines.common.tf_util as U


def make_env(seed=None):
    reward_scale = 1.0
    rank = MPI.COMM_WORLD.Get_rank()
    myseed = seed + 1000 * rank if seed is not None else None
    set_global_seeds(myseed)
    env = Env()  # custom environment, defined elsewhere
    env = Monitor(env, logger_path, allow_early_resets=True)
    env.seed(seed)
    if reward_scale != 1.0:
        from baselines.common.retro_wrappers import RewardScaler
        env = RewardScaler(env, reward_scale)
    return env


def train(num_timesteps, path=None):
    from baselines.ppo1 import mlp_policy, pposgd_simple

    sess = U.make_session(num_cpu=1)
    sess.__enter__()

    def policy_fn(name, ob_space, ac_space):
        policy = mlp_policy.MlpPolicy(name=name, ob_space=ob_space, ac_space=ac_space,
                                      hid_size=64, num_hid_layers=3)
        saver = tf.train.Saver()
        if path is not None:
            print("Tried to restore from ", path)
            U.initialize()
            saver.restore(tf.get_default_session(), path)
            # import_meta_graph loads a second copy of the saved graph; the
            # duplicated variables are likely what triggers the
            # "ValueError: At least two variables have the same name" above.
            saver2 = tf.train.import_meta_graph('/srcs/src/models/model1.meta')
            model = saver.restore(sess, tf.train.latest_checkpoint('/srcs/src/models/'))
        return policy

    env = make_env()
    pi = pposgd_simple.learn(env, policy_fn,
                             max_timesteps=num_timesteps,
                             timesteps_per_actorbatch=1024,
                             clip_param=0.2, entcoeff=0.0,
                             optim_epochs=10, optim_stepsize=5e-5,
                             optim_batchsize=64,
                             gamma=0.99, lam=0.95, schedule='linear',
                             tensorboard_log=logger_path,  # raises the TypeError
                             # tensorboard_log="./ppo1_tensorboard/",
                             )
    env.env.plotSave()
    saver = tf.train.Saver(tf.all_variables())
    saver.save(sess, '/models/model1')
    return pi


def main():
    logger.configure()
    path_ = "/models/model1"
    train(num_timesteps=409600, path=path_)


if __name__ == '__main__':
    rank = MPI.COMM_WORLD.Get_rank()
    logger_path = None if logger.get_dir() is None else os.path.join(logger.get_dir(), str(rank))
    main()
```

Miffyli commented 2 years ago

Seems like you are confusing OpenAI Baselines with stable-baselines: the script you posted imports from `baselines.ppo1`, which is OpenAI Baselines, not stable-baselines. In stable-baselines, you can save and restore models with simple `model.save()` and `PPO1.load()` calls. Stable-baselines does not support loading OpenAI Baselines agents with a single call.
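For reference, a minimal sketch of that workflow in stable-baselines v2 (the environment name and paths here are placeholders; note that `tensorboard_log` is a constructor argument, not a `learn()` kwarg, which explains the TypeError you saw):

```python
import gym
from stable_baselines import PPO1

# Minimal sketch, assuming a standard Gym environment (placeholder names).
env = gym.make("CartPole-v1")
model = PPO1("MlpPolicy", env, tensorboard_log="./ppo1_tensorboard/")
model.learn(total_timesteps=100000)
model.save("ppo1_model")

# Later: restore the saved parameters and continue training from them.
model = PPO1.load("ppo1_model", env=env)
model.learn(total_timesteps=100000, reset_num_timesteps=False)
```

`reset_num_timesteps=False` keeps the timestep counter (and the TensorBoard curves) continuing from the previous run instead of restarting at zero, which also gives you a way to verify that training resumed rather than started over.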

Also, we recommend using stable-baselines3 as it is more actively supported.
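For example, the stable-baselines3 version of the same sketch only changes the import and the class name (SB3 ships a single PPO implementation; environment and paths are again placeholders):

```python
from stable_baselines3 import PPO

# Same sketch in stable-baselines3 (placeholder environment and paths).
model = PPO("MlpPolicy", "CartPole-v1", tensorboard_log="./ppo_tensorboard/")
model.learn(total_timesteps=100000)
model.save("ppo_model")

model = PPO.load("ppo_model", env=model.get_env())
model.learn(total_timesteps=100000, reset_num_timesteps=False)
```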