Open durantagre opened 2 years ago
Seems like you are confusing OpenAI baselines with stable-baselines. In stable-baselines, you can save and restore models with simple agent.save and PPO.load functions. Stable-baselines does not have support for loading OpenAI baselines agents with a single call.
Also, we recommend using stable-baselines3 as it is more actively supported.
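To make the save/load route concrete, here is a minimal sketch of continuing training from a checkpoint. The helper name `continue_training` is hypothetical, and it assumes the class passed in follows the stable-baselines3 style interface (`load`, `learn` with `reset_num_timesteps`, `save`, and a `num_timesteps` attribute); treat it as an illustration, not the library's prescribed workflow.

```python
def continue_training(algo_cls, checkpoint_path, env, extra_timesteps):
    """Load a saved agent, train it further, and save it again.

    algo_cls is assumed to expose a stable-baselines3 style interface:
    a load() classmethod, learn(), save(), and a num_timesteps counter.
    Returns the step counter before and after the extra training.
    """
    model = algo_cls.load(checkpoint_path, env=env)
    steps_before = model.num_timesteps
    # reset_num_timesteps=False keeps the existing step counter going
    # instead of restarting it from zero.
    model.learn(total_timesteps=extra_timesteps, reset_num_timesteps=False)
    model.save(checkpoint_path)
    return steps_before, model.num_timesteps


# Example (assumes stable-baselines3 is installed and "ppo_model.zip" exists):
#   from stable_baselines3 import PPO
#   before, after = continue_training(PPO, "ppo_model", env, 100_000)
```

If the checkpoint was really picked up, the first returned value should be the number of steps from the earlier run rather than zero.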
Dear altruists, I am new to stable baselines and RL. I am trying to retrain a previously trained PPO1 model so that it resumes learning from where the previous training run left off.
Also, I am unsure whether training resumes from the previous end point or starts again from the very beginning (when it trains correctly in a different setting). Can anyone please point me to a way to verify this? One option may be printing the model parameters, but I am not sure about that.
What I am trying to do is:
import os

import tensorflow as tf
from mpi4py import MPI
from baselines import logger
from baselines.bench import Monitor
from baselines.common import set_global_seeds, tf_util as U


def make_env(seed=None, reward_scale=1.0):
    # Signature assumed: the name comes from the make_env() call in train(),
    # the default values are guesses.
    rank = MPI.COMM_WORLD.Get_rank()
    myseed = seed + 1000 * rank if seed is not None else None
    set_global_seeds(myseed)
    env = Env()  # custom environment class, defined elsewhere
    env = Monitor(env, logger_path, allow_early_resets=True)
    env.seed(seed)
    if reward_scale != 1.0:
        from baselines.common.retro_wrappers import RewardScaler
        env = RewardScaler(env, reward_scale)
    return env


def train(num_timesteps, path=None):
    from baselines.ppo1 import mlp_policy, pposgd_simple

    sess = U.make_session(num_cpu=1)
    sess.__enter__()

    def policy_fn(name, ob_space, ac_space):
        policy = mlp_policy.MlpPolicy(name=name, ob_space=ob_space, ac_space=ac_space,
                                      hid_size=64, num_hid_layers=3)
        saver = tf.train.Saver()
        if path is not None:
            print("Tried to restore from ", path)
            U.initialize()
            saver.restore(tf.get_default_session(), path)
            # Alternative restore from the latest checkpoint in a directory.
            # import_meta_graph adds a second copy of the graph, so it is not
            # combined with the restore above.
            # saver2 = tf.train.import_meta_graph('/srcs/src/models/model1.meta')
            # saver.restore(sess, tf.train.latest_checkpoint('/srcs/src/models/'))
        return policy

    env = make_env()
    pi = pposgd_simple.learn(
        env, policy_fn,
        max_timesteps=num_timesteps,
        timesteps_per_actorbatch=1024,
        clip_param=0.2, entcoeff=0.0,
        optim_epochs=10, optim_stepsize=5e-5, optim_batchsize=64,
        gamma=0.99, lam=0.95, schedule='linear',
        # The stock OpenAI baselines learn() has no tensorboard_log argument;
        # drop this line unless your copy adds one.
        tensorboard_log=logger_path,
        # tensorboard_log="./ppo1_tensorboard/",
    )
    env.env.plotSave()  # custom method on the wrapped environment
    saver = tf.train.Saver(tf.global_variables())
    saver.save(sess, '/models/model1')
    return pi


def main():
    logger.configure()
    path_ = "/models/model1"
    train(num_timesteps=409600, path=path_)


if __name__ == '__main__':
    rank = MPI.COMM_WORLD.Get_rank()
    logger_path = None if logger.get_dir() is None else os.path.join(logger.get_dir(), str(rank))
    main()
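On the question of verifying that training resumed from the saved state: rather than printing parameters, one option is to compare them numerically. The sketch below assumes you can obtain the parameters as a mapping from name to NumPy array, for example from `model.get_parameters()` in stable-baselines, or by evaluating the TensorFlow variables in the session. The helper names `restored_correctly` and `max_abs_diff` are hypothetical.

```python
import numpy as np


def restored_correctly(saved, loaded):
    """True if both mappings have the same names and identical values."""
    if saved.keys() != loaded.keys():
        return False
    return all(np.array_equal(saved[name], loaded[name]) for name in saved)


def max_abs_diff(before, after):
    """Largest absolute change per parameter shared by both mappings."""
    diffs = {}
    for name, old in before.items():
        if name not in after:
            continue
        delta = np.abs(np.asarray(after[name]) - np.asarray(old))
        diffs[name] = float(delta.max()) if delta.size else 0.0
    return diffs
```

A possible check: take a snapshot just before saving and another right after loading, and expect `restored_correctly` to return True. After some further training, `max_abs_diff` against the loaded snapshot should show small nonzero changes; values that look like a fresh random initialization would suggest the restore did not take effect.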