rickstaa closed this issue 4 years ago.
I use a bash script containing:

```bash
python3 -m baselines.run --alg=her --env=FetchReach-v1 --num_timesteps=5000
```
Example here:
@RyanRizzo96 Thanks a lot for the example. I will take a look at it.
My problems were solved by switching to the stable_baselines fork instead.
Since multiple people have contacted me about how I got it to work with the stable_baselines fork, here is a quick guide:
Following the original documentation, I will use the following example script:
```python
from stable_baselines import HER, DQN, SAC, DDPG, TD3
from stable_baselines.her import GoalSelectionStrategy, HERGoalEnvWrapper
from stable_baselines.common.bit_flipping_env import BitFlippingEnv

model_class = DQN  # works also with SAC, DDPG and TD3

N_BITS = 2

env = BitFlippingEnv(
    N_BITS, continuous=model_class in [DDPG, SAC, TD3], max_steps=N_BITS
)

# Available strategies (cf paper): future, final, episode, random
goal_selection_strategy = "future"  # equivalent to GoalSelectionStrategy.FUTURE

# Wrap the model
model = HER(
    "MlpPolicy",
    env,
    model_class,
    n_sampled_goal=4,
    goal_selection_strategy=goal_selection_strategy,
    verbose=1,
)

# Train the model
model.learn(1000)

model.save("./her_bit_env")

# WARNING: you must pass an env
# or wrap your environment with HERGoalEnvWrapper to use the predict method
model = HER.load("./her_bit_env", env=env)

obs = env.reset()
for _ in range(100):
    action, _ = model.predict(obs)
    obs, reward, done, _ = env.step(action)

    if done:
        obs = env.reset()
```
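The `goal_selection_strategy` and `n_sampled_goal` parameters control how HER relabels stored experience. As a rough illustration of the `"future"` strategy (a simplified sketch, not stable_baselines' actual replay-buffer code), each transition is copied `n_sampled_goal` times, with its desired goal replaced by a goal achieved later in the same episode and the reward recomputed. The `Transition` class, `flip_bit`, `relabel_future` and the 0/-1 sparse reward are hypothetical stand-ins for a bit-flipping task:

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple

Goal = Tuple[int, ...]  # a bit vector, e.g. (0, 1)


@dataclass
class Transition:
    """Hypothetical transition record for a goal-conditioned task."""
    obs: Goal
    action: int
    next_obs: Goal
    achieved_goal: Goal  # goal actually reached after taking `action`
    desired_goal: Goal
    reward: float


def compute_reward(achieved: Goal, desired: Goal) -> float:
    # Assumed sparse reward: 0 on success, -1 otherwise
    return 0.0 if achieved == desired else -1.0


def flip_bit(bits: Goal, action: int) -> Goal:
    # Bit-flipping dynamics: the action index selects which bit to flip
    flipped = list(bits)
    flipped[action] = 1 - flipped[action]
    return tuple(flipped)


def relabel_future(episode: List[Transition], n_sampled_goal: int = 4,
                   rng: Optional[random.Random] = None) -> List[Transition]:
    """Return the original transitions plus n_sampled_goal relabelled copies of each."""
    if rng is None:
        rng = random.Random()
    buffer = list(episode)  # keep the original experience
    for t, tr in enumerate(episode):
        # "future": goals achieved at step t or later in the same episode
        # (including step t itself is a choice of this sketch)
        future = episode[t:]
        for _ in range(n_sampled_goal):
            new_goal = rng.choice(future).achieved_goal
            buffer.append(Transition(
                tr.obs, tr.action, tr.next_obs, tr.achieved_goal,
                new_goal, compute_reward(tr.achieved_goal, new_goal),
            ))
    return buffer


if __name__ == "__main__":
    goal, obs, episode = (1, 1), (0, 0), []
    for action in (0, 0):  # flipping bit 0 twice never reaches (1, 1)
        next_obs = flip_bit(obs, action)
        episode.append(Transition(obs, action, next_obs, next_obs, goal,
                                  compute_reward(next_obs, goal)))
        obs = next_obs
    buffer = relabel_future(episode, n_sampled_goal=4, rng=random.Random(0))
    print(len(buffer))  # 2 original + 2 * 4 relabelled = 10
    print(sum(tr.reward == 0.0 for tr in buffer[2:]))  # relabelled successes
```

Because the relabelled goals were actually reached, some copies receive the success reward even though the original episode failed, which is what makes sparse-reward tasks such as bit flipping learnable.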
```bash
git clone https://github.com/hill-a/stable-baselines.git
cd stable-baselines
conda create -n her_test python=3.7
conda activate her_test
pip install .[mpi]
```
❗ NOTE: The `mpi` extra is essential to get it to work, since otherwise you will receive the error explained in this issue when you try to run the
I have not tested these steps with other Python versions, so problems may still occur there. If anybody still runs into problems while using this guide, feel free to contact me.
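Regarding the `mpi` note: as far as I know, stable_baselines only imports its MPI-dependent algorithms (DDPG among them) when `mpi4py` is installed, which would explain why the `from stable_baselines import HER, DQN, SAC, DDPG, TD3` line in the example fails without the `[mpi]` extra. Assuming that is the cause, a quick sanity check is to confirm that `mpi4py` is importable inside the activated `her_test` environment. `has_module` below is only an illustrative helper:

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if the top-level module `name` is importable here."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if has_module("mpi4py"):
        print("mpi4py found: the MPI-based algorithms (e.g. DDPG) should import")
    else:
        print("mpi4py missing: reinstall stable-baselines with pip install .[mpi]")
```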
I am currently trying to use the HER algorithm to train a Fetch robot. I do this using the `train.py` script below (see this repository for the full code). Unfortunately, when running the `train.py` file, I get the following error message:

The following two solutions to get rid of this error were given in #798 by @pzhokhov:
These solutions, however, did not solve the problem for me (see the report of each solution below).
Option 1: Revert to the old version 146bbf886ba533fe08b07e01d1c0356aaf7fcc80:
When running the `train.py` file, I now run into the following error:

As I would rather modify my `train.py` script than buy an additional MuJoCo license, I tried the second solution.
Option 2: Initialize the environment before initializing the RolloutWorker
I therefore tried adding the following code before the RolloutWorker initialization:
But when running the `train_modified.py` script, I now receive the following error:

I presume this is caused by the DummyEnv class being used instead of my own environment. As a result, I tried the following code:
But this also gave me the error above. Finally, I tried passing a normal `gym.Env` instead of a vectorized env, using the following code:

But when doing this, I receive the following error:
I was, therefore, wondering if somebody has an example on how to use the HER algorithm with the master branch of the openai/baselines repository?