Open FabioPINO opened 4 years ago
Please check the issue template and fill in the necessary parts; also paste the full traceback of the exception and place the code into a code block like ``` this ```.
I am sorry. I hope the description of the issue is clearer now.
The full traceback across all processes reveals the issue: when creating the `ExpertDataset` it tries to load the file `expert_cartpole.npz`, but the data is stored in `dummy_expert_cartpole.npz`. Fixing the filename fixes the issue.
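To make the mismatch concrete, here is a minimal sketch (using a stand-in archive, not the real expert data): `generate_expert_traj(save_path=NAME)` writes `NAME + '.npz'`, and `ExpertDataset(expert_path=...)` must be given that exact filename, `.npz` suffix included.

```python
import numpy as np

# Stand-in for what generate_expert_traj does: the save path passed in
# ends up as '<save_path>.npz' on disk.
save_path = 'dummy_expert_cartpole'           # what was passed as save_path
np.savez(save_path, obs=np.zeros((1, 4)))     # stand-in for the real archive

# The name ExpertDataset must load: same prefix, plus the '.npz' suffix.
expert_path = save_path + '.npz'
data = np.load(expert_path)
print(sorted(data.files))  # → ['obs']
```

Loading `'expert_cartpole.npz'` here would fail with a `FileNotFoundError`, which is the root cause behind the traceback.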
Sorry, my fault again. It was a typo; the name of the file is correct. Additionally, if I use the following code to generate the expert trajectories, I obtain the same error:
```python
from stable_baselines import DQN
from stable_baselines.gail import generate_expert_traj

model = DQN('MlpPolicy', 'CartPole-v1', verbose=1)
# Train a DQN agent for 1e5 timesteps and generate 10 trajectories
# data will be saved in a numpy archive named `expert_cartpole.npz`
generate_expert_traj(model, 'expert_cartpole', n_timesteps=int(1e5), n_episodes=10)
```
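Once generation finishes, it can help to sanity-check the archive before building the `ExpertDataset`. The key names below (`obs`, `actions`, `rewards`, `episode_returns`, `episode_starts`) are what stable-baselines 2.10 writes, to the best of my knowledge; the sketch uses a synthetic archive so it runs without a trained agent:

```python
import numpy as np

# Synthetic stand-in for expert_cartpole.npz; the key names are assumptions
# based on what generate_expert_traj writes in stable-baselines 2.10.
n = 6
np.savez('expert_cartpole.npz',
         obs=np.zeros((n, 4)),
         actions=np.zeros((n, 1)),
         rewards=np.ones(n),
         episode_returns=np.array([3.0, 3.0]),
         episode_starts=np.array([True, False, False, True, False, False]))

data = np.load('expert_cartpole.npz')
print(sorted(data.files))
# Number of recorded trajectories = number of episode starts
print(int(data['episode_starts'].sum()))  # → 2
```

If the keys or shapes look wrong, the problem is in generation rather than in `ExpertDataset`.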
For me the code runs as expected (Ubuntu 18.04, Python 3.6, stable-baselines 2.10) once I fixed the filenames. You need to study the full traceback printed by the code. The one you pasted is only a side-effect of multiple processes running.
I am using Windows 10, Python 3.7.6, stable-baselines 2.10. OK, I will try to investigate more thoroughly. I managed to fix the problem somehow: if I open a new console in Spyder, it works fine. I have another question at the moment: is it normal that the behaviour cloning process for the code above takes so long? It has been running for 15 minutes now.
I think I found a bug. If I run the pretrain function with the `sequential` parameter of the `DataLoader` object set to False, i.e. using multiprocessing to manage data extraction, the program gets stuck in these lines of code inside dataset.py:
```python
try:
    val = self.queue.get_nowait()
    break
except queue.Empty:
    time.sleep(0.001)
    continue
```
If instead I do not use subprocesses to process the data (`sequential=True`), everything works fine.
Additionally, when using multiprocessing mode, if I interrupt the loop above with Ctrl-C, the next time I run the code I get the error reported in the original question. A workaround for this is to open a new console.
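The polling pattern from dataset.py can be reproduced in isolation. Below is a minimal sketch using a plain `queue.Queue` and a thread as a stand-in for the worker process: the loop only terminates once the producer puts something on the queue, so if the worker never starts (or dies), it spins forever, which matches the hang described above.

```python
import queue
import threading
import time

q = queue.Queue()

def producer():
    # Stand-in for the DataLoader worker process; simulate preprocessing work.
    time.sleep(0.05)
    q.put('minibatch')

threading.Thread(target=producer).start()

# Same busy-wait pattern as in stable_baselines' dataset.py: poll without
# blocking, sleep briefly on an empty queue, retry. If `producer` never
# runs, this loop never exits.
while True:
    try:
        val = q.get_nowait()
        break
    except queue.Empty:
        time.sleep(0.001)
        continue

print(val)  # → minibatch
```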
Did you try putting your code in an `if __name__ == "__main__":` section (cf. the docs: https://stable-baselines.readthedocs.io/en/master/guide/vec_envs.html)? This is required to use multiprocessing on Windows.
I tried, but I get the same error as in the main question:
```python
import gym
from stable_baselines.gail import generate_expert_traj
from stable_baselines import PPO2
from stable_baselines.gail import ExpertDataset

if __name__ == "__main__":
    env = gym.make("CartPole-v1")

    # Here the expert is a random agent
    # but it can be any python function, e.g. a PID controller
    def dummy_expert(_obs):
        """
        Random agent. It samples actions randomly
        from the action space of the environment.

        :param _obs: (np.ndarray) Current observation
        :return: (np.ndarray) action taken by the expert
        """
        return env.action_space.sample()

    # Data will be saved in a numpy archive named `expert_cartpole.npz`
    # when using something different than an RL expert,
    # you must pass the environment object explicitly
    generate_expert_traj(dummy_expert, 'expert_cartpole', env, n_episodes=10)

    # Using only one expert trajectory
    # you can specify `traj_limitation=-1` for using the whole dataset
    dataset = ExpertDataset(expert_path='expert_cartpole.npz',
                            traj_limitation=1, batch_size=128)

    model = PPO2('MlpPolicy', 'CartPole-v1', verbose=1)
    # Pretrain the PPO2 model
    model.pretrain(dataset, n_epochs=1)

    # As an option, you can train the RL agent
    # model.learn(int(1e5))

    # Test the pre-trained model
    env = model.get_env()
    obs = env.reset()

    reward_sum = 0.0
    for _ in range(1000):
        action, _ = model.predict(obs)
        obs, reward, done, _ = env.step(action)
        reward_sum += reward
        env.render()
        if done:
            print(reward_sum)
            reward_sum = 0.0
            obs = env.reset()

    env.close()
```
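For reference, the data loading can also be forced into sequential mode from the dataset itself: in stable-baselines 2.10, `ExpertDataset` accepts a `sequential_preprocessing` flag (worth double-checking against your installed version) that makes the underlying `DataLoader` run with `sequential=True`, sidestepping the multiprocessing hang on Windows at the cost of preprocessing in the main process.

```python
from stable_baselines.gail import ExpertDataset

# Sketch, assuming stable-baselines 2.10: sequential_preprocessing=True
# avoids spawning a worker process for data extraction, trading speed
# for the multiprocessing reliability issues described in this thread.
dataset = ExpertDataset(expert_path='expert_cartpole.npz',
                        traj_limitation=1, batch_size=128,
                        sequential_preprocessing=True)
```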
When I try to run the code below I get this error at the pretrain function: