hill-a / stable-baselines

A fork of OpenAI Baselines, implementations of reinforcement learning algorithms
http://stable-baselines.readthedocs.io/
MIT License

Pre-Training Problem #932

FabioPINO opened this issue 4 years ago (status: Open)

FabioPINO commented 4 years ago

When I try to run the code below, I get the following error at the pretrain function:

Error

  File "C:\Users\fabio\Desktop\wetransfer-08d028\Rope_ex_v1.5\RL_Training\behaviour_cloning.py", line 40, in <module>
    model.pretrain(dataset, n_epochs=1000)

  File "c:\users\fabio\desktop\virtual_env\env1\lib\site-packages\stable_baselines\common\base_class.py", line 346, in pretrain
    expert_obs, expert_actions = dataset.get_next_batch('train')

  File "c:\users\fabio\desktop\virtual_env\env1\lib\site-packages\stable_baselines\gail\dataset\dataset.py", line 152, in get_next_batch
    dataloader.start_process()

  File "c:\users\fabio\desktop\virtual_env\env1\lib\site-packages\stable_baselines\gail\dataset\dataset.py", line 231, in start_process
    self.process.start()

  File "C:\Python\Python37\lib\multiprocessing\process.py", line 112, in start
    self._popen = self._Popen(self)

  File "C:\Python\Python37\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)

  File "C:\Python\Python37\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)

  File "C:\Python\Python37\lib\multiprocessing\popen_spawn_win32.py", line 89, in __init__
    reduction.dump(process_obj, to_child)

  File "C:\Python\Python37\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)

PicklingError: Can't pickle <function rebuild_pipe_connection at 0x0000024B185C2168>: it's not the same object as multiprocessing.connection.rebuild_pipe_connection

Code

import gym
from stable_baselines.gail import generate_expert_traj
env = gym.make("CartPole-v1")
def dummy_expert(_obs):
    return env.action_space.sample()

generate_expert_traj(dummy_expert, 'expert_cartpole', env, n_episodes=10)

from stable_baselines import PPO2
from stable_baselines.gail import ExpertDataset

dataset = ExpertDataset(expert_path='expert_cartpole.npz',
                        traj_limitation=1, batch_size=128)

model = PPO2('MlpPolicy', 'CartPole-v1', verbose=1)
model.pretrain(dataset, n_epochs=1000)
model.learn(int(1e5))

env = model.get_env()
obs = env.reset()

reward_sum = 0.0
for _ in range(1000):
    action, _ = model.predict(obs)
    obs, reward, done, _ = env.step(action)
    reward_sum += reward
    env.render()
    if done:
        print(reward_sum)
        reward_sum = 0.0
        obs = env.reset()

env.close()
Miffyli commented 4 years ago

Please check the issue template and fill in the necessary parts. Also paste the full traceback of the exception, and place the code into a code block like ``` this ```.

FabioPINO commented 4 years ago

I am sorry. I hope the description of the issue is clearer now.

Miffyli commented 4 years ago

The full traceback from all the processes reveals the issue: the code is trying to load the file expert_cartpole.npz when creating the ExpertDataset, but the data is stored in dummy_expert_cartpole.npz. Fixing this fixes the issue.
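If in doubt, a quick check like the sketch below (the two filenames are just the candidates mentioned above) shows which archive generate_expert_traj actually wrote:

import os

# Hypothetical sanity check: see which expert archive exists on disk
for name in ('expert_cartpole.npz', 'dummy_expert_cartpole.npz'):
    print(name, '->', 'found' if os.path.exists(name) else 'missing')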

FabioPINO commented 4 years ago

Sorry, my fault again. It was a typo; the name of the file is correct. Additionally, if I use the following code to generate the expert trajectories, I obtain the same error:

Code

from stable_baselines import DQN
from stable_baselines.gail import generate_expert_traj

model = DQN('MlpPolicy', 'CartPole-v1', verbose=1)
# Train a DQN agent for 1e5 timesteps and generate 10 trajectories
# data will be saved in a numpy archive named `expert_cartpole.npz`
generate_expert_traj(model, 'expert_cartpole', n_timesteps=int(1e5), n_episodes=10)
Miffyli commented 4 years ago

For me the code runs as expected (Ubuntu 18.04, Python 3.6, stable-baselines 2.10) once I fixed the filenames. You need to study the full traceback printed by the code; the one you pasted is only a side effect of multiple processes running.

FabioPINO commented 4 years ago

I am using Windows 10, Python 3.7.6 and stable-baselines 2.10. OK, I will try to investigate more thoroughly. I managed to fix the problem somehow: if I open a new console in Spyder, it works fine. I have another issue at the moment: is it normal that the behaviour cloning process takes so long for the code above? It has been running for 15 minutes now.
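One quick way to tell a slow run from a stuck one (just a suggestion, assuming a small CartPole dataset) is to pretrain with far fewer epochs and watch the losses that pretrain prints periodically when the model was created with verbose=1:

# Assumption: a much smaller n_epochs is enough to see whether progress is being made
model.pretrain(dataset, n_epochs=10)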

FabioPINO commented 4 years ago

I think I found a bug. If I run the pretrain function with the sequential parameter of the DataLoader object set to False, i.e. using multiprocessing to manage the data extraction, the program gets stuck in these lines of code inside dataset.py:

            try:
                val = self.queue.get_nowait()
                break
            except queue.Empty:
                time.sleep(0.001)
                continue

If instead I do not use subprocesses to process the data (sequential = True), everything works fine.

Additionally, if I use the multiprocessing mode to process the data and interrupt the loop above with Ctrl-C, I get the error reported in the original question when I try to run the code again. I found that a workaround for this is to open a new console.
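For reference, a minimal sketch of that workaround (assuming stable-baselines 2.10, where ExpertDataset exposes a sequential_preprocessing flag that forwards to the DataLoader's sequential parameter):

from stable_baselines.gail import ExpertDataset

# Avoid the data-loading subprocess entirely (slower, but sidesteps the
# Windows spawn/pickling issue described above)
dataset = ExpertDataset(expert_path='expert_cartpole.npz',
                        traj_limitation=1, batch_size=128,
                        sequential_preprocessing=True)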

araffin commented 4 years ago

Did you try putting your code in an if __name__ == "__main__": section (cf. the docs: https://stable-baselines.readthedocs.io/en/master/guide/vec_envs.html)? This is required to use multiprocessing on Windows.

FabioPINO commented 4 years ago

I tried but I get the same error as in the main question.

import gym

from stable_baselines.gail import generate_expert_traj
from stable_baselines import PPO2
from stable_baselines.gail import ExpertDataset

if __name__ == "__main__":

    env = gym.make("CartPole-v1")
    # Here the expert is a random agent
    # but it can be any python function, e.g. a PID controller
    def dummy_expert(_obs):
        """
        Random agent. It samples actions randomly
        from the action space of the environment.

        :param _obs: (np.ndarray) Current observation
        :return: (np.ndarray) action taken by the expert
        """
        return env.action_space.sample()
    # Data will be saved in a numpy archive named `expert_cartpole.npz`
    # when using something different than an RL expert,
    # you must pass the environment object explicitly
    generate_expert_traj(dummy_expert, 'expert_cartpole', env, n_episodes=10)

    # Using only one expert trajectory
    # you can specify `traj_limitation=-1` for using the whole dataset
    dataset = ExpertDataset(expert_path='expert_cartpole.npz',
                            traj_limitation=1, batch_size=128)

    model = PPO2('MlpPolicy', 'CartPole-v1', verbose=1)
    # Pretrain the PPO2 model
    model.pretrain(dataset, n_epochs=1)

    # As an option, you can train the RL agent
    # model.learn(int(1e5))

    # Test the pre-trained model
    env = model.get_env()
    obs = env.reset()

    reward_sum = 0.0
    for _ in range(1000):
        action, _ = model.predict(obs)
        obs, reward, done, _ = env.step(action)
        reward_sum += reward
        env.render()
        if done:
            print(reward_sum)
            reward_sum = 0.0
            obs = env.reset()

    env.close()