Get "pybullet.error: Not connected to physics server." when running the environment in a data-parallel fashion

YY-GX commented 1 year ago

Hi all,

I'm using the garage library to run a reinforcement learning algorithm in a panda-gym environment. However, when I use garage's RaySampler, which can "sample episodes in a data-parallel fashion using a Ray cluster", I get this error: pybullet.error: Not connected to physics server. It's triggered by this line.

I guess this is a multiprocessing issue. Could you help me with this? Thank you!

Here is the more detailed bug log:

Traceback (most recent call last):
  File "/home/yygx/scripts/train_panda_airl.py", line 220, in <module>
    trainer.train(n_epochs=EPOCH_NUM, batch_size=10000)
  File "/home/yygx/src/garage/trainer.py", line 399, in train
    average_return = self._algo.train(self)
  File "/home/yygx/src/airl/irl_npo.py", line 187, in train
    trainer.step_episode = trainer.obtain_episodes(trainer.step_itr)  # yy: rollout episodes using the learned policy
  File "/home/yygx/src/garage/trainer.py", line 224, in obtain_episodes
    env_update=env_update)  # yy: generate episodes with learned policy
  File "/home/yygx/src/garage/sampler/ray_sampler.py", line 208, in obtain_samples
    ready_worker_id, episode_batch = ray.get(result)
  File "/home/yygx/anaconda3/envs/garage-ubuntu/lib/python3.6/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File "/home/yygx/anaconda3/envs/garage-ubuntu/lib/python3.6/site-packages/ray/worker.py", line 1831, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(error): ray::SamplerWorker.rollout() (pid=241822, ip=192.168.86.22, repr=<garage.sampler.ray_sampler.SamplerWorker object at 0x7f3b17457e80>)
  File "/home/yygx/src/garage/sampler/ray_sampler.py", line 432, in rollout
    return (self.worker_id, self.inner_worker.rollout())
  File "/home/yygx/src/garage/tf/samplers/worker.py", line 115, in rollout
    return self._inner_worker.rollout()
  File "/home/yygx/src/garage/sampler/default_worker.py", line 186, in rollout
    self.start_episode()
  File "/home/yygx/src/garage/sampler/default_worker.py", line 97, in start_episode
    self._prev_obs, episode_info = self.env.reset()
  File "/home/yygx/src/garage/envs/gym_env.py", line 210, in reset
    first_obs = self._env.reset()
  File "/home/yygx/anaconda3/envs/garage-ubuntu/lib/python3.6/site-packages/gym/wrappers/time_limit.py", line 25, in reset
    return self.env.reset(**kwargs)
  File "/home/yygx/panda-gym/panda_gym/envs/core.py", line 250, in reset
    with self.sim.no_rendering():
  File "/home/yygx/anaconda3/envs/garage-ubuntu/lib/python3.6/contextlib.py", line 81, in __enter__
    return next(self.gen)
  File "/home/yygx/panda-gym/panda_gym/pybullet.py", line 384, in no_rendering
    self.physics_client.configureDebugVisualizer(self.physics_client.COV_ENABLE_RENDERING, 0)
pybullet.error: Not connected to physics server.

YY-GX commented 1 year ago

Could anybody share any hints about what might cause this "pybullet.error: Not connected to physics server." issue? Thank you so much!

qgallouedec commented 1 year ago

Hi, I've tried to investigate this. It does seem to be an issue related to pybullet and multiprocessing. Which versions of panda-gym and garage are you using? Have you disabled rendering during training? Could you provide a minimal code example that reproduces the bug?
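
For anyone hitting the same error, one possible explanation (not confirmed in this thread) is that a PyBullet client connection belongs to the process that opened it, so an environment built in the driver and then serialized to a Ray worker may arrive without a live connection. The sketch below only illustrates that failure pattern and a lazy-connection workaround using the standard library. FakeSim, EagerEnv, LazyEnv, ship_to_worker and try_reset are hypothetical names, not panda-gym, garage or Ray APIs, and a pickle round trip stands in for shipping the environment to a worker.

```python
import pickle


class FakeSim:
    """Hypothetical simulator whose connection does not survive serialization."""

    def __init__(self):
        self.connected = True  # "connection" opened in the current process

    def __getstate__(self):
        # Assumption: model a handle that is meaningless once copied elsewhere.
        return {"connected": False}

    def reset(self):
        if not self.connected:
            raise RuntimeError("Not connected to physics server.")
        return "first_obs"


class EagerEnv:
    """Connects in __init__, so the connection is serialized with the env."""

    def __init__(self):
        self.sim = FakeSim()

    def reset(self):
        return self.sim.reset()


class LazyEnv:
    """Connects on first use, so each copy opens its own connection."""

    def __init__(self):
        self.sim = None

    def __getstate__(self):
        # Never ship a live connection; the receiver reconnects on demand.
        return {"sim": None}

    def reset(self):
        if self.sim is None:
            self.sim = FakeSim()
        return self.sim.reset()


def ship_to_worker(env):
    """Stand-in for sending an env to a worker: a pickle round trip."""
    return pickle.loads(pickle.dumps(env))


def try_reset(env):
    """Return the first observation, or the error message if reset fails."""
    try:
        return env.reset()
    except RuntimeError as exc:
        return f"error: {exc}"


if __name__ == "__main__":
    print(try_reset(ship_to_worker(EagerEnv())))
    print(try_reset(ship_to_worker(LazyEnv())))
```

With a real environment, the analogous idea would be to construct the env inside each worker, for example by handing the sampler a factory rather than an already-connected instance. Whether that resolves this particular traceback is an assumption until a reproducing example is available.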

qgallouedec / panda-gym

Get "pybullet.error: Not connected to physics server." when running the environment in a data-parallel fashion #46