openai / mujoco-py

MuJoCo is a physics engine for detailed, efficient rigid body simulations with contacts. mujoco-py allows using MuJoCo from Python 3.

I got a regular black pixel from sim.render with offscreen #450

Open kaixindelele opened 5 years ago

kaixindelele commented 5 years ago

Describe the bug: No errors are raised. With my basic setup, offscreen rendering returns normal pixels unless I control the environment with a DDPG algorithm. Oddly, roughly the first 200 episodes are normal; then, from the n-th step of some episode (call it the M-th), the render returns black pixels (an all-zero NumPy array) until the end of that episode. A rough description:

[self.reset()](https://github.com/StanfordVL/robosuite/blob/7b727b075607466662ba8fac8bf8634936bdc58d/robosuite/environments/base.py#L142)
0 episode 
   0  step  ------------normal
  ......  step------------normal
   200 step------------normal
self.reset()
..........
self.reset()
M episode 
    0  step  ------------normal
  ......  step------------normal
   n     step------------black
  ......  step------------black
   200 step------------black
M+1 episode
    0  step  ------------normal
   ......  step------------normal
   n     step------------black
   ......  step------------black
   200 step------------black

To Reproduce

Desktop (please complete the following information):

hardware: graphics card is NVIDIA 1080
OS: Linux version 4.15.0-58-generic (buildd@lgw01-amd64-037) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.10)) #64~16.04.1-Ubuntu SMP Wed Aug 7 14:10:35 UTC 2019
Python version: 3.5.2
tensorflow 1.10.0
tensorflow-base 1.10.0
tensorflow-gpu 1.10.0
mujoco-py 1.50.1.68
cudatoolkit 9.2
cudnn 7.6.0

Environment

    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/lyl/.mujoco/mjpro150/bin
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/nvidia-415
    export PATH=/usr/local/cuda-9.0/bin${PATH:+:${PATH}}
    export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
    export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libGLEW.so:/usr/lib/nvidia-415/libGL.so

Expected behavior: I want to avoid the black pixels, or at least, once they appear, find a way to reset something so that rendering returns to normal.

Interestingly, when I enter debug mode after a black image occurs, something unknown is triggered and the next step is no longer black.

While reading the source code of MjSim.render(), I found a thread lock:

    if mode == 'offscreen':
        with _MjSim_render_lock:
            if self._render_context_offscreen is None:
                render_context = MjRenderContextOffscreen(
                    self, device_id=device_id)
            else:
                render_context = self._render_context_offscreen
            render_context.render(
                width=width, height=height, camera_id=camera_id)
            return render_context.read_pixels(
                width, height, depth=depth)

Following the call further, I found a call to an external C function (mjr_readPixels):

    def read_pixels(self, width, height, depth=True):
        cdef mjrRect rect
        rect.left = 0
        rect.bottom = 0
        rect.width = width
        rect.height = height
        rgb_arr = np.zeros(3 * rect.width * rect.height, dtype=np.uint8)
        depth_arr = np.zeros(rect.width * rect.height, dtype=np.float32)
        cdef unsigned char[::view.contiguous] rgb_view = rgb_arr
        cdef float[::view.contiguous] depth_view = depth_arr
        mjr_readPixels(&rgb_view[0], &depth_view[0], rect, &self._con)
        rgb_img = rgb_arr.reshape(rect.height, rect.width, 3)
        if depth:
            depth_img = depth_arr.reshape(rect.height, rect.width)
            return (rgb_img, depth_img)
        else:
            return rgb_img

As the output shows, the zero initialization of rgb_arr is executed, but the rendering in the external C code apparently is not performed: the array stays all zeros.
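To make that reading concrete, here is a minimal sketch that mimics only the buffer handling of read_pixels in plain NumPy, without calling MuJoCo. The names read_pixels_mock, fill, and paint_grey are hypothetical stand-ins (fill plays the role of mjr_readPixels); the sketch only shows that a zero-initialized RGB buffer that is never written to comes back as an all-black image.

```python
import numpy as np


def read_pixels_mock(width, height, fill=None):
    """Mimic the buffer handling of read_pixels without MuJoCo.

    `fill` is a hypothetical stand-in for mjr_readPixels: a callable
    that writes into the flat RGB buffer. Passing None models a
    readback that writes nothing.
    """
    # Same allocation and reshape as in the quoted read_pixels code.
    rgb_arr = np.zeros(3 * width * height, dtype=np.uint8)
    if fill is not None:
        fill(rgb_arr)
    return rgb_arr.reshape(height, width, 3)


def paint_grey(buf):
    """Hypothetical 'successful readback': write a mid-grey value."""
    buf[:] = 128


# No write happens: the image is the untouched zero buffer (all black).
black = read_pixels_mock(4, 2)

# A write happens: the image carries the written values.
grey = read_pixels_mock(4, 2, fill=paint_grey)
```

If this is what happens in my case, the black frame is not an error value returned by the renderer but simply the initial contents of rgb_arr.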

My questions

  1. What could be causing the problem?
  2. Why does debug mode alleviate this problem?
  3. How can I handle this problem automatically in Python code?
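For the third question, the only workaround I can think of is to detect an all-zero frame and render again. The following is a minimal sketch under that assumption, not a fix for the root cause. The helpers is_black_frame and render_with_retry are hypothetical names, and render_fn is any zero-argument callable that returns an image array, for example a lambda wrapping sim.render(width, height, mode='offscreen').

```python
import numpy as np


def is_black_frame(frame):
    """Return True when every value in the frame is zero."""
    return not np.any(frame)


def render_with_retry(render_fn, max_retries=3):
    """Call render_fn again while it returns an all-black frame.

    render_fn is a hypothetical zero-argument callable that returns
    an image as a NumPy array. Returns the last frame and the number
    of retries that were needed.
    """
    frame = render_fn()
    attempts = 0
    while is_black_frame(frame) and attempts < max_retries:
        frame = render_fn()
        attempts += 1
    return frame, attempts
```

A legitimately dark scene could also render as all zeros, so the check is only meaningful for cameras that always see some lit geometry. If the frame is still black after max_retries, the caller has to decide whether to skip the step or reset the environment.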

Additional context

  1. I used the robosuite library, which is based on mujoco-py.
  2. The DDPG algorithm is based on TensorFlow.

I look forward to your reply, thanks!

kaixindelele commented 5 years ago

I cannot upload .py files, so I uploaded them as .txt files instead. Add these files to robosuite/robosuite/ and then run run_robosuite_with_ddpg.py.

ddpg.txt run_robosuite_with_ddpg.txt

kaixindelele commented 5 years ago

A more specific observation: if I call ipdb.set_trace() after black images occur (detected by a simple check), I continue to get black pixels until the next episode begins and env.reset() is executed. The next episode is then normal, just as when the main program first started.

self.reset()
0 episode
   0  step  ------------normal
  ......  step------------normal
   200 step------------normal
self.reset()
..........
self.reset()
M episode
    0  step  ------------normal
  ......  step------------normal
   n     step------------black

if black_flag:
  ipdb.set_trace()
  continue

self.reset()
  ......  step------------normal
   200 step------------normal

M+1 episode
    0  step  ------------normal
   ......  step------------normal
   n     step------------normal
   ......  step------------normal
   200 step------------normal
...... next one conflict.

So maybe the problem is caused by a multi-thread conflict?

kaixindelele commented 5 years ago

After further investigation, I found that my DDPG update normally takes about 5 ms, but on the step just before I get a black image the update takes more than 150 ms. I don't know whether this delay causes the black images, or whether some other factor causes both the black images and the delay. I would really like to know which resources or threads the rendering in mujoco_py uses that may conflict with TensorFlow. Looking forward to your reply!
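For reference, this is roughly how such timings can be collected so that slow updates can be lined up against the steps where black frames appear. It is a minimal sketch: StepTimer, its methods, and the 50 ms threshold are hypothetical choices for illustration, and fn stands for whatever update or render call is being measured.

```python
import time


class StepTimer:
    """Record how long each call takes and list unusually slow steps."""

    def __init__(self, slow_threshold_s=0.05):
        # Assumed threshold: well above the usual 5 ms update time.
        self.slow_threshold_s = slow_threshold_s
        self.records = []  # list of (label, step, seconds)

    def timed(self, label, step, fn, *args, **kwargs):
        """Run fn(*args, **kwargs), store its duration, return its result."""
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        elapsed = time.perf_counter() - start
        self.records.append((label, step, elapsed))
        return result

    def slow_steps(self, label):
        """Return the steps whose duration for `label` exceeded the threshold."""
        return [step for (lbl, step, sec) in self.records
                if lbl == label and sec > self.slow_threshold_s]
```

Wrapping both the DDPG update and the render call with timed(), under different labels, gives two lists of step indices that can be compared with the steps flagged as black.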