rail-berkeley / serl

SERL: A Software Suite for Sample-Efficient Robotic Reinforcement Learning
https://serl-robot.github.io/
MIT License

Randomly getting EGL_BAD_ACCESS #56

Open lakshitadodeja opened 5 months ago

lakshitadodeja commented 5 months ago

Hi,

I am using the Franka sim environment provided by the repo, and it randomly throws an EGL_BAD_ACCESS error (sometimes it works, sometimes it fails with this error):

OpenGL.raw.EGL._errors.EGLError: EGLError(
        err = EGL_BAD_ACCESS,
        baseOperation = eglMakeCurrent,
        cArguments = (
                <OpenGL._opaque.EGLDisplay_pointer object at 0x74b97d1d51c0>,
                <OpenGL._opaque.EGLSurface_pointer object at 0x74b97d14f140>,
                <OpenGL._opaque.EGLSurface_pointer object at 0x74b97d14f140>,
                <OpenGL._opaque.EGLContext_pointer object at 0x74b94429e9c0>,
        ),
        result = 0
)

4ku commented 5 months ago

I have a similar problem:

bash run_actor.sh 
WARNING:absl:Type handler registry overriding type "<class 'float'>" collision on scalar
WARNING:absl:Type handler registry overriding type "<class 'bytes'>" collision on scalar
WARNING:absl:Type handler registry overriding type "<class 'numpy.number'>" collision on scalar
2024-06-10 14:31:52.415599: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
rlds logger is not installed, install it if required: https://github.com/rail-berkeley/oxe_envlogger 
/home/ivan/.local/lib/python3.10/site-packages/gym/spaces/box.py:127: UserWarning: WARN: Box bound precision lowered by casting to float32
  logger.warn(f"Box bound precision lowered by casting to {self.dtype}")
libGL error: MESA-LOADER: failed to open swrast: /usr/lib/dri/swrast_dri.so: cannot open shared object file: No such file or directory (search paths /usr/lib/x86_64-linux-gnu/dri:\$${ORIGIN}/dri:/usr/lib/dri, suffix _dri)
libGL error: failed to load driver: swrast
/home/ivan/.local/lib/python3.10/site-packages/glfw/__init__.py:914: GLFWError: (65543) b'GLX: Failed to create context: BadValue (integer parameter out of range for operation)'
  warnings.warn(message, GLFWError)
python: /builds/florianrhiem/pyGLFW/glfw-3.3.9/src/window.c:646: glfwGetFramebufferSize: Assertion `window != ((void *)0)' failed.
Fatal Python error: Aborted
4ku commented 5 months ago

Fixed it with:

sudo apt-get update
sudo apt-get install libgl1-mesa-glx libgl1-mesa-dri
sudo apt-get install libglfw3 libglfw3-dev
conda update libstdcxx-ng
conda install -c conda-forge gcc=11
pip install pytz

Now I get 3 FPS on my RTX 3070 Ti Laptop GPU.

But after the replay buffer fills up, I get an out-of-memory error:

...
jaxlib.xla_extension.XlaRuntimeError: RESOURCE_EXHAUSTED: Out of memory while trying to allocate 1342219464 bytes.
...

UPD: I reduced the batch size to 64 and now it works.
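
For scale, the failed allocation above is about 1.25 GiB. Besides the batch size, JAX's GPU preallocation can be adjusted through environment variables. The following is a minimal sketch, not something tested in this thread; it assumes the variable is set before `jax` is first imported, and your launch script may already set it, so check there first.

```python
import os

# Must run before the first `import jax`; later changes are ignored.
# Disable up-front preallocation so JAX allocates GPU memory on demand.
os.environ["XLA_PYTHON_CLIENT_PREALLOCATE"] = "false"

# Alternative (applies only while preallocation is enabled): cap the
# preallocated share of GPU memory, e.g. XLA_PYTHON_CLIENT_MEM_FRACTION=0.5.
# The 0.5 is an assumed example value, not a recommendation.

# Size of the failed allocation from the error message, in GiB.
failed_bytes = 1342219464
failed_gib = failed_bytes / 2**30
print(f"{failed_gib:.2f} GiB")
```

On-demand allocation can fragment memory more than preallocation does, so this is something to try alongside a smaller batch size rather than instead of it.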

zichunxx commented 2 months ago

Hi! @4ku I still get this error after following your solution. Any other suggestions? Thanks!

4ku commented 2 months ago

> Hi! @4ku I still get this error after following your solution. Any other suggestions? Thanks!

Try reducing the batch size even more, for example to 32 or 16.

zichunxx commented 2 months ago

Thanks for your kind reply! @4ku

For me, this error is triggered when creating the environment. Here is the error report:

  File "/home/xzc/Documents/serl/examples/async_drq_sim/async_drq_sim.py", line 412, in <module>
    app.run(main)
  File "/home/xzc/miniforge3/envs/serl/lib/python3.10/site-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/home/xzc/miniforge3/envs/serl/lib/python3.10/site-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "/home/xzc/Documents/serl/examples/async_drq_sim/async_drq_sim.py", line 405, in main
    actor(agent, data_store, env, sampling_rng)
  File "/home/xzc/Documents/serl/examples/async_drq_sim/async_drq_sim.py", line 106, in actor
    eval_env = gym.make(FLAGS.env)
  File "/home/xzc/miniforge3/envs/serl/lib/python3.10/site-packages/gym/envs/registration.py", line 640, in make
    env = env_creator(**_kwargs)
  File "/home/xzc/Documents/serl/franka_sim/franka_sim/envs/panda_pick_gym_env.py", line 130, in __init__
    self._viewer.render(self.render_mode)
  File "/home/xzc/miniforge3/envs/serl/lib/python3.10/site-packages/gymnasium/envs/mujoco/mujoco_rendering.py", line 646, in render
    viewer = self._get_viewer(render_mode=render_mode)
  File "/home/xzc/miniforge3/envs/serl/lib/python3.10/site-packages/gymnasium/envs/mujoco/mujoco_rendering.py", line 686, in _get_viewer
    self.viewer = OffScreenViewer(self.model, self.data)
  File "/home/xzc/miniforge3/envs/serl/lib/python3.10/site-packages/gymnasium/envs/mujoco/mujoco_rendering.py", line 144, in __init__
    super().__init__(model, data, width, height)
  File "/home/xzc/miniforge3/envs/serl/lib/python3.10/site-packages/gymnasium/envs/mujoco/mujoco_rendering.py", line 58, in __init__
    self.make_context_current()
  File "/home/xzc/miniforge3/envs/serl/lib/python3.10/site-packages/gymnasium/envs/mujoco/mujoco_rendering.py", line 185, in make_context_current
    self.opengl_context.make_current()
  File "/home/xzc/miniforge3/envs/serl/lib/python3.10/site-packages/mujoco/egl/__init__.py", line 114, in make_current
    if not EGL.eglMakeCurrent(
  File "/home/xzc/miniforge3/envs/serl/lib/python3.10/site-packages/OpenGL/platform/baseplatform.py", line 415, in __call__
    return self( *args, **named )
  File "/home/xzc/miniforge3/envs/serl/lib/python3.10/site-packages/OpenGL/error.py", line 230, in glCheckError
    raise self._errorClass(
OpenGL.raw.EGL._errors.EGLError: EGLError(
        err = EGL_BAD_ACCESS,
        baseOperation = eglMakeCurrent,
        cArguments = (
                <OpenGL._opaque.EGLDisplay_pointer object at 0x7f4abff59fc0>,
                <OpenGL._opaque.EGLSurface_pointer object at 0x7f4abff58440>,
                <OpenGL._opaque.EGLSurface_pointer object at 0x7f4abff58440>,
                <OpenGL._opaque.EGLContext_pointer object at 0x7f4a294f2040>,
        ),
        result = 0
)

I have tried different render modes in eval_env = gym.make(FLAGS.env) and exporting MUJOCO_GL=egl, but neither helped.
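
Per the EGL specification, `eglMakeCurrent` reports `EGL_BAD_ACCESS` when the context is already current on another thread (among other causes). Whether that is what happens here is not confirmed in this thread. If the environment is created or rendered from more than one thread, one way to test the hypothesis is to route every rendering call through a single dedicated thread. Below is a minimal sketch in plain Python; `SingleThreadRenderer` and `fake_render` are hypothetical names, and `fake_render` stands in for a real call such as `env.render()`.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SingleThreadRenderer:
    """Runs every submitted call on one dedicated worker thread."""

    def __init__(self):
        # max_workers=1 guarantees that all calls share the same thread,
        # so a GL context made current there is never touched elsewhere.
        self._pool = ThreadPoolExecutor(max_workers=1)

    def call(self, fn, *args, **kwargs):
        # Block until the worker thread has finished running fn.
        return self._pool.submit(fn, *args, **kwargs).result()

    def close(self):
        self._pool.shutdown(wait=True)


def fake_render():
    # Hypothetical stand-in for env.render(); returns the executing thread id.
    return threading.get_ident()


renderer = SingleThreadRenderer()
thread_ids = {renderer.call(fake_render) for _ in range(5)}
renderer.close()
print(len(thread_ids))
```

In a real setup the environment would also have to be constructed through `renderer.call(...)`, since the traceback above shows the context being made current inside `__init__`.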

parzivar commented 1 week ago

I have run into the same problem too:

(serl) robo@robot:~/work/serl$ python franka_sim/franka_sim/test/test_gym_env_human.py
/home/robo/miniconda3/envs/serl/lib/python3.10/site-packages/gym/spaces/box.py:127: UserWarning: WARN: Box bound precision lowered by casting to float32
  logger.warn(f"Box bound precision lowered by casting to {self.dtype}")
Traceback (most recent call last):
  File "/home/robo/work/serl/franka_sim/franka_sim/test/test_gym_env_human.py", line 9, in <module>
    env = envs.PandaPickCubeGymEnv(action_scale=(0.1, 1))
  File "/home/robo/work/serl/franka_sim/franka_sim/envs/panda_pick_gym_env.py", line 148, in __init__
    self._viewer.render(self.render_mode)
  File "/home/robo/miniconda3/envs/serl/lib/python3.10/site-packages/gymnasium/envs/mujoco/mujoco_rendering.py", line 706, in render
    viewer = self._get_viewer(render_mode=render_mode)
  File "/home/robo/miniconda3/envs/serl/lib/python3.10/site-packages/gymnasium/envs/mujoco/mujoco_rendering.py", line 730, in _get_viewer
    self.viewer = OffScreenViewer(
  File "/home/robo/miniconda3/envs/serl/lib/python3.10/site-packages/gymnasium/envs/mujoco/mujoco_rendering.py", line 155, in __init__
    super().__init__(model, data, width, height, max_geom, visual_options)
  File "/home/robo/miniconda3/envs/serl/lib/python3.10/site-packages/gymnasium/envs/mujoco/mujoco_rendering.py", line 53, in __init__
    self.viewport = mujoco.MjrRect(0, 0, width, height)
TypeError: __init__(): incompatible constructor arguments. The following argument types are supported:
    1. mujoco._render.MjrRect(left: int, bottom: int, width: int, height: int)

Invoked with: 0, 0, None, None
Exception ignored in: <function OffScreenViewer.__del__ at 0x7f4e525a9ea0>
Traceback (most recent call last):
  File "/home/robo/miniconda3/envs/serl/lib/python3.10/site-packages/gymnasium/envs/mujoco/mujoco_rendering.py", line 202, in __del__
    self.free()
  File "/home/robo/miniconda3/envs/serl/lib/python3.10/site-packages/gymnasium/envs/mujoco/mujoco_rendering.py", line 199, in free
    self.opengl_context.free()
  File "/home/robo/miniconda3/envs/serl/lib/python3.10/site-packages/mujoco/egl/__init__.py", line 125, in free
    EGL.eglDestroyContext(EGL_DISPLAY, self._context)
  File "/home/robo/miniconda3/envs/serl/lib/python3.10/site-packages/OpenGL/platform/baseplatform.py", line 415, in __call__
    return self( *args, **named )
  File "src/errorchecker.pyx", line 58, in OpenGL_accelerate.errorchecker._ErrorChecker.glCheckError
OpenGL.raw.EGL._errors.EGLError: EGLError(
        err = EGL_NOT_INITIALIZED,
        baseOperation = eglDestroyContext,
        cArguments = (
                <OpenGL._opaque.EGLDisplay_pointer object at 0x7f4e8def24c0>,
                <OpenGL._opaque.EGLContext_pointer object at 0x7f4e526f6a40>,
        ),
        result = 0
)
Exception ignored in: <function GLContext.__del__ at 0x7f4e8dba9360>
Traceback (most recent call last):
  File "/home/robo/miniconda3/envs/serl/lib/python3.10/site-packages/mujoco/egl/__init__.py", line 130, in __del__
    self.free()
  File "/home/robo/miniconda3/envs/serl/lib/python3.10/site-packages/mujoco/egl/__init__.py", line 125, in free
    EGL.eglDestroyContext(EGL_DISPLAY, self._context)
  File "src/errorchecker.pyx", line 58, in OpenGL_accelerate.errorchecker._ErrorChecker.glCheckError
OpenGL.raw.EGL._errors.EGLError: EGLError(
        err = EGL_NOT_INITIALIZED,
        baseOperation = eglDestroyContext,
        cArguments = (
                <OpenGL._opaque.EGLDisplay_pointer object at 0x7f4e8def24c0>,
                <OpenGL._opaque.EGLContext_pointer object at 0x7f4e526f6a40>,
        ),
        result = 0
)