openai / mujoco-py

MuJoCo is a physics engine for detailed, efficient rigid body simulations with contacts. mujoco-py allows using MuJoCo from Python 3.

Mujoco_py 2.1, always rebuilding in cluster #763

Closed im-Kitsch closed 8 months ago

im-Kitsch commented 1 year ago

Hi,

I am trying to import mujoco-py, but it always rebuilds when I submit a job via Slurm. It always shows "mujoco_py/cymj.pyx because it changed" and then fails with the error "cannot find -lGL: No such file or directory".

I installed mujoco_py using the method mentioned in Issue 627, https://github.com/openai/mujoco-py/issues/627. I can run it on the login node without any problem, but it no longer works when I submit a job via Slurm.

I installed the dependencies with:

conda install -c conda-forge glew
conda install -c conda-forge mesalib
conda install -c menpo glfw3

I can run mujoco-py 2.0 on the cluster without any problem, but mujoco_py 2.1 always rebuilds "cymj.pyx" when I import mujoco_py. I think mujoco-py checks the system environment incorrectly. It may be related to conda or to library linking. In any case, the rebuild should be avoidable. Is there a way to prevent this behavior?
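
One way to investigate the rebuild is to look at which compiled extensions already exist before importing the package. The sketch below assumes that mujoco_py caches its compiled extension under a `generated` directory inside the installed package, with file names starting with `cymj_`; that layout is an assumption about the installed version, not something confirmed in this thread. The helper names are hypothetical. If the login node and the compute node show different files, or the compute node shows none, the two nodes are probably not sharing the same build.

```python
import importlib.util
from pathlib import Path


def find_built_extensions(package_dir):
    """Return the names of compiled cymj extensions under package_dir/generated.

    Assumption: the build cache lives in a "generated" subdirectory and the
    shared objects are named cymj_*.so.
    """
    generated = Path(package_dir) / "generated"
    if not generated.is_dir():
        return []
    return sorted(p.name for p in generated.glob("cymj_*.so"))


def mujoco_py_package_dir():
    """Locate the installed mujoco_py package without importing it.

    Importing mujoco_py is what triggers the build, so only the module
    spec is looked up here.
    """
    spec = importlib.util.find_spec("mujoco_py")
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin).parent


if __name__ == "__main__":
    pkg = mujoco_py_package_dir()
    if pkg is None:
        print("mujoco_py is not installed in this environment")
    else:
        print("package dir:", pkg)
        for name in find_built_extensions(pkg):
            print("built extension:", name)
```

Running this once on the login node and once inside a Slurm job gives two listings that can be compared directly.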

Thanks a lot in advance

To Reproduce

#!/bin/bash

#SBATCH -J test_job_16

#SBATCH -e /tmp/temp_test_mujoco/test_%x.%j.err
#SBATCH -o /tmp/temp_test_mujoco/test_%x.%j.out

conda activate tri7
python -c "import mujoco_py; print(mujoco_py.__version__)"

Then

sbatch job.sh

Error Messages

/home/user_id/miniconda3/envs/tri7/compiler_compat/ld: cannot find -lGL: No such file or directory
collect2: error: ld returned 1 exit status
Traceback (most recent call last):
  File "/home/user_id/miniconda3/envs/tri7/lib/python3.9/site-packages/setuptools/_distutils/unixccompiler.py", line 267, in link
    self.spawn(linker + ld_args)
  File "/home/user_id/miniconda3/envs/tri7/lib/python3.9/site-packages/setuptools/_distutils/ccompiler.py", line 1007, in spawn
    spawn(cmd, dry_run=self.dry_run, **kwargs)
  File "/home/user_id/miniconda3/envs/tri7/lib/python3.9/site-packages/setuptools/_distutils/spawn.py", line 70, in spawn
    raise DistutilsExecError(
distutils.errors.DistutilsExecError: command '/usr/bin/gcc' failed with exit code 1

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/user_id/miniconda3/envs/tri7/lib/python3.9/site-packages/mujoco_py-2.1.2.14-py3.9.egg/mujoco_py/__init__.py", line 2, in <module>
    from mujoco_py.builder import cymj, ignore_mujoco_warnings, functions, MujocoException
  File "/home/user_id/miniconda3/envs/tri7/lib/python3.9/site-packages/mujoco_py-2.1.2.14-py3.9.egg/mujoco_py/builder.py", line 504, in <module>
    cymj = load_cython_ext(mujoco_path)
  File "/home/user_id/miniconda3/envs/tri7/lib/python3.9/site-packages/mujoco_py-2.1.2.14-py3.9.egg/mujoco_py/builder.py", line 110, in load_cython_ext
    cext_so_path = builder.build()
  File "/home/user_id/miniconda3/envs/tri7/lib/python3.9/site-packages/mujoco_py-2.1.2.14-py3.9.egg/mujoco_py/builder.py", line 226, in build
    built_so_file_path = self._build_impl()
  File "/home/user_id/miniconda3/envs/tri7/lib/python3.9/site-packages/mujoco_py-2.1.2.14-py3.9.egg/mujoco_py/builder.py", line 278, in _build_impl
    so_file_path = super()._build_impl()
  File "/home/user_id/miniconda3/envs/tri7/lib/python3.9/site-packages/mujoco_py-2.1.2.14-py3.9.egg/mujoco_py/builder.py", line 249, in _build_impl
    dist.run_commands()
  File "/home/user_id/miniconda3/envs/tri7/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
    self.run_command(cmd)
  File "/home/user_id/miniconda3/envs/tri7/lib/python3.9/site-packages/setuptools/dist.py", line 1208, in run_command
    super().run_command(command)
  File "/home/user_id/miniconda3/envs/tri7/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
    cmd_obj.run()
  File "/home/user_id/miniconda3/envs/tri7/lib/python3.9/site-packages/Cython/Distutils/old_build_ext.py", line 186, in run
    _build_ext.build_ext.run(self)
  File "/home/user_id/miniconda3/envs/tri7/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 346, in run
    self.build_extensions()
  File "/home/user_id/miniconda3/envs/tri7/lib/python3.9/site-packages/mujoco_py-2.1.2.14-py3.9.egg/mujoco_py/builder.py", line 149, in build_extensions
    build_ext.build_extensions(self)
  File "/home/user_id/miniconda3/envs/tri7/lib/python3.9/site-packages/Cython/Distutils/old_build_ext.py", line 195, in build_extensions
    _build_ext.build_ext.build_extensions(self)
  File "/home/user_id/miniconda3/envs/tri7/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 468, in build_extensions
    self._build_extensions_serial()
  File "/home/user_id/miniconda3/envs/tri7/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 494, in _build_extensions_serial
    self.build_extension(ext)
  File "/home/user_id/miniconda3/envs/tri7/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 573, in build_extension
    self.compiler.link_shared_object(
  File "/home/user_id/miniconda3/envs/tri7/lib/python3.9/site-packages/setuptools/_distutils/ccompiler.py", line 751, in link_shared_object
    self.link(
  File "/home/user_id/miniconda3/envs/tri7/lib/python3.9/site-packages/setuptools/_distutils/unixccompiler.py", line 269, in link
    raise LinkError(msg)
distutils.errors.LinkError: command '/usr/bin/gcc' failed with exit code 1
srun: error: mpsc0176: task 0: Exited with exit code 1


interestingzhuo commented 1 year ago

+1

gabrieletiboni commented 1 year ago

+1

gabrieletiboni commented 1 year ago

@interestingzhuo @im-Kitsch have you guys figured something out? I'm stuck with the same problem

im-Kitsch commented 1 year ago

> @interestingzhuo @im-Kitsch have you guys figured something out? I'm stuck with the same problem

No, unfortunately. I think this issue is beyond my ability.

saran-t commented 1 year ago

Do you have any specific need for the old MuJoCo version?

im-Kitsch commented 1 year ago

> Do you have any specific need for the old MuJoCo version?

Yes, many classic implementations are still based on the old MuJoCo and gym versions, for example stable-baselines3.

gabrieletiboni commented 1 year ago

@interestingzhuo @im-Kitsch I got it to work!

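The mujoco_py README has a couple of lines regarding this cannot find -lGL error [screenshot of the README section].

As I couldn't symlink without sudo rights, I:

  • located the libGL.so.1 file in my /usr lib dir (which on my cluster was /usr/lib64)
  • copied it into my conda lib dir ($CONDA_PREFIX/lib in my case)
  • created the symlink there: ln -s $CONDA_PREFIX/lib/libGL.so.1 $CONDA_PREFIX/lib/libGL.so

I was then able to build mujoco_py on the cluster nodes.

PS: I'm also stuck with this MuJoCo version as I'm using stable-baselines3.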

CREDITS: Originally posted by @luckeciano in https://github.com/openai/mujoco-py/issues/627#issuecomment-1094396677
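
The workaround described in this thread (locate libGL.so.1, copy it into the conda lib dir, then create a libGL.so symlink next to it) can be sketched in Python as follows. This is a minimal sketch, not part of mujoco_py: the function name `install_libgl_link` is hypothetical, and it creates a relative symlink rather than the absolute one used in the `ln -s` command from the thread.

```python
import shutil
from pathlib import Path


def install_libgl_link(search_dirs, target_lib_dir):
    """Copy libGL.so.1 into target_lib_dir and create a libGL.so symlink to it.

    Returns the path of the symlink, or None if libGL.so.1 was not found
    in any of search_dirs.
    """
    target = Path(target_lib_dir)
    for directory in search_dirs:
        source = Path(directory) / "libGL.so.1"
        if source.exists():
            break
    else:
        return None  # no candidate directory contained libGL.so.1

    copied = target / "libGL.so.1"
    if not copied.exists():
        # shutil.copy follows symlinks, so the real library file is copied.
        shutil.copy(source, copied)

    link = target / "libGL.so"
    if not link.exists() and not link.is_symlink():
        # Relative link: libGL.so -> libGL.so.1 in the same directory.
        link.symlink_to(copied.name)
    return link
```

A call might look like `install_libgl_link(["/usr/lib64"], Path(os.environ["CONDA_PREFIX"]) / "lib")`, where `/usr/lib64` is the location reported in this thread; other clusters may keep the library elsewhere.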

im-Kitsch commented 1 year ago

> @interestingzhuo @im-Kitsch I got it to work!
>
> The mujoco_py README has a couple of lines regarding this cannot find -lGL error [screenshot of the README section].
>
> As I couldn't symlink without sudo rights, I:
>
>   • located the libGL.so.1 file in my /usr lib dir (which on my cluster was /usr/lib64)
>   • copied it into my conda lib dir ($CONDA_PREFIX/lib in my case)
>   • created the symlink there: ln -s $CONDA_PREFIX/lib/libGL.so.1 $CONDA_PREFIX/lib/libGL.so
>
> I was then able to build mujoco_py on the cluster nodes.
>
> PS: I'm also stuck with this MuJoCo version as I'm using stable-baselines3.
>
> CREDITS: Originally posted by @luckeciano in #627 (comment)

Hi @gabrieletiboni, thanks for the hint. Unfortunately, I tried copying libGL.so to $CONDA_PREFIX/lib, but it still doesn't work. I can build on the cluster login node, but submitting a job doesn't work.

Congratulations on getting it to work on your nodes, though.

My output is as follows, in case anyone has the same issue:

gcc: fatal error: Killed signal terminated program cc1
compilation terminated.
Traceback (most recent call last):
  File "/home/zh50syxa/miniconda3/envs/test38/lib/python3.8/site-packages/setuptools/_distutils/unixccompiler.py", line 186, in _compile
    self.spawn(compiler_so + cc_args + [src, '-o', obj] + extra_postargs)
  File "/home/zh50syxa/miniconda3/envs/test38/lib/python3.8/site-packages/setuptools/_distutils/ccompiler.py", line 1007, in spawn
    spawn(cmd, dry_run=self.dry_run, **kwargs)
  File "/home/zh50syxa/miniconda3/envs/test38/lib/python3.8/site-packages/setuptools/_distutils/spawn.py", line 70, in spawn
    raise DistutilsExecError(
distutils.errors.DistutilsExecError: command '/usr/bin/gcc' failed with exit code 1

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/zh50syxa/temp_test_mujoco/test_mujoco.py", line 1, in <module>
    import mujoco_py
  File "/home/zh50syxa/miniconda3/envs/test38/lib/python3.8/site-packages/mujoco_py/__init__.py", line 2, in <module>
    from mujoco_py.builder import cymj, ignore_mujoco_warnings, functions, MujocoException
  File "/home/zh50syxa/miniconda3/envs/test38/lib/python3.8/site-packages/mujoco_py/builder.py", line 504, in <module>
    cymj = load_cython_ext(mujoco_path)
  File "/home/zh50syxa/miniconda3/envs/test38/lib/python3.8/site-packages/mujoco_py/builder.py", line 110, in load_cython_ext
    cext_so_path = builder.build()
  File "/home/zh50syxa/miniconda3/envs/test38/lib/python3.8/site-packages/mujoco_py/builder.py", line 226, in build
    built_so_file_path = self._build_impl()
  File "/home/zh50syxa/miniconda3/envs/test38/lib/python3.8/site-packages/mujoco_py/builder.py", line 278, in _build_impl
    so_file_path = super()._build_impl()
  File "/home/zh50syxa/miniconda3/envs/test38/lib/python3.8/site-packages/mujoco_py/builder.py", line 249, in _build_impl
    dist.run_commands()
  File "/home/zh50syxa/miniconda3/envs/test38/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
    self.run_command(cmd)
  File "/home/zh50syxa/miniconda3/envs/test38/lib/python3.8/site-packages/setuptools/dist.py", line 1208, in run_command
    super().run_command(command)
  File "/home/zh50syxa/miniconda3/envs/test38/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
    cmd_obj.run()
  File "/home/zh50syxa/miniconda3/envs/test38/lib/python3.8/site-packages/Cython/Distutils/old_build_ext.py", line 186, in run
    _build_ext.build_ext.run(self)
  File "/home/zh50syxa/miniconda3/envs/test38/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 346, in run
    self.build_extensions()
  File "/home/zh50syxa/miniconda3/envs/test38/lib/python3.8/site-packages/mujoco_py/builder.py", line 149, in build_extensions
    build_ext.build_extensions(self)
  File "/home/zh50syxa/miniconda3/envs/test38/lib/python3.8/site-packages/Cython/Distutils/old_build_ext.py", line 195, in build_extensions
    _build_ext.build_ext.build_extensions(self)
  File "/home/zh50syxa/miniconda3/envs/test38/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 468, in build_extensions
    self._build_extensions_serial()
  File "/home/zh50syxa/miniconda3/envs/test38/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 494, in _build_extensions_serial
    self.build_extension(ext)
  File "/home/zh50syxa/miniconda3/envs/test38/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 549, in build_extension
    objects = self.compiler.compile(
  File "/home/zh50syxa/miniconda3/envs/test38/lib/python3.8/site-packages/setuptools/_distutils/ccompiler.py", line 599, in compile
    self._compile(obj, src, ext, cc_args, extra_postargs, pp_opts)
  File "/home/zh50syxa/miniconda3/envs/test38/lib/python3.8/site-packages/setuptools/_distutils/unixccompiler.py", line 188, in _compile
    raise CompileError(msg)
distutils.errors.CompileError: command '/usr/bin/gcc' failed with exit code 1
slurmstepd: error: Detected 1 oom-kill event(s) in StepId=41479462.batch. Some of your processes may have been killed by the cgroup out-of-memory handler.
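
The log above fails at a different stage than the first one in this issue: here gcc is killed during compilation and Slurm reports an out-of-memory kill, whereas the earlier log fails at link time with the missing -lGL library. The libGL symlink cannot address an out-of-memory kill; requesting more memory for the job (for example with sbatch's `--mem` option) or building once where more memory is available are possible directions, though neither is confirmed in this thread. A small sketch for telling the two cases apart from a saved log, with a hypothetical helper name and labels:

```python
def classify_build_failure(log_text):
    """Map a mujoco_py build log to a coarse failure cause.

    The matched substrings are taken from the two logs in this thread;
    the returned labels are illustrative only.
    """
    lowered = log_text.lower()
    if "oom-kill" in lowered or "killed signal terminated program" in lowered:
        return "out_of_memory"
    if "cannot find -lgl" in lowered:
        return "missing_libgl"
    return "unknown"
```

Applied to the stderr file written by the job, this returns "out_of_memory" for the log above and "missing_libgl" for the log in the original report.
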

im-Kitsch commented 8 months ago

Finally, I think I solved this by using @gabrieletiboni's method.