Open houyaokun opened 1 year ago
Hello, I encountered the same error. Could you please share how you solved this?
I just render in headless mode, then the error disappeared.
Hi, sorry for the naive question, but how do you run in headless mode? I have tried setting cfg.show_gui to false in datarenderer.py. I have also unset $DISPLAY. However, I still get this error.
Thanks!
@qureshinomaan what exactly are you running? The datarenderer in calvin_env is not used during training; we only used it to render the dataset once after recording it with teleoperation. During training, it is the rollout callbacks that use the calvin_env simulator. As a quick fix you could disable them during training (they are only used to evaluate performance while training) by adding ~callbacks/rollout and ~callbacks/rollout_lh to the command-line arguments of the training script. However, you would still need to render at some point for the full evaluation after training is done. Also, headless rendering is enabled by default for the rollouts and the evaluation. Does your computer have a graphics card? EGL renders on the GPU, so it would fail if you don't have one.
What is the output if you run this script in calvin_env?
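For reference, the quick fix described above (dropping the rollout callbacks) could look like the following sketch. It only builds and prints the command line. The script name and the datamodule arguments are copied from the training command used later in this thread, the data path is a placeholder, and the helper name build_training_command is hypothetical.

```python
import shlex


def build_training_command(root_data_dir, disable_rollouts=True):
    """Assemble the training command, optionally without rollout callbacks."""
    cmd = [
        "python",
        "training.py",
        f"datamodule.root_data_dir={root_data_dir}",
        "datamodule/datasets=vision_lang_shm",
    ]
    if disable_rollouts:
        # Overrides named in the comment above; they remove the callbacks
        # that start the calvin_env simulator during training.
        cmd += ["~callbacks/rollout", "~callbacks/rollout_lh"]
    return cmd


if __name__ == "__main__":
    # shlex.join quotes each argument so the line can be pasted into a shell.
    print(shlex.join(build_training_command("/path/to/dataset/")))
```

This only avoids rendering during training; the final evaluation still needs a working EGL setup.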
Hi @lukashermann! Thanks a lot for responding! I am using the following command with the debug dataset:
$ python training.py datamodule.root_data_dir=/path/to/dataset/ datamodule/datasets=vision_lang_shm
I am running this on a machine with a 3080Ti GPU with 16GB VRAM. In the environment, torch is properly installed (cuda.is_available() is true)
I get the following output :
| Name | Type | Params
--------------------------------------------------------------
0 | perceptual_encoder | ConcatEncoders | 174 K
1 | plan_proposal | PlanProposalNetwork | 13.9 M
2 | plan_recognition | PlanRecognitionNetwork | 36.0 M
3 | visual_goal | VisualGoalEncoder | 4.4 M
4 | language_goal | LanguageGoalEncoder | 5.1 M
5 | action_decoder | LogisticPolicyNetwork | 13.8 M
--------------------------------------------------------------
73.2 M Trainable params
0 Non-trainable params
73.2 M Total params
146.424 Total estimated model params size (MB)
Sanity Checking: 0it [00:00, ?it/s]pybullet build time: Nov 28 2023 23:51:11
[2024-01-07 14:26:30,850][calvin_agent.wrappers.calvin_env_wrapper][WARNING] - Couldn't find correct EGL device. Setting EGL_VISIBLE_DEVICE=0. When using DDP with many GPUs this can lead to OOM errors. Did you install PyBullet correctly? Please refer to calvin env README
[2024-01-07 14:26:30,851][calvin_agent.wrappers.calvin_env_wrapper][INFO] - EGL_DEVICE_ID 0 <==> CUDA_DEVICE_ID 0
argv[0]=--width=200
argv[1]=--height=200
[2024-01-07 14:26:30,958][calvin_env.envs.play_table_env][INFO] - Loading EGL plugin (may segfault on misconfigured systems)...
failed to EGL with glad.
EGL has nothing to do with torch; it is the GPU renderer of pybullet.
Could you still run the script that I linked and copy the output here?
cd calvin_env/egl_check
bash build.sh # should have been built automatically, but try running this again
python list_egl_options.py
Anyway, this is not an issue with our repository, but with pybullet. Did you try following issues like this one?
Here is the output of the commands you asked me to run:
----------Default-------------
Starting EGL query
b'EGL device choice: -1 of 0.\neglInitialize() failed with error: 3008\n'
number of EGL devices: 0
I think there is a mismatch between the EGL driver and the CUDA driver on my system. I have seen similar issues in the Habitat and ai2thor repositories as well. I set up the repository on another system, followed the same instructions, and was able to run it. The only difference was the CUDA version (it worked with CUDA 12.0 but not with 11.7).
Anyways, thanks for your help!
I don't think the CUDA driver is relevant here; the NVIDIA driver might be.
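As a side note on the "eglInitialize() failed with error: 3008" line in the output above: the sketch below maps EGL error codes to their symbolic names from the EGL specification. It assumes the code is printed in hexadecimal, which this thread does not confirm; under that assumption 3008 corresponds to EGL_BAD_DISPLAY. The helper name describe_egl_error is hypothetical.

```python
# Error codes as defined in the EGL specification (egl.h).
EGL_ERRORS = {
    0x3000: "EGL_SUCCESS",
    0x3001: "EGL_NOT_INITIALIZED",
    0x3002: "EGL_BAD_ACCESS",
    0x3003: "EGL_BAD_ALLOC",
    0x3004: "EGL_BAD_ATTRIBUTE",
    0x3005: "EGL_BAD_CONFIG",
    0x3006: "EGL_BAD_CONTEXT",
    0x3007: "EGL_BAD_CURRENT_SURFACE",
    0x3008: "EGL_BAD_DISPLAY",
    0x3009: "EGL_BAD_MATCH",
    0x300A: "EGL_BAD_NATIVE_PIXMAP",
    0x300B: "EGL_BAD_NATIVE_WINDOW",
    0x300C: "EGL_BAD_PARAMETER",
    0x300D: "EGL_BAD_SURFACE",
    0x300E: "EGL_CONTEXT_LOST",
}


def describe_egl_error(code_text):
    """Translate an error code string into its EGL name.

    Assumption: the string is hexadecimal without a 0x prefix, e.g. "3008".
    """
    code = int(code_text, 16)
    return EGL_ERRORS.get(code, f"unknown EGL error 0x{code:04X}")


if __name__ == "__main__":
    print(describe_egl_error("3008"))
```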
Hello, I encountered the same error. Could you please share how you solved this?
I just render in headless mode, then the error disappeared.
Sorry but can you teach me how to render in headless mode?
It renders in headless mode by default. Which error do you get?
Hello,
I get the same error, failed to EGL with glad, even though show_gui=False.
The output of the commands
cd calvin_env/egl_check
bash build.sh # should have been built automatically, but try running this again
python list_egl_options.py
is:
----------Default-------------
Starting EGL query
Loaded EGL 1.5 after reload.
GL_VENDOR=Mesa/X.org
GL_RENDERER=llvmpipe (LLVM 12.0.0, 256 bits)
GL_VERSION=4.5 (Core Profile) Mesa 21.2.6
GL_SHADING_LANGUAGE_VERSION=4.50
Completeing EGL query
b'EGL device choice: -1 of 1.\n'
number of EGL devices: 1
----------Option #1 (id=0)-------------
Starting EGL query
EGL device choice: 0 of 1 (from EGL_VISIBLE_DEVICE)
Loaded EGL 1.5 after reload.
GL_VENDOR=Mesa/X.org
GL_RENDERER=llvmpipe (LLVM 12.0.0, 256 bits)
GL_VERSION=4.5 (Core Profile) Mesa 21.2.6
GL_SHADING_LANGUAGE_VERSION=4.50
Completeing EGL query
The output of the commands
ldconfig -p | grep libEGL
is
libEGL_mesa.so.0 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libEGL_mesa.so.0
libEGL.so.1 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libEGL.so.1
libEGL.so (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libEGL.so
nvidia driver:
NVRM version: NVIDIA UNIX x86_64 Kernel Module 525.105.17 Tue Mar 28 18:02:59 UTC 2023
GCC version: gcc version 4.8.5 20150623 (Red Hat 4.8.5-44) (GCC)
Are you sure the NVIDIA drivers are correctly installed? What is your output for nvidia-smi? The output of list_egl_options.py should list the Nvidia card.
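To make that check concrete, here is a minimal sketch that scans saved list_egl_options.py output for the reported device count and for an NVIDIA vendor string. It assumes the output has one field per line, as the script prints it; the function name summarize_egl_report and the returned keys are hypothetical and not part of calvin_env.

```python
import re


def summarize_egl_report(report):
    """Extract the EGL device count and vendor strings from the report text."""
    match = re.search(r"number of EGL devices:\s*(\d+)", report)
    device_count = int(match.group(1)) if match else None
    # Each successfully queried device prints one GL_VENDOR=... line.
    vendors = [
        vendor.strip()
        for vendor in re.findall(r"^GL_VENDOR=(.*)$", report, re.MULTILINE)
    ]
    return {
        "device_count": device_count,
        "vendors": vendors,
        "has_nvidia": any("NVIDIA" in vendor.upper() for vendor in vendors),
    }


if __name__ == "__main__":
    sample = (
        "Starting EGL query\n"
        "GL_VENDOR=Mesa/X.org\n"
        "GL_RENDERER=llvmpipe (LLVM 12.0.0, 256 bits)\n"
        "number of EGL devices: 1\n"
    )
    print(summarize_egl_report(sample))
```

If has_nvidia is false and the only renderer is llvmpipe, EGL is falling back to Mesa's software rasterizer rather than the GPU.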
Hi, sorry, but could you help me with the same bug? I tried to run the command
python evaluation/evaluate_policy.py --dataset_path $CALVIN_ROOT/dataset/calvin_debug_dataset --train_folder $CALVIN_ROOT/calvin_models/calvin_agent/checkpoints/D_D_static_rgb_baseline --checkpoint $CALVIN_ROOT/calvin_models/calvin_agent/checkpoints/D_D_static_rgb_baseline/mcil_baseline.ckpt
and got the same bug
pybullet build time: May 10 2024 10:39:45
Global seed set to 0
trying to load lang data from: /home/cxy/calvin/dataset/calvin_debug_dataset/training/lang_annotations/auto_lang_ann.npy
trying to load lang data from: /home/cxy/calvin/dataset/calvin_debug_dataset/validation/lang_annotations/auto_lang_ann.npy
argv[0]=--width=200
argv[1]=--height=200
failed to EGL with glad.
I also tried list_egl_options.py and got:
----------Default-------------
Starting EGL query
Loaded EGL 1.5 after reload.
GL_VENDOR=NVIDIA Corporation
GL_RENDERER=NVIDIA GeForce RTX 3090/PCIe/SSE2
GL_VERSION=3.3.0 NVIDIA 545.23.06
GL_SHADING_LANGUAGE_VERSION=3.30 NVIDIA via Cg compiler
Completeing EGL query
b'EGL device choice: -1 of 9.\n'
number of EGL devices: 9
----------Option #1 (id=0)-------------
Starting EGL query
EGL device choice: 0 of 9 (from EGL_VISIBLE_DEVICE)
Loaded EGL 1.5 after reload.
GL_VENDOR=NVIDIA Corporation
CUDA_DEVICE=0
GL_RENDERER=NVIDIA GeForce RTX 3090/PCIe/SSE2
GL_VERSION=3.3.0 NVIDIA 545.23.06
GL_SHADING_LANGUAGE_VERSION=3.30 NVIDIA via Cg compiler
Completeing EGL query
----------Option #2 (id=1)-------------
Starting EGL query
EGL device choice: 1 of 9 (from EGL_VISIBLE_DEVICE)
Loaded EGL 1.5 after reload.
GL_VENDOR=NVIDIA Corporation
CUDA_DEVICE=1
GL_RENDERER=NVIDIA GeForce RTX 3090/PCIe/SSE2
GL_VERSION=3.3.0 NVIDIA 545.23.06
GL_SHADING_LANGUAGE_VERSION=3.30 NVIDIA via Cg compiler
Completeing EGL query
----------Option #3 (id=2)-------------
Starting EGL query
EGL device choice: 2 of 9 (from EGL_VISIBLE_DEVICE)
Loaded EGL 1.5 after reload.
GL_VENDOR=NVIDIA Corporation
CUDA_DEVICE=2
GL_RENDERER=NVIDIA GeForce RTX 3090/PCIe/SSE2
GL_VERSION=3.3.0 NVIDIA 545.23.06
GL_SHADING_LANGUAGE_VERSION=3.30 NVIDIA via Cg compiler
Completeing EGL query
----------Option #4 (id=3)-------------
Starting EGL query
EGL device choice: 3 of 9 (from EGL_VISIBLE_DEVICE)
Loaded EGL 1.5 after reload.
GL_VENDOR=NVIDIA Corporation
CUDA_DEVICE=3
GL_RENDERER=NVIDIA GeForce RTX 3090/PCIe/SSE2
GL_VERSION=3.3.0 NVIDIA 545.23.06
GL_SHADING_LANGUAGE_VERSION=3.30 NVIDIA via Cg compiler
Completeing EGL query
----------Option #5 (id=4)-------------
Starting EGL query
EGL device choice: 4 of 9 (from EGL_VISIBLE_DEVICE)
libEGL warning: failed to open /dev/dri/renderD131: Permission denied
libEGL warning: failed to open /dev/dri/renderD131: Permission denied
eglInitialize() failed with error: 3008
----------Option #6 (id=5)-------------
Starting EGL query
EGL device choice: 5 of 9 (from EGL_VISIBLE_DEVICE)
libEGL warning: failed to open /dev/dri/renderD130: Permission denied
libEGL warning: failed to open /dev/dri/renderD130: Permission denied
eglInitialize() failed with error: 3008
----------Option #7 (id=6)-------------
Starting EGL query
EGL device choice: 6 of 9 (from EGL_VISIBLE_DEVICE)
libEGL warning: failed to open /dev/dri/renderD129: Permission denied
libEGL warning: failed to open /dev/dri/renderD129: Permission denied
eglInitialize() failed with error: 3008
----------Option #8 (id=7)-------------
Starting EGL query
EGL device choice: 7 of 9 (from EGL_VISIBLE_DEVICE)
libEGL warning: failed to open /dev/dri/renderD128: Permission denied
libEGL warning: failed to open /dev/dri/renderD128: Permission denied
eglInitialize() failed with error: 3008
----------Option #9 (id=8)-------------
Starting EGL query
EGL device choice: 8 of 9 (from EGL_VISIBLE_DEVICE)
Loaded EGL 1.5 after reload.
GL_VENDOR=Mesa/X.org
GL_RENDERER=llvmpipe (LLVM 12.0.0, 256 bits)
GL_VERSION=4.5 (Core Profile) Mesa 21.2.6
GL_SHADING_LANGUAGE_VERSION=4.50
Completeing EGL query
Hope for your reply.
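The "failed to open /dev/dri/renderD...: Permission denied" warnings above suggest that the current user cannot open some DRM render nodes. Whether that is what triggers "failed to EGL with glad" is not established in this thread, but it is easy to check. The following is a minimal sketch using only the standard library; the function name inaccessible_render_nodes is hypothetical.

```python
import glob
import os


def inaccessible_render_nodes(pattern="/dev/dri/renderD*"):
    """Return render nodes the current user cannot open for reading and writing."""
    return [
        node
        for node in sorted(glob.glob(pattern))
        if not os.access(node, os.R_OK | os.W_OK)
    ]


if __name__ == "__main__":
    blocked = inaccessible_render_nodes()
    if blocked:
        print("No read/write access to:", ", ".join(blocked))
    else:
        print("All render nodes found are accessible (or none exist).")
```

On many Linux distributions access to these nodes is granted through group membership (often a group named render or video), but that depends on the system configuration.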
Hello, I encountered the same error. Could you please share how you solved this?
me too!
Error: failed to EGL with glad. Does this error occur because I didn't install EGL properly? When I enter "ldconfig -p | grep libEGL" in the terminal, I get the following output:
libEGL_nvidia.so.0 (libc6,x86-64) => /lib/x86_64-linux-gnu/libEGL_nvidia.so.0
libEGL_nvidia.so.0 (libc6) => /lib/i386-linux-gnu/libEGL_nvidia.so.0
libEGL_mesa.so.0 (libc6,x86-64) => /lib/x86_64-linux-gnu/libEGL_mesa.so.0
libEGL.so.1 (libc6,x86-64) => /lib/x86_64-linux-gnu/libEGL.so.1
libEGL.so (libc6,x86-64) => /lib/x86_64-linux-gnu/libEGL.so
Can you please guide me on what to do next? Thank you very much.