mees / calvin

CALVIN - A benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks
http://calvin.cs.uni-freiburg.de
MIT License

failed to EGL with glad. #59

Open houyaokun opened 10 months ago

houyaokun commented 10 months ago

Error: failed to EGL with glad. Does this error occur because I didn't install EGL properly? When I enter "ldconfig -p | grep libEGL" in the terminal, I get the following output:

libEGL_nvidia.so.0 (libc6,x86-64) => /lib/x86_64-linux-gnu/libEGL_nvidia.so.0
libEGL_nvidia.so.0 (libc6) => /lib/i386-linux-gnu/libEGL_nvidia.so.0
libEGL_mesa.so.0 (libc6,x86-64) => /lib/x86_64-linux-gnu/libEGL_mesa.so.0
libEGL.so.1 (libc6,x86-64) => /lib/x86_64-linux-gnu/libEGL.so.1
libEGL.so (libc6,x86-64) => /lib/x86_64-linux-gnu/libEGL.so

Can you please guide me on what to do next? Thank you very much.
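A listing like the one above can be checked programmatically. The following is a minimal sketch, not part of CALVIN or PyBullet: the helper names `egl_libraries` and `has_nvidia_egl` are hypothetical, and the sample text is copied from this comment. It only reports whether an NVIDIA EGL vendor library is registered with the dynamic linker; it does not prove that EGL initialization will succeed.

```python
import re
from typing import Dict, List

# Sample copied from the `ldconfig -p | grep libEGL` output in this comment.
SAMPLE = """\
libEGL_nvidia.so.0 (libc6,x86-64) => /lib/x86_64-linux-gnu/libEGL_nvidia.so.0
libEGL_nvidia.so.0 (libc6) => /lib/i386-linux-gnu/libEGL_nvidia.so.0
libEGL_mesa.so.0 (libc6,x86-64) => /lib/x86_64-linux-gnu/libEGL_mesa.so.0
libEGL.so.1 (libc6,x86-64) => /lib/x86_64-linux-gnu/libEGL.so.1
libEGL.so (libc6,x86-64) => /lib/x86_64-linux-gnu/libEGL.so
"""


def egl_libraries(ldconfig_output: str) -> Dict[str, List[str]]:
    """Map each libEGL library name to the paths listed for it."""
    libs: Dict[str, List[str]] = {}
    # Each entry has the shape: "<name> (<abi tags>) => <path>"
    pattern = r"(libEGL\S*) \([^)]*\) => (\S+)"
    for name, path in re.findall(pattern, ldconfig_output):
        libs.setdefault(name, []).append(path)
    return libs


def has_nvidia_egl(ldconfig_output: str) -> bool:
    """True if any listed library is an NVIDIA EGL vendor library."""
    return any(n.startswith("libEGL_nvidia") for n in egl_libraries(ldconfig_output))


if __name__ == "__main__":
    print("NVIDIA EGL library listed:", has_nvidia_egl(SAMPLE))
```

To check a live system, the string passed in could come from running `ldconfig -p` and filtering for `libEGL`.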

2488583886 commented 10 months ago

Hello, I encountered the same error. Could you please share how you solved it?

houyaokun commented 10 months ago

Hello, I encountered the same error. Could you please share how you solved it?

I just rendered in headless mode, and then the error disappeared.

qureshinomaan commented 8 months ago

Hi, sorry for the naive question, but how do you run in headless mode? I have tried setting cfg.show_gui to false in datarenderer.py. I have also unset $DISPLAY. However, I still get this error.

Thanks!

lukashermann commented 8 months ago

@qureshinomaan what exactly are you running? The datarenderer in calvin_env is not used during training; we only used it to render the dataset once after recording it with teleoperation. During training, it is the rollout callbacks that use the calvin_env simulator. As a quick fix, you could disable them (they are only used to evaluate performance during training) by setting ~callbacks/rollout and ~callbacks/rollout_lh in the command line arguments for the training. However, you would still need to render at some point for the full evaluation after the training is done. Also, headless rendering is enabled by default for the rollouts and the evaluation. Does your computer have a graphics card? EGL renders on the GPU, so it would fail if you don't have one.

What is the output if you run this script (list_egl_options.py in egl_check) in calvin_env?
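The quick fix described above, disabling the rollout callbacks from the command line, can be sketched as follows. This is only an illustration: `build_training_command` is a hypothetical helper, the `training.py` invocation and the `datamodule.root_data_dir` argument are taken from the command posted later in this thread, and the `~` prefix is assumed to be Hydra's syntax for removing a config entry. The snippet assembles the argument list and prints it; it does not start a training run.

```python
from typing import List

# Overrides quoted from the comment above; "~" is assumed to be
# Hydra's deletion prefix for a config group.
ROLLOUT_OVERRIDES = ["~callbacks/rollout", "~callbacks/rollout_lh"]


def build_training_command(root_data_dir: str, disable_rollouts: bool = True) -> List[str]:
    """Assemble the training command line, optionally without rollout callbacks."""
    cmd = ["python", "training.py", f"datamodule.root_data_dir={root_data_dir}"]
    if disable_rollouts:
        cmd += ROLLOUT_OVERRIDES
    return cmd


if __name__ == "__main__":
    # Single quotes keep the shell from expanding "~" if the line is pasted.
    print(" ".join(f"'{arg}'" if arg.startswith("~") else arg
                   for arg in build_training_command("/path/to/dataset/")))
```

As noted in the comment, this only postpones the problem: the full evaluation after training still needs a working renderer.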

qureshinomaan commented 8 months ago

Hi @lukashermann! Thanks a lot for responding! I am using the following command with the debug dataset:

$ python training.py datamodule.root_data_dir=/path/to/dataset/ datamodule/datasets=vision_lang_shm

I am running this on a machine with a 3080Ti GPU with 16GB VRAM. In the environment, torch is properly installed (torch.cuda.is_available() returns True).

I get the following output:

 | Name               | Type                   | Params
--------------------------------------------------------------
0 | perceptual_encoder | ConcatEncoders         | 174 K 
1 | plan_proposal      | PlanProposalNetwork    | 13.9 M
2 | plan_recognition   | PlanRecognitionNetwork | 36.0 M
3 | visual_goal        | VisualGoalEncoder      | 4.4 M 
4 | language_goal      | LanguageGoalEncoder    | 5.1 M 
5 | action_decoder     | LogisticPolicyNetwork  | 13.8 M
--------------------------------------------------------------
73.2 M    Trainable params
0         Non-trainable params
73.2 M    Total params
146.424   Total estimated model params size (MB)
Sanity Checking: 0it [00:00, ?it/s]
pybullet build time: Nov 28 2023 23:51:11
[2024-01-07 14:26:30,850][calvin_agent.wrappers.calvin_env_wrapper][WARNING] - Couldn't find correct EGL device. Setting EGL_VISIBLE_DEVICE=0. When using DDP with many GPUs this can lead to OOM errors. Did you install PyBullet correctly? Please refer to calvin env README
[2024-01-07 14:26:30,851][calvin_agent.wrappers.calvin_env_wrapper][INFO] - EGL_DEVICE_ID 0 <==> CUDA_DEVICE_ID 0
argv[0]=--width=200
argv[1]=--height=200
[2024-01-07 14:26:30,958][calvin_env.envs.play_table_env][INFO] - Loading EGL plugin (may segfault on misconfigured systems)...
failed to EGL with glad.

lukashermann commented 8 months ago

EGL has nothing to do with torch; it is what pybullet uses for GPU rendering. Could you still run the script that I linked and copy the output here?

cd calvin_env/egl_check
bash build.sh  # should have been built automatically, but try running this again
python list_egl_options.py

lukashermann commented 8 months ago

Anyway, this is not an issue with our repository, but with pybullet. Did you try following issues like this one?

https://github.com/bulletphysics/bullet3/discussions/3737

qureshinomaan commented 8 months ago

Here is the output of the commands you asked me to run:

----------Default-------------
Starting EGL query
b'EGL device choice: -1 of 0.\neglInitialize() failed with error: 3008\n'
number of EGL devices: 0

I think there is a mismatch between the EGL driver and the CUDA driver on my system. I have seen similar issues in the Habitat and ai2thor repositories as well. I set up the repository on another system, followed the same instructions, and was able to run it. The only difference was the CUDA version (it worked with CUDA 12.0 and didn't work with 11.7).

Anyways, thanks for your help!
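The number in "eglInitialize() failed with error: 3008" can be looked up against the EGL error constants. The sketch below assumes the value is printed in hexadecimal, which matches the 0x3000 range used by the Khronos EGL headers; under that assumption 3008 reads as EGL_BAD_DISPLAY. The helper name `decode_egl_error` is hypothetical and not part of pybullet or calvin_env.

```python
# EGL error constants as defined in the Khronos EGL headers.
EGL_ERROR_NAMES = {
    0x3000: "EGL_SUCCESS",
    0x3001: "EGL_NOT_INITIALIZED",
    0x3002: "EGL_BAD_ACCESS",
    0x3003: "EGL_BAD_ALLOC",
    0x3004: "EGL_BAD_ATTRIBUTE",
    0x3005: "EGL_BAD_CONFIG",
    0x3006: "EGL_BAD_CONTEXT",
    0x3007: "EGL_BAD_CURRENT_SURFACE",
    0x3008: "EGL_BAD_DISPLAY",
    0x3009: "EGL_BAD_MATCH",
    0x300A: "EGL_BAD_NATIVE_PIXMAP",
    0x300B: "EGL_BAD_NATIVE_WINDOW",
    0x300C: "EGL_BAD_PARAMETER",
    0x300D: "EGL_BAD_SURFACE",
    0x300E: "EGL_CONTEXT_LOST",
}


def decode_egl_error(code_text: str) -> str:
    """Translate an error number from the log, read as hexadecimal (assumption)."""
    code = int(code_text, 16)
    return EGL_ERROR_NAMES.get(code, f"unknown (0x{code:04X})")


if __name__ == "__main__":
    print(decode_egl_error("3008"))
```

The name alone does not identify the root cause; it only narrows the failure to the display or device that EGL was asked to initialize.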

lukashermann commented 8 months ago

I don't think the cuda driver is relevant here; it is more likely the nvidia driver.

Patricia1019 commented 6 months ago

Hello, I encountered the same error. Could you please share how you solved it?

I just rendered in headless mode, and then the error disappeared.

Sorry, but can you show me how to render in headless mode?

lukashermann commented 6 months ago

It renders in headless mode by default. Which error do you get?

COST-97 commented 5 months ago

Hello, I get the same error, "failed to EGL with glad", even though show_gui=False. The output of the commands

cd calvin_env/egl_check
bash build.sh  # should have been built automatically, but try running this again
python list_egl_options.py

is:

----------Default-------------
Starting EGL query
Loaded EGL 1.5 after reload.
GL_VENDOR=Mesa/X.org
GL_RENDERER=llvmpipe (LLVM 12.0.0, 256 bits)
GL_VERSION=4.5 (Core Profile) Mesa 21.2.6
GL_SHADING_LANGUAGE_VERSION=4.50
Completeing EGL query
b'EGL device choice: -1 of 1.\n'
number of EGL devices: 1
----------Option #1 (id=0)-------------
Starting EGL query
EGL device choice: 0 of 1 (from EGL_VISIBLE_DEVICE)
Loaded EGL 1.5 after reload.
GL_VENDOR=Mesa/X.org
GL_RENDERER=llvmpipe (LLVM 12.0.0, 256 bits)
GL_VERSION=4.5 (Core Profile) Mesa 21.2.6
GL_SHADING_LANGUAGE_VERSION=4.50
Completeing EGL query

The output of the command ldconfig -p | grep libEGL is:

libEGL_mesa.so.0 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libEGL_mesa.so.0
libEGL.so.1 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libEGL.so.1
libEGL.so (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libEGL.so

nvidia driver:

NVRM version: NVIDIA UNIX x86_64 Kernel Module 525.105.17 Tue Mar 28 18:02:59 UTC 2023
GCC version: gcc version 4.8.5 20150623 (Red Hat 4.8.5-44) (GCC)

lukashermann commented 5 months ago

Are you sure the nvidia drivers are correctly installed? What's your output for nvidia-smi? The output of list_egl_options.py should list the Nvidia card.
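Whether the report lists the Nvidia card can be checked by looking at its GL_RENDERER fields. The following is a rough sketch under stated assumptions: the report is expected to have one field per line, the helper names are hypothetical, and the list of software renderer names (`llvmpipe`, `softpipe`, `swrast`) is an assumption about Mesa's CPU rasterizers rather than something taken from calvin_env. The sample is copied from the output posted above.

```python
import re
from typing import List

# Names of Mesa's CPU-based rasterizers (assumed list, not exhaustive).
SOFTWARE_RENDERERS = ("llvmpipe", "softpipe", "swrast")

# Sample copied from the list_egl_options.py output posted in this thread.
MESA_REPORT = """\
----------Default-------------
Starting EGL query
Loaded EGL 1.5 after reload.
GL_VENDOR=Mesa/X.org
GL_RENDERER=llvmpipe (LLVM 12.0.0, 256 bits)
GL_VERSION=4.5 (Core Profile) Mesa 21.2.6
number of EGL devices: 1
"""


def renderers(egl_report: str) -> List[str]:
    """Return every GL_RENDERER value found in the report."""
    found = re.findall(r"^GL_RENDERER=(.*)$", egl_report, flags=re.MULTILINE)
    return [value.strip() for value in found]


def only_software_rendering(egl_report: str) -> bool:
    """True if renderers were found and all of them are CPU rasterizers."""
    found = renderers(egl_report)
    return bool(found) and all(
        any(name in r.lower() for name in SOFTWARE_RENDERERS) for r in found
    )


if __name__ == "__main__":
    print("renderers:", renderers(MESA_REPORT))
    print("software only:", only_software_rendering(MESA_REPORT))
```

A report in which every renderer is a CPU rasterizer suggests that EGL is not reaching the GPU driver at all, which is consistent with the question about the driver installation.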

Caixy1113 commented 4 months ago

Hi, sorry, but can you help me with the same bug? I tried to run the command

python evaluation/evaluate_policy.py --dataset_path $CALVIN_ROOT/dataset/calvin_debug_dataset --train_folder $CALVIN_ROOT/calvin_models/calvin_agent/checkpoints/D_D_static_rgb_baseline --checkpoint $CALVIN_ROOT/calvin_models/calvin_agent/checkpoints/D_D_static_rgb_baseline/mcil_baseline.ckpt

and got the same error:

pybullet build time: May 10 2024 10:39:45
Global seed set to 0
trying to load lang data from: /home/cxy/calvin/dataset/calvin_debug_dataset/training/lang_annotations/auto_lang_ann.npy
trying to load lang data from: /home/cxy/calvin/dataset/calvin_debug_dataset/validation/lang_annotations/auto_lang_ann.npy
argv[0]=--width=200
argv[1]=--height=200
failed to EGL with glad.

I also tried list_egl_options.py and got:

----------Default-------------
Starting EGL query
Loaded EGL 1.5 after reload.
GL_VENDOR=NVIDIA Corporation
GL_RENDERER=NVIDIA GeForce RTX 3090/PCIe/SSE2
GL_VERSION=3.3.0 NVIDIA 545.23.06
GL_SHADING_LANGUAGE_VERSION=3.30 NVIDIA via Cg compiler
Completeing EGL query
b'EGL device choice: -1 of 9.\n'
number of EGL devices: 9

----------Option #1 (id=0)-------------
Starting EGL query
EGL device choice: 0 of 9 (from EGL_VISIBLE_DEVICE)
Loaded EGL 1.5 after reload.
GL_VENDOR=NVIDIA Corporation
CUDA_DEVICE=0
GL_RENDERER=NVIDIA GeForce RTX 3090/PCIe/SSE2
GL_VERSION=3.3.0 NVIDIA 545.23.06
GL_SHADING_LANGUAGE_VERSION=3.30 NVIDIA via Cg compiler
Completeing EGL query

----------Option #2 (id=1)-------------
Starting EGL query
EGL device choice: 1 of 9 (from EGL_VISIBLE_DEVICE)
Loaded EGL 1.5 after reload.
GL_VENDOR=NVIDIA Corporation
CUDA_DEVICE=1
GL_RENDERER=NVIDIA GeForce RTX 3090/PCIe/SSE2
GL_VERSION=3.3.0 NVIDIA 545.23.06
GL_SHADING_LANGUAGE_VERSION=3.30 NVIDIA via Cg compiler
Completeing EGL query

----------Option #3 (id=2)-------------
Starting EGL query
EGL device choice: 2 of 9 (from EGL_VISIBLE_DEVICE)
Loaded EGL 1.5 after reload.
GL_VENDOR=NVIDIA Corporation
CUDA_DEVICE=2
GL_RENDERER=NVIDIA GeForce RTX 3090/PCIe/SSE2
GL_VERSION=3.3.0 NVIDIA 545.23.06
GL_SHADING_LANGUAGE_VERSION=3.30 NVIDIA via Cg compiler
Completeing EGL query

----------Option #4 (id=3)-------------
Starting EGL query
EGL device choice: 3 of 9 (from EGL_VISIBLE_DEVICE)
Loaded EGL 1.5 after reload.
GL_VENDOR=NVIDIA Corporation
CUDA_DEVICE=3
GL_RENDERER=NVIDIA GeForce RTX 3090/PCIe/SSE2
GL_VERSION=3.3.0 NVIDIA 545.23.06
GL_SHADING_LANGUAGE_VERSION=3.30 NVIDIA via Cg compiler
Completeing EGL query

----------Option #5 (id=4)-------------
Starting EGL query
EGL device choice: 4 of 9 (from EGL_VISIBLE_DEVICE)
libEGL warning: failed to open /dev/dri/renderD131: Permission denied
libEGL warning: failed to open /dev/dri/renderD131: Permission denied
eglInitialize() failed with error: 3008

----------Option #6 (id=5)-------------
Starting EGL query
EGL device choice: 5 of 9 (from EGL_VISIBLE_DEVICE)
libEGL warning: failed to open /dev/dri/renderD130: Permission denied
libEGL warning: failed to open /dev/dri/renderD130: Permission denied
eglInitialize() failed with error: 3008

----------Option #7 (id=6)-------------
Starting EGL query
EGL device choice: 6 of 9 (from EGL_VISIBLE_DEVICE)
libEGL warning: failed to open /dev/dri/renderD129: Permission denied
libEGL warning: failed to open /dev/dri/renderD129: Permission denied
eglInitialize() failed with error: 3008

----------Option #8 (id=7)-------------
Starting EGL query
EGL device choice: 7 of 9 (from EGL_VISIBLE_DEVICE)
libEGL warning: failed to open /dev/dri/renderD128: Permission denied
libEGL warning: failed to open /dev/dri/renderD128: Permission denied
eglInitialize() failed with error: 3008

----------Option #9 (id=8)-------------
Starting EGL query
EGL device choice: 8 of 9 (from EGL_VISIBLE_DEVICE)
Loaded EGL 1.5 after reload.
GL_VENDOR=Mesa/X.org
GL_RENDERER=llvmpipe (LLVM 12.0.0, 256 bits)
GL_VERSION=4.5 (Core Profile) Mesa 21.2.6
GL_SHADING_LANGUAGE_VERSION=4.50
Completeing EGL query

Hoping for your reply.
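The "Permission denied" warnings in the report above refer to the render nodes under /dev/dri. A minimal sketch for checking whether the current user can open them is shown below; `render_node_access` is a hypothetical helper, not part of calvin_env. On many Linux distributions these nodes belong to a group such as render or video, but whether changing the permissions resolves the "failed to EGL with glad" error is not established in this thread.

```python
import glob
import os
from typing import Dict, Iterable, Optional


def render_node_access(paths: Optional[Iterable[str]] = None) -> Dict[str, bool]:
    """Report, per DRI render node, whether this user can read and write it."""
    if paths is None:
        # Default to the render nodes present on this machine, if any.
        paths = sorted(glob.glob("/dev/dri/renderD*"))
    # os.access returns False both for missing nodes and for denied access.
    return {path: os.access(path, os.R_OK | os.W_OK) for path in paths}


if __name__ == "__main__":
    report = render_node_access()
    if not report:
        print("no /dev/dri/renderD* nodes found")
    for node, ok in report.items():
        print(f"{node}: {'accessible' if ok else 'not accessible'}")
```

Comparing the inaccessible nodes with the failing options in the report shows which EGL devices are affected by permissions rather than by the driver.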

xiaofeifei-1 commented 2 months ago

Hello, I encountered the same error. Could you please share how you solved it?

Me too!