some issues about verify_image_on_cuda.py

feriorior commented 1 year ago

Thank you for this wonderful work! Unfortunately, when running verify_image_on_cuda.py on a remote server without a screen (Ubuntu 18.04), I got some errors. I get similar errors when I define env = MetaDriveEnv(dict(environment_num=1000, start_seed=1010, image_on_cuda=True, traffic_density=0.05,)). How can I use the CUDA version of the env on a remote server without a screen?

Known pipe types:
  glxGraphicsPipe
(1 aux display modules not yet loaded.)
:display:x11display(error): Could not open display ":0.0".
:display(error): The 'textures_power_2' configuration is set to 'none', meaning that non-power-of-two texture support is required, but the video driver I'm trying to use does not support non-power-of-two textures.
:device(warning): /dev/input/event2 is not readable, some features will be unavailable.
Traceback (most recent call last):
  File "verify_image_on_cuda.py", line 57, in <module>
    _test_rgb_camera_as_obs(args.render, image_on_cuda=not args.native)
  File "verify_image_on_cuda.py", line 28, in _test_rgb_camera_as_obs
    env.reset()
  File "/home/vehicle/meta/metadrive/envs/base_env.py", line 372, in reset
    self.lazy_init() # it only works the first time when reset() is called to avoid the error when render
  File "/home/vehicle/meta/metadrive/envs/base_env.py", line 259, in lazy_init
    engine = initialize_engine(self.config)
  File "/home/vehicle/meta/metadrive/engine/engine_utils.py", line 12, in initialize_engine
    cls.singleton = cls(env_global_config)
  File "/home/vehicle/meta/metadrive/engine/base_engine.py", line 29, in __init__
    EngineCore.__init__(self, global_config)
  File "/home/siao/vehicle/meta/metadrive/engine/core/engine_core.py", line 243, in __init__
    use_occlusion_maps=False
  File "/home/vehicle/meta/metadrive/engine/core/our_pbr.py", line 51, in __init__
    use_occlusion_maps=use_occlusion_maps
  File "/home/miniconda3/envs/md/lib/python3.7/site-packages/simplepbr/__init__.py", line 136, in __init__
    self._setup_tonemapping()
  File "/home/miniconda3/envs/md/lib/python3.7/site-packages/simplepbr/__init__.py", line 264, in _setup_tonemapping
    self.tonemap_quad.set_shader(tonemap_shader)
AttributeError: 'NoneType' object has no attribute 'set_shader'

QuanyiLi commented 1 year ago

Hi,

:display:x11display(error): Could not open display ":0.0".

This error is caused by the Panda3D graphics pipeline: the officially distributed build doesn't support headless rendering. You have to compile Panda3D on your headless machine and install it in your conda env. For more details, see: https://metadrive-simulator.readthedocs.io/en/latest/install.html#install-metadrive-with-headless-rendering
Besides, sudo is required in this case, as some libraries may need to be installed for compiling.

:display(error): The 'textures_power_2' configuration is set to 'none'

Generally, textures on the graphics card are scaled up or down to a power-of-two size, such as 256 or 512. With that behavior, you cannot use an arbitrary image size for your camera observation; only power-of-two image observations are returned correctly. To remove this limit, we set loadPrcFileData("", "textures-power-2 none") in class EngineCore. For your problem, you can simply remove this line and use a power-of-two image size as the observation, for example rgb_camera=(512, 512) or window_size=(128, 128). Everything should then work and this error will no longer appear. I will consider adding this option to env_config so it can be turned on and off quickly.
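
As a rough illustration of the power-of-two constraint described above, the sketch below checks whether a requested observation size qualifies and, if not, snaps it to the nearest power of two. The helper names (is_power_of_two, nearest_power_of_two, snap_size) are made up for this example; they are not part of MetaDrive or Panda3D.

```python
def is_power_of_two(n: int) -> bool:
    """Return True if n is a positive power of two (1, 2, 4, 8, ...)."""
    return n > 0 and (n & (n - 1)) == 0


def nearest_power_of_two(n: int) -> int:
    """Return the power of two closest to n (ties round up)."""
    if n < 1:
        raise ValueError("size must be a positive integer")
    lower = 1 << (n.bit_length() - 1)  # largest power of two <= n
    upper = lower << 1                 # smallest power of two > n
    return lower if (n - lower) < (upper - n) else upper


def snap_size(size):
    """Snap a (width, height) tuple to power-of-two dimensions."""
    return tuple(nearest_power_of_two(v) for v in size)


if __name__ == "__main__":
    # Hypothetical candidate sizes; the first two come from the comment above.
    for size in [(512, 512), (128, 128), (300, 200), (84, 84)]:
        ok = all(is_power_of_two(v) for v in size)
        print(size, "ok" if ok else f"-> {snap_size(size)}")
```

A size that already passes the check can be used as the camera or window size directly; any other size would have to be changed before the line in EngineCore is removed.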

However, we have to say that we haven't tested the headless pipeline for a long time, so you may encounter other problems that the documentation doesn't cover. Let's stay in touch. Besides, could you provide more information about your platform, such as your OS, GPU, driver, and CUDA version? The output of nvidia-smi should be enough. If we have machines in a similar condition, we can probably help you figure out the installation.

feriorior commented 1 year ago

Thanks for your reply. I encountered JPEG-related errors, but I am not sure where these paths are: "--jpeg-incdir /path/to/your/jpeg/include" and "--jpeg-libdir /path/to/your/jpeg/lib". My nvidia-smi information is as follows:

" NVIDIA-SMI 470.161.03 Driver Version: 470.161.03 CUDA Version: 11.4 "

QuanyiLi commented 1 year ago

According to our doc, you don't need to specify the jpeg path. All you need is to specify your python path in the following command:

python ./makepanda/makepanda.py --everything --no-x11 --no-opencv --no-fmodex --use-egl --no-gtk3\
  --python-incdir /path/to/your/conda_env/include/ \
  --python-libdir /path/to/your/conda_env/lib/ \
  --thread 8 --wheel
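
For anyone unsure which paths to substitute, here is a minimal sketch that prints the command above with the include and lib directories of the currently active Python environment filled in. It assumes a POSIX-style conda env where both directories sit directly under sys.prefix, as the placeholder paths in the command suggest. The function name makepanda_command is made up for this example, and nothing is executed.

```python
import os
import shlex
import sys


def makepanda_command(prefix=sys.prefix, threads=8):
    """Build the makepanda argument list for the Python installation at `prefix`."""
    return [
        "python", "./makepanda/makepanda.py",
        "--everything", "--no-x11", "--no-opencv", "--no-fmodex",
        "--use-egl", "--no-gtk3",
        # Assumption: include/ and lib/ live directly under the env prefix.
        "--python-incdir", os.path.join(prefix, "include"),
        "--python-libdir", os.path.join(prefix, "lib"),
        "--thread", str(threads),
        "--wheel",
    ]


if __name__ == "__main__":
    # Print a copy-pasteable command for the active environment.
    print(" ".join(shlex.quote(arg) for arg in makepanda_command()))
```

Run it with the target conda env activated, then paste the printed line into a shell opened in the Panda3D source directory.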

QuanyiLi commented 1 year ago

Well, I think recompiling Panda3D is not required now. The error is raised by Graphics.renderIntoTexture(). I tested headless mode on a server with a 1080 Ti and, after skipping the broken function, got a rendered image: [image: main_1678886366 1613483]

I will fix the headless mode issue soon.

QuanyiLi commented 1 year ago

Also, I found several rendering-related problems, see #336. I will fix them ASAP

feriorior commented 1 year ago

Thanks for your help. I recompiled Panda3D and got a new error. I believe it is the same problem you ran into. My output is as follows:

Successfully registered the following environments: ['MetaDrive-validation-v0', 'MetaDrive-10env-v0', 'MetaDrive-100envs-v0', 'MetaDrive-1000envs-v0', 'SafeMetaDrive-validation-v0', 'SafeMetaDrive-10env-v0', 'SafeMetaDrive-100envs-v0', 'SafeMetaDrive-1000envs-v0', 'MARLTollgate-v0', 'MARLBottleneck-v0', 'MARLRoundabout-v0', 'MARLIntersection-v0', 'MARLParkingLot-v0', 'MARLMetaDrive-v0'].
:device(warning): /dev/input/event2 is not readable, some features will be unavailable.
WARNING:root:You may using too large buffer! The height is 256, and width is 256. It may lower the sample efficiency! Considering reduce buffer size or using cuda image by set [image_on_cuda=True].
Bullet physics world is launched successfully!
Known pipe types:
  eglGraphicsPipe
(all display modules loaded.)
:display(error): Could not get requested FrameBufferProperties; abandoning window.
  requested: depth_bits=24 float_color color_bits=48 red_bits=16 green_bits=16 blue_bits=16 alpha_bits=16 multisamples=16 force_hardware
  got: depth_bits=24 float_color color_bits=48 red_bits=16 green_bits=16 blue_bits=16 alpha_bits=16 stencil_bits=8 multisamples=8 force_hardware
Error happens when drawing scene in offscreen mode!

QuanyiLi commented 1 year ago

Hi,

I created a new PR, #337, which runs successfully on my headless machine without compiling Panda3D or taking any further action. Could you help me test it?

Just switch to the fix-offscreen-rendering branch, reinstall all dependencies (including panda3d) via pip install -e . and run the following command:

python -m metadrive.examples.verify_headless_installation

Note: do not use the compiled panda3d; the officially distributed one is fine. I have already tested it.

The script will write three pairs of images to the examples directory. In each pair, one image comes from the agent observation and the other from the Panda3D internal rendering buffer. Please fetch those images from the cluster or server and check them to ensure MetaDrive can draw scenes and capture images correctly.
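
Before copying files off the server, a quick existence check can save a round trip. The sketch below lists which expected images are missing or empty in a directory. The file-name pattern ("<camera>_from_observation.png" and "<camera>_from_buffer.png") is taken from log output quoted later in this thread for main_camera and rgb_camera; the name depth_camera and both function names are assumptions made for illustration.

```python
from pathlib import Path

# "main_camera" and "rgb_camera" appear in logs in this thread;
# "depth_camera" is an assumed name for the third pair.
DEFAULT_CAMERAS = ("main_camera", "rgb_camera", "depth_camera")


def expected_images(cameras=DEFAULT_CAMERAS):
    """File names the verification script is assumed to write, two per camera."""
    return [f"{cam}_from_{src}.png"
            for cam in cameras
            for src in ("observation", "buffer")]


def missing_images(directory, cameras=DEFAULT_CAMERAS):
    """Return the expected image names that are absent or empty in `directory`."""
    folder = Path(directory)
    missing = []
    for name in expected_images(cameras):
        path = folder / name
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Hypothetical location; point this at your own examples directory.
    print(missing_images("metadrive/examples"))
```

An empty result only shows that the files were written. The images still need to be opened and inspected, since a blank frame is also a valid PNG.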

feriorior commented 1 year ago

Thanks a lot for your quick update. This branch can obtain complete visual images & depth images. I noticed that this testing file uses some constant hyper-parameters to test this feature, and it may be unable to generate visual observations with CUDA right now. Looking forward to your update.

Besides, RLlib is still not friendly to newcomers. Switching to Stable-Baselines3 (SB3) might be a more Pythonic solution. I hope the authors provide more training examples.

pengzhenghao commented 1 year ago

Good idea on the SB3 example. We actually have that code. Thanks for bringing it up.

Best,
Zhenghao

(Email reply to siaoliu's comment of Mar 15, 2023, 21:18.)

QuanyiLi commented 1 year ago

@siaoliu I updated the script. You can now run python -m metadrive.examples.verify_headless_installation --cuda --camera ["main"/"rgb"/"depth"] to test each camera with or without CUDA. Unlike the previous version, it no longer tests all cameras together, which avoids a subtle problem.
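
To go through every combination without retyping, the sketch below builds one argument list per camera, with and without --cuda, so that each check can run in its own process. Only the module name and the --cuda and --camera flags come from the comment above; the function name verification_commands is made up for this example, and the script itself only prints the commands.

```python
import itertools
import sys

CAMERAS = ("main", "rgb", "depth")  # camera values listed in the comment above


def verification_commands(python=sys.executable):
    """Return one argv list per (camera, cuda) combination."""
    commands = []
    for camera, use_cuda in itertools.product(CAMERAS, (False, True)):
        argv = [python, "-m", "metadrive.examples.verify_headless_installation",
                "--camera", camera]
        if use_cuda:
            argv.append("--cuda")
        commands.append(argv)
    return commands


if __name__ == "__main__":
    for argv in verification_commands():
        # To actually run a check, pass argv to subprocess.run(argv, check=False).
        print(" ".join(argv))
```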

Besides, I noticed that the CUDA version on your machine is below 12.0, which may not support the CUDA pipeline. Consider updating it, and enjoy!

QuanyiLi commented 1 year ago

I just merged this PR to main. You can pull the latest main for the test!

feriorior commented 1 year ago

Thanks, I will update my CUDA version soon. I'm not sure whether CUDA 12 is compatible with my torch version (I recall that torch may run with its own bundled toolkit).

pengzhenghao commented 1 year ago

CUDA 12 is compatible with torch even if you install cudatoolkit=11.7 following the torch installation guide.

Best,
Zhenghao

(Email reply to siaoliu's comment of Mar 17, 2023, 21:56.)

feriorior commented 1 year ago

Perhaps headless envs still invoke the display device? I am not sure why this happens: Failed to destroy EGL context: EGL_BAD_DISPLAY.

Bullet physics world is launched successfully!
Known pipe types:
  glxGraphicsPipe
(1 aux display modules not yet loaded.)
:display:x11display(error): Could not open display ":0.0".
The observation is a dict with numpy arrays as values: {'image': (512, 512, 3, 3), 'state': (19,)}
rgb_camera Test result:
Headless mode Offscreen render launched successfully!
images named 'rgb_camera_from_observation.png' and 'rgb_camera_from_buffer.png' are saved to /home/vehicle/metadrive/metadrive/examples. Open it to check if offscreen mode works well
:display:egldisplay(error): Failed to destroy EGL context: EGL_BAD_DISPLAY
:display:egldisplay(error): Failed to terminate EGL display: EGL_BAD_DISPLAY

QuanyiLi commented 1 year ago

If the rendered image is OK, just ignore this error. It is raised when closing the environment and the game engine, so it won't affect the application. And yes, offscreen rendering still looks for the display device; if it cannot find one, it simply stops sending rendered images to the screen. This is what we call the headless situation. In this case, our game engine can still use the original OpenGL pipeline to render the same content as if there were a screen. Besides, another rendering pipeline such as EGL might be launched (without being used), and this error is raised by EGL. EGL is a solution for rendering without an X server on a headless machine.

As for the CUDA version: 12.0 refers to the system-level CUDA, i.e. the version reported by nvidia-smi, while the CUDA version associated with torch usually refers to the CUDA toolkit. The two can have different versions, and a lower-version toolkit can be used on top of a newer system version. A simple conda install cudatoolkit==11.3 (or whichever version you need) is fine. Personally, I like:

pip3 install torch==1.10.0+cu113 torchvision==0.11.1+cu113 torchaudio==0.10.0+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html
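
To check the system-level version without reading the table by eye, the sketch below pulls the "CUDA Version" field out of nvidia-smi output and compares it with a minimum. The 12.0 threshold is the figure mentioned earlier in this thread, not a verified requirement, and both function names are made up for this example. The sample header is the one posted above.

```python
import re


def driver_cuda_version(nvidia_smi_text):
    """Extract the 'CUDA Version: X.Y' field from nvidia-smi output as (X, Y)."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", nvidia_smi_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def meets_minimum(version, minimum=(12, 0)):
    """True if a parsed (major, minor) version is at least `minimum`."""
    return version is not None and version >= minimum


if __name__ == "__main__":
    # Header line as posted earlier in this thread.
    header = "NVIDIA-SMI 470.161.03 Driver Version: 470.161.03 CUDA Version: 11.4"
    version = driver_cuda_version(header)
    print(version, meets_minimum(version))
```

On a live machine the text could come from subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout instead of a pasted string.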

feriorior commented 1 year ago

@pengzhenghao Could you provide a simple training example (e.g. SAC or PPO with SB3) as a baseline on the basic MetaDriveEnv? Perhaps because of wrong hyper-parameters, it is hard for me to reproduce the results reported in the paper. Looking forward to your help! Besides, I think RGB-D could be added as a basic visual observation format, which would be useful in realistic scenarios.

leejiahe commented 1 year ago

Hi, like siaoliu, I have run into the same issue. I am using the new version of MetaDrive, MetaDrive-0.3.0.1, with NVIDIA-SMI 470.57.02, Driver Version: 470.57.02, CUDA Version: 11.4.

I am running the software on 4x NVIDIA Tesla V100 32GB GPUs. The difference may be that I must run the software headless.

I also followed the steps you mentioned to build Panda3D. However, the issue still persists.

According to our doc, you don't need to specify the jpeg path. All you need is to specify your python path in the following command:
python ./makepanda/makepanda.py --everything --no-x11 --no-opencv --no-fmodex --use-egl --no-gtk3\
  --python-incdir /path/to/your/conda_env/include/ \
  --python-libdir /path/to/your/conda_env/lib/ \
  --thread 8 --wheel

May I ask if there is any remedy or precedent for running MetaDrive headless on V100 servers?

QuanyiLi commented 1 year ago

Hi @leejiahe

I think no special treatment is required now. I remember that the installation section of the doc has already been updated. It now says that a single line, pip install -e ., makes everything work, including headless running. Could you follow the instructions in the doc and give us feedback? I tested headless mode on Nvidia 1080/A6000 GPUs, and I think the V100 should work as well.

Quanyi

leejiahe commented 1 year ago

Dear @QuanyiLi ,

Thank you for your fast reply.

I created a new conda environment and followed the installation instructions, which contain three lines of code (including pip install -e . as you mentioned). However, it still doesn't work. My error message is different from siaoliu's; below is the output when I ran python -m metadrive.examples.verify_headless_installation:

Known pipe types:
  glxGraphicsPipe
(1 aux display modules not yet loaded.)
:display:x11display(error): Could not open display ":0.0".
:display:egldisplay(warning): Couldn't initialize the default EGL display: EGL_NOT_INITIALIZED
:display(warning): FrameBufferProperties available less than requested.
  requested: depth_bits=1 color_bits=3 red_bits=1 green_bits=1 blue_bits=1 alpha_bits=1 multisamples=8 back_buffers=1 force_hardware
  got: depth_bits=32 color_bits=24 red_bits=8 green_bits=8 blue_bits=8 alpha_bits=8 back_buffers=1 force_hardware
:display(error): Could not get requested FrameBufferProperties; abandoning window.
  requested: depth_bits=24 float_color color_bits=48 red_bits=16 green_bits=16 blue_bits=16 alpha_bits=16 multisamples=16 force_hardware
  got: depth_bits=24 float_color color_bits=48 red_bits=16 green_bits=16 blue_bits=16 alpha_bits=16 stencil_bits=8 multisamples=1 force_hardware
Traceback (most recent call last):
  File "/home/stevenlee/miniconda3/envs/driving/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/stevenlee/miniconda3/envs/driving/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/stevenlee/metadriving/metadrive/metadrive/examples/verify_headless_installation.py", line 11, in <module>
    verify_installation(args.cuda, args.camera)
  File "/home/stevenlee/metadriving/metadrive/metadrive/tests/test_installation.py", line 80, in verify_installation
    capture_headless_image(cuda)
  File "/home/stevenlee/metadriving/metadrive/metadrive/tests/test_installation.py", line 29, in capture_headless_image
    env.reset()
  File "/home/stevenlee/metadriving/metadrive/metadrive/envs/base_env.py", line 375, in reset
    self.lazy_init() # it only works the first time when reset() is called to avoid the error when render
  File "/home/stevenlee/metadriving/metadrive/metadrive/envs/base_env.py", line 262, in lazy_init
    engine = initialize_engine(self.config)
  File "/home/stevenlee/metadriving/metadrive/metadrive/engine/engine_utils.py", line 12, in initialize_engine
    cls.singleton = cls(env_global_config)
  File "/home/stevenlee/metadriving/metadrive/metadrive/engine/base_engine.py", line 29, in __init__
    EngineCore.__init__(self, global_config)
  File "/home/stevenlee/metadriving/metadrive/metadrive/engine/core/engine_core.py", line 251, in __init__
    use_occlusion_maps=False
  File "/home/stevenlee/metadriving/metadrive/metadrive/engine/core/our_pbr.py", line 51, in __init__
    use_occlusion_maps=use_occlusion_maps
  File "/home/stevenlee/miniconda3/envs/driving/lib/python3.7/site-packages/simplepbr/__init__.py", line 136, in __init__
    self._setup_tonemapping()
  File "/home/stevenlee/metadriving/metadrive/metadrive/engine/core/our_pbr.py", line 87, in _setup_tonemapping
    self.tonemap_quad.set_shader(tonemap_shader)
AttributeError: 'NoneType' object has no attribute 'set_shader'

Thank you so much for your help

QuanyiLi commented 1 year ago

It must be caused by interactions between OpenGL and GPU. I tried it again on a new A5000 GPU cluster. Everything works fine. My output message after running the same script is:

Known pipe types:
  glxGraphicsPipe
(1 aux display modules not yet loaded.)
:display:x11display(error): Could not open display ":0.0".
WARNING:root:You may using too large buffer! The height is 512, and width is 512. It may lower the sample efficiency! Considering reduce buffer size or using cuda image by set [image_on_cuda=True].
main_camera Test result:
Headless mode Offscreen render launched successfully!
images named 'main_camera_from_observation.png' and 'main_camera_from_buffer.png' are saved to /home/quanyi/metadrive/metadrive/examples. Open it to check if offscreen mode works well
Aborted (core dumped)

I guess it is due to some wrong settings of FrameBufferProperties, but I can not help more on this. I created an issue for this in the Panda3D forum: https://discourse.panda3d.org/t/got-nothing-returned-when-calling-rendersceneinto-in-headless-mode/29253
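
To see at a glance which properties the driver did not satisfy, the "requested" and "got" strings from the Panda3D log can be compared mechanically. The sketch below is a plain-Python diff of two such strings; the function names are made up for this example, and it only reports differences without establishing which one, if any, causes the failure. The sample strings are copied from the log posted above.

```python
def parse_properties(text):
    """Parse 'depth_bits=24 float_color ...' into a dict; bare flags map to True."""
    props = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        if not sep:
            props[key] = True          # flag without a value, e.g. force_hardware
        elif value.isdigit():
            props[key] = int(value)
        else:
            props[key] = value         # keep anything non-numeric as text
    return props


def diff_properties(requested, got):
    """Return {name: (requested, got)} for every property that differs."""
    req, res = parse_properties(requested), parse_properties(got)
    return {key: (req.get(key), res.get(key))
            for key in sorted(set(req) | set(res))
            if req.get(key) != res.get(key)}


if __name__ == "__main__":
    requested = ("depth_bits=24 float_color color_bits=48 red_bits=16 green_bits=16 "
                 "blue_bits=16 alpha_bits=16 multisamples=16 force_hardware")
    got = ("depth_bits=24 float_color color_bits=48 red_bits=16 green_bits=16 "
           "blue_bits=16 alpha_bits=16 stencil_bits=8 multisamples=1 force_hardware")
    print(diff_properties(requested, got))
```

For this log the differing entries are multisamples (16 requested, 1 obtained) and stencil_bits (not requested, 8 obtained).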

Let's just wait for their feedback!

QuanyiLi commented 1 year ago

Hi, Jiahe @leejiahe

According to the reply here: https://discourse.panda3d.org/t/got-nothing-returned-when-calling-rendersceneinto-in-headless-mode/29253, one possible reason is that a virtual framebuffer can't be created on your Linux system. Could you try sudo apt-get install xvfb xserver-xephyr -y to allow the virtual framebuffer to be created?
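
After installing the packages, a small check can confirm what the machine actually exposes. The sketch below reports whether DISPLAY is set and whether the Xvfb server and the xvfb-run wrapper (both shipped in the xvfb package on Debian and Ubuntu) are on PATH. The function name headless_report is made up for this example, and a clean report is no guarantee that rendering will work.

```python
import os
import shutil


def headless_report(environ=None, which=shutil.which):
    """Summarise display-related settings relevant to offscreen rendering."""
    environ = os.environ if environ is None else environ
    return {
        "DISPLAY": environ.get("DISPLAY"),          # usually None on a headless shell
        "Xvfb": which("Xvfb") is not None,          # virtual framebuffer X server
        "xvfb-run": which("xvfb-run") is not None,  # wrapper script from the xvfb package
    }


if __name__ == "__main__":
    for key, value in headless_report().items():
        print(f"{key}: {value}")
```

If xvfb-run is available, one thing to try is xvfb-run -a python -m metadrive.examples.verify_headless_installation, which starts a temporary virtual display for the command. Whether that helps with this particular error has not been tested here.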

Besides, have you tried other offscreen rendering simulators on your machine, especially those that render scenes via OpenGL? They will also fail due to the lack of this function.

Quanyi

leejiahe commented 1 year ago

Dear @QuanyiLi ,

Thank you for your help.

I tried installing the libraries as you suggested, but the problem persists. It still does not create a virtual framebuffer.

Actually, I have also tried installing DonkeyCar and AirSim, and I faced the same issue there: I can't render virtually.

I came across this just now: CARLA's "Off-screen GPU Selection". Is there something equivalent for MetaDrive?

QuanyiLi commented 1 year ago

@leejiahe Thanks for sharing. But I don't believe that Panda3D supports SDL, so the GPU selection has nothing to do with your problem.

I wish I could help more; sorry that I can't. Maybe you could search for topics with keywords such as "headless X11 server" or "OpenGL headless" to see if you can get some hints. Besides, I guess some remote desktop services cannot be launched on your machine due to the same problem, so related topics are worth reading.

leejiahe commented 1 year ago

Dear @QuanyiLi ,

Perfectly understandable. Thank you for your help!

AHPUymhd commented 1 year ago

What is this issue?

QuanyiLi commented 1 year ago

@AHPUymhd Try pip install panda3d-gltf==0.13

metadriverse / metadrive

some issues about verify_image_on_cuda.py #334