lgsvl / simulator

A ROS/ROS2 Multi-robot Simulator for Autonomous Vehicles

Fail to start LGSVL simulator inside docker container #382

Closed: weihunko closed this issue 5 years ago

weihunko commented 5 years ago

I have tried running rviz and glxgears inside a container and they worked fine, but I failed to start the LGSVL simulator inside docker. Here is the log message:

Preloaded type GtkListStore
Preloaded type GtkWindow
Preloaded type GtkVBox
Preloaded type GtkImage
Preloaded type GtkNotebook
Preloaded type GtkHBox
Preloaded type GtkFrame
Preloaded type GtkAlignment
Preloaded type GtkTreeView
Preloaded type GtkLabel
Preloaded type GtkCheckButton
Preloaded type GtkScrolledWindow
Preloaded type GtkComboBox
Desktop is 1920 x 1080 @ 60 Hz
[Vulkan init] extensions: count=1
[Vulkan init] extensions: name=VK_EXT_debug_report, enabled=0
Vulkan error VK_ERROR_INCOMPATIBLE_DRIVER (-9) file: ./Runtime/GfxDevice/vulkan/VKContext.cpp, line: 333
Vulkan error./Runtime/GfxDevice/vulkan/VKContext.cpp:333
Vulkan detection: 0
No supported renderers found, exiting

(Filename: ./PlatformDependent/LinuxStandalone/main.cpp Line: 639)

and the nvidia-smi output inside the container is as below:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.130                Driver Version: 384.130                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro P1000        Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   47C    P8    N/A /  N/A |    598MiB /  4030MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

I can launch the LGSVL simulator outside docker on my host machine. Here is the log message for running the simulator outside docker:

Preloaded type GtkListStore
Preloaded type GtkWindow
Preloaded type GtkVBox
Preloaded type GtkImage
Preloaded type GtkNotebook
Preloaded type GtkHBox
Preloaded type GtkFrame
Preloaded type GtkAlignment
Preloaded type GtkTreeView
Preloaded type GtkLabel
Preloaded type GtkCheckButton
Preloaded type GtkScrolledWindow
Preloaded type GtkComboBox
Desktop is 1920 x 1080 @ 60 Hz
[Vulkan init] extensions: count=14
[Vulkan init] extensions: name=VK_EXT_acquire_xlib_display, enabled=0
[Vulkan init] extensions: name=VK_EXT_debug_report, enabled=0
[Vulkan init] extensions: name=VK_EXT_direct_mode_display, enabled=0
[Vulkan init] extensions: name=VK_EXT_display_surface_counter, enabled=0
[Vulkan init] extensions: name=VK_KHR_display, enabled=1
[Vulkan init] extensions: name=VK_KHR_get_physical_device_properties2, enabled=0
[Vulkan init] extensions: name=VK_KHR_get_surface_capabilities2, enabled=0
[Vulkan init] extensions: name=VK_KHR_surface, enabled=1
[Vulkan init] extensions: name=VK_KHR_xcb_surface, enabled=0
[Vulkan init] extensions: name=VK_KHR_xlib_surface, enabled=1
[Vulkan init] extensions: name=VK_KHX_device_group_creation, enabled=0
[Vulkan init] extensions: name=VK_KHR_external_fence_capabilities, enabled=0
[Vulkan init] extensions: name=VK_KHR_external_memory_capabilities, enabled=0
[Vulkan init] extensions: name=VK_KHR_external_semaphore_capabilities, enabled=0
Vulkan detection: 2
Initialize engine version: 2019.1.10f1 (f007ed779b7a)
GfxDevice: creating device client; threaded=1
[Vulkan init] extensions: count=14
[Vulkan init] extensions: name=VK_EXT_acquire_xlib_display, enabled=0
[Vulkan init] extensions: name=VK_EXT_debug_report, enabled=0
[Vulkan init] extensions: name=VK_EXT_direct_mode_display, enabled=0
[Vulkan init] extensions: name=VK_EXT_display_surface_counter, enabled=0
[Vulkan init] extensions: name=VK_KHR_display, enabled=1
[Vulkan init] extensions: name=VK_KHR_get_physical_device_properties2, enabled=0
[Vulkan init] extensions: name=VK_KHR_get_surface_capabilities2, enabled=0
[Vulkan init] extensions: name=VK_KHR_surface, enabled=1
[Vulkan init] extensions: name=VK_KHR_xcb_surface, enabled=0
[Vulkan init] extensions: name=VK_KHR_xlib_surface, enabled=1
[Vulkan init] extensions: name=VK_KHX_device_group_creation, enabled=0
[Vulkan init] extensions: name=VK_KHR_external_fence_capabilities, enabled=0
[Vulkan init] extensions: name=VK_KHR_external_memory_capabilities, enabled=0
[Vulkan init] extensions: name=VK_KHR_external_semaphore_capabilities, enabled=0
[Vulkan init] extensions: count=33
[Vulkan init] extensions: name=VK_KHR_swapchain, enabled=1
[Vulkan init] extensions: name=VK_KHR_descriptor_update_template, enabled=0
[Vulkan init] extensions: name=VK_KHR_dedicated_allocation, enabled=1
[Vulkan init] extensions: name=VK_KHR_get_memory_requirements2, enabled=1
[Vulkan init] extensions: name=VK_KHR_maintenance1, enabled=1
[Vulkan init] extensions: name=VK_KHR_push_descriptor, enabled=0
[Vulkan init] extensions: name=VK_KHR_sampler_mirror_clamp_to_edge, enabled=1
[Vulkan init] extensions: name=VK_KHR_shader_draw_parameters, enabled=0
[Vulkan init] extensions: name=VK_KHR_external_memory, enabled=0
[Vulkan init] extensions: name=VK_KHR_external_memory_fd, enabled=0
[Vulkan init] extensions: name=VK_KHR_external_semaphore, enabled=0
[Vulkan init] extensions: name=VK_KHR_external_semaphore_fd, enabled=0
[Vulkan init] extensions: name=VK_KHR_external_fence, enabled=0
[Vulkan init] extensions: name=VK_KHR_external_fence_fd, enabled=0
[Vulkan init] extensions: name=VK_KHX_device_group, enabled=0
[Vulkan init] extensions: name=VK_KHX_external_memory, enabled=0
[Vulkan init] extensions: name=VK_KHX_external_memory_fd, enabled=0
[Vulkan init] extensions: name=VK_KHX_external_semaphore, enabled=0
...

nvidia-smi output outside the docker container:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.130                Driver Version: 384.130                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro P1000        Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   51C    P5    N/A /  N/A |    614MiB /  4030MiB |      2%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1324      G   /usr/lib/xorg/Xorg                            27MiB |
|    0     21600      G   /usr/lib/xorg/Xorg                           214MiB |
|    0     22232      G   compiz                                       126MiB |
|    0     22604      G   ...d, --gpu-preferences=KAAAAAAAAAAgAAAgAA    86MiB |
+-----------------------------------------------------------------------------+
martins-mozeiko commented 5 years ago

We have never tried running the Simulator in docker. Do you have a specific need to run it inside docker? If not, I suggest you don't.

In case you need to run the simulator inside docker, you'll need to figure out how to use Vulkan from inside docker. nvidia-docker does not support forwarding Vulkan from the host. See https://github.com/NVIDIA/nvidia-docker/wiki/Frequently-Asked-Questions#is-vulkan-supported

Making this work will require installing the nvidia drivers inside docker and allowing full access to the nvidia devices on the host (not using nvidia-docker). Here's more information on this: https://stackoverflow.com/a/25367554 We have not tried this, so I cannot say for sure how to make it work.
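
For reference, a minimal sketch of that approach (untested here; the device nodes follow the linked answer, and the image name "lgsvl-sim" is hypothetical):

# Hedged sketch: bypass nvidia-docker and hand the raw NVIDIA device
# nodes to the container, plus X11 access for the simulator window.
# The image must contain the exact same NVIDIA driver version as the
# host (check with: nvidia-smi). You may also need to allow X access
# first, e.g. with: xhost +local:
docker run -it --rm \
    --device /dev/nvidia0 \
    --device /dev/nvidiactl \
    --device /dev/nvidia-uvm \
    -e DISPLAY=$DISPLAY \
    -v /tmp/.X11-unix:/tmp/.X11-unix \
    lgsvl-sim ./Simulator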

weihunko commented 5 years ago

Thanks for the quick response! The reason I was trying to launch the LGSVL simulator inside docker is that our application also utilizes containers, and I saw these two projects:

  1. https://www.lgsvlsimulator.com/docs/autoware-instructions/#installing-docker-ce
  2. https://github.com/lgsvl/lanefollowing

However, I may have some misunderstanding here. Is the simulator actually running outside docker in the lanefollowing project and the Autoware demo?

martins-mozeiko commented 5 years ago

We always run the simulator outside of docker. For the lanefollowing demo, only its python scripts run inside docker. These scripts receive images from the simulator over ROS2, run a DNN for lane recognition, and then send control commands back to the simulator over ROS2.

weihunko commented 5 years ago

From reading the script, I assume the ros2_bridge also runs inside docker? And the bridge is able to talk to the simulator without changing settings?

martins-mozeiko commented 5 years ago

Yes, the ros2 bridge runs inside docker in the same ROS2 environment as the AD stack. The bridge listens for websocket connections on TCP port 9090. The simulator connects to this port to talk to ROS2. All the docker container does is expose port 9090 to the host. This way we are able to talk to any ROS 1 or 2 environment.
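
For illustration, a minimal sketch of that setup (the image name is hypothetical; the launch command assumes the standard ROS2 rosbridge_server package, which may differ from the demo's own scripts):

# Hedged sketch: publish the bridge's websocket port to the host so the
# simulator (running outside docker) can connect to it.
# "my-ros2-stack" is a hypothetical image containing the AD stack.
docker run -it --rm -p 9090:9090 my-ros2-stack \
    ros2 launch rosbridge_server rosbridge_websocket_launch.xml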

weihunko commented 5 years ago

Thanks a lot! That should work with our application. I will give it a try.

david-gwa commented 5 years ago

We actually tried to run the previous version (2019.04) of the LG simulator in Docker, and it works. But I'm not sure if it still works in the new version with Vulkan.

diegoferigo commented 5 years ago

I confirm that 2019.05 works in docker. With the latest 2019.09 release, I managed to launch the web interface from within a docker container, access it from the outside, and download the assets. However, due to #366, I couldn't go any further.

david-gwa commented 5 years ago

@diegoferigo

I just tried to migrate from 2019.04 to 2019.09 (with Vulkan). There is a project, docker-nvidia-vulkan, which may be worth a try. By default, Vulkan is not supported in Docker (18.09 or 19.03).

diegoferigo commented 5 years ago

@david-gwa Thanks a lot for the link! I originally thought that vulkan required some work on the nvidia runtime. I will have a look at the Dockerfile, let's see if I can get something running. Unfortunately, as I mentioned before, my GPU is not good enough to run the latest simulator version even on my host :/

Keep us updated on your progress!

martins-mozeiko commented 5 years ago

Yeah, vulkan support in the nvidia runtime would be the ideal solution; less configuration would be necessary.

That github repository is doing the same thing as I suggested above - install the nvidia driver inside the container (make sure it is exactly the same version as on the host) and then it should work. The Docker image would not be very portable across different host machines, but it would work when built locally.

This is how you used to get cuda or opengl support in containers before the nvidia runtime was created. And there are no big differences in how vulkan interacts with the kernel driver, so it should work the same way.
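
As an illustration of the version-matching constraint, a hedged sketch (the image tag and build argument are assumptions, not part of any official setup):

# Hedged sketch: read the host's driver version and bake the exact same
# driver into the image, so user-space libraries match the kernel module.
DRIVER_VERSION=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader)
docker build --build-arg DRIVER_VERSION="${DRIVER_VERSION}" -t sim-vulkan .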

diegoferigo commented 5 years ago

> That github repository is doing the same thing as I suggested above - install the nvidia driver inside the container (make sure it is exactly the same version as on the host) and then it should work. The Docker image would not be very portable across different host machines, but it would work when built locally.

Unless there's something I don't know, why not use the official cuda images as a base and just add the vulkan SDK? (Disclaimer: I'm not familiar at all with vulkan.) Together with the new nvidia support integrated in the 19.03 docker version, the resulting image would have far fewer hardcoded components.

martins-mozeiko commented 5 years ago

The Vulkan SDK does not matter - that is for development. What you want is the runtime. But the problem is not the runtime either. The problem is that nvidia drivers are made in a way that the versions of the user-space libraries you use in your process must match the kernel module versions. This is the same for cuda, opengl, and, I assume, vulkan.

The question is - how do you make the version of the libraries inside the container match the version of the kernel modules on the host? One way is to install the driver inside the docker image. That works fine, but is not very portable. That's why nvidia created the nvidia-docker runtime, which is basically just a hook at container startup - it simply copies some .so files from your host into the newly created container, thus giving you exactly the same library versions. And your nvidia code works. The problem, no idea why, is that they did not do this for vulkan - only the OpenGL and cuda libraries. That's why you cannot use vulkan with nvidia-docker.

Docker's --gpus support does a similar thing. I think they standardized the way you do these hooks, so it works not only with nvidia but also other things (but I don't know many details about the new docker). There is nothing else docker itself is doing to make the GPU available inside the container. So unless the new docker --gpus support added these hooks for the vulkan libraries, it won't help.
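
One way to see this for yourself (a hedged check, assuming Docker 19.03+ with the nvidia runtime installed):

# Hedged sketch: list the driver libraries the runtime injects into a
# plain base image - you should see the injected driver libraries
# (libcuda, libnvidia-ml, ...) but no Vulkan loader.
docker run --rm --gpus all ubuntu:18.04 \
    sh -c 'ls /usr/lib/x86_64-linux-gnu | grep -iE "nvidia|cuda"'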

diegoferigo commented 5 years ago

Thanks for the insight @martins-mozeiko, now it's clearer. Maybe either @RenaudWasTaken or @flx42 could add some more detail from upstream?

RenaudWasTaken commented 5 years ago

We publish some Vulkan images that, in conjunction with the --gpus option, will expose some of the libraries inside your container :)

https://hub.docker.com/r/nvidia/vulkan

martins-mozeiko commented 5 years ago

Oh, this looks pretty simple. All you need to do is put the Vulkan loader & ICD files in the correct places. Thanks @RenaudWasTaken, this is very useful!
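
For the curious, a hedged sketch of what "the correct places" means (the library path and api_version vary by driver; the values here are illustrative only):

# Hedged sketch: the Vulkan loader discovers the NVIDIA driver through an
# ICD manifest; inside the container it must point at the injected library.
cat > /etc/vulkan/icd.d/nvidia_icd.json <<'EOF'
{
    "file_format_version" : "1.0.0",
    "ICD" : {
        "library_path" : "libGLX_nvidia.so.0",
        "api_version" : "1.1.84"
    }
}
EOF
vulkaninfo | head   # if the loader finds the ICD, this prints device info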

daohu527 commented 4 years ago

@RenaudWasTaken The docker pull command may be wrong:

docker pull nvidia/vulkan  => docker pull nvidia/vulkan:1.1.121-cuda-10.1-alpha
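
For reference, a hedged usage sketch once the tagged image is pulled (assuming the image ships vulkaninfo; flags for display output are omitted):

# Hedged sketch: run the tagged image with GPU access and verify that
# Vulkan sees the device (prints GPU/driver info on success).
docker run --rm --gpus all nvidia/vulkan:1.1.121-cuda-10.1-alpha vulkaninfo
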
RenaudWasTaken commented 4 years ago

There isn't a latest tag yet; that is why you can't pull with an empty tag.

We will likely publish it with the beta release.

david-gwa commented 4 years ago

> There isn't a latest tag yet; that is why you can't pull with an empty tag.
>
> We will likely publish it with the beta release.

Hey, @daohu527

I actually use this gitlab: nvidia/container-images; it looks like the nvidia team maintains a few versions of vulkan images there.

Maybe you can find one that works on your GPU.

t0ny-peng commented 4 years ago

@mark-gerow-lge Hi there. I tried to use the vulkan image provided by @RenaudWasTaken. However, the Ubuntu desktop manager crashed. Have you or anyone else given it a try?

martins-mozeiko commented 4 years ago

@left4taco If your whole ubuntu desktop crashed, then please check the logs from X11 and/or the kernel. It sounds like an issue with either the GPU hardware or the nvidia driver.

Btw, we now have official instructions for running the Simulator inside Docker in git: https://github.com/lgsvl/simulator/tree/master/Docker

t0ny-peng commented 4 years ago

@martins-mozeiko Thanks. I didn't know that it's already officially supported!

I just gave it a try. It looks like I need to be root to run it. Though the reason is unknown, it's good enough for now!

martins-mozeiko commented 4 years ago

You don't need to be root to use the Simulator. It works under any user as long as you have not already run it as a different user - it creates settings files under the ~/.config folder, so next time it needs to be run as the same user.

Check that you have correctly set up docker for your user to avoid using root: https://docs.docker.com/install/linux/linux-postinstall/
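
In short, the standard steps from the linked docs are:

# Add your user to the docker group so docker works without sudo,
# then refresh group membership (or log out and back in).
sudo groupadd docker
sudo usermod -aG docker "$USER"
newgrp docker
docker run --rm hello-world   # verify docker works without root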