Gazebo won't start in `space_robots` due to issues with graphics driver in `spaceros`

ivanperez-keera commented 8 months ago

The updated Mesa driver in Space ROS makes it impossible to launch Gazebo.

The offending lines are:

https://github.com/space-ros/docker/blob/76cd412a7379dd5a5ec338b7b75659afbbf0c9a1/spaceros/Earthfile#L91-L93

For me, this manifested when trying to run the mars rover demo inside the space_robots image, although I suspect that the issue would manifest with anything that runs Gazebo.

Removing those lines makes the resulting space_robots image (which is built on top of spaceros) work correctly.

ivanperez-keera commented 8 months ago

@mkhansenbot has more information on this; he was the one who figured this out and told me what the solution was.

ivanperez-keera commented 8 months ago

@mkhansenbot The fix to this seems relatively simple, but I don't want to rush things to get them in the release unnecessarily. Do you think this should be in the next release?

If so, maybe I can send the PR that takes those lines out and you can review & approve.

mkhansenbot commented 8 months ago

Added to next milestone (proposed) humble-2024.04.0.

I think the immediate issue with the upstream driver has been fixed, at least on my system. However, we remain exposed to future breakage in our simulation because we always use the latest upstream driver. @EzraBrooks has suggested that we could pin to a specific Mesa driver version. I think we should discuss that solution and fix this in the next release.
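As a rough illustration of what pinning could look like, here is a minimal sketch that renders an APT preferences stanza (the kind of file that lives under `/etc/apt/preferences.d/`). The package names and the version glob are placeholders chosen for illustration, not the versions Space ROS uses, and the helper `apt_pin_stanza` is hypothetical.

```python
def apt_pin_stanza(packages, version_glob, priority=1001):
    """Render an APT preferences stanza pinning `packages` to `version_glob`.

    A priority of 1000 or more tells APT to install the pinned version
    even if that means downgrading from what is currently installed.
    """
    if not packages:
        raise ValueError("at least one package name is required")
    return (
        f"Package: {' '.join(packages)}\n"
        f"Pin: version {version_glob}\n"
        f"Pin-Priority: {priority}\n"
    )


if __name__ == "__main__":
    # Placeholder package list and version glob (assumptions, not project values).
    print(apt_pin_stanza(["libgl1-mesa-dri", "libglx-mesa0"], "23.0.*"))
```

The output would be written to a preferences file in the image before the Mesa packages are installed, so that later upstream releases are ignored until the pin is bumped deliberately.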

martincerven commented 1 month ago

@mkhansenbot shouldn't graphics drivers be installed in the base image? For example, I used an Nvidia image with preinstalled (Nvidia) drivers as the base, without kisak's Mesa apt drivers.

With the Nvidia drivers, Gazebo runs super smooth. You can also see that the GPU is used for hardware-accelerated rendering.

Before that I tried the default ubuntu:jammy base image, but it didn't use the GPU (since it didn't have the correct drivers).

I also think Mesa can conflict with proprietary drivers.
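One way to check which renderer is actually in use is to look at the "OpenGL renderer string" line printed by `glxinfo`. The sketch below parses that line and flags Mesa's software rasterizers; it assumes `glxinfo` (from the mesa-utils package) is available in the container, and the function names are hypothetical.

```python
import re
import subprocess

# Names of Mesa software rasterizers; seeing one of these means no GPU is used.
SOFTWARE_RENDERERS = ("llvmpipe", "softpipe", "swrast")


def renderer_from_glxinfo(output):
    """Return the value of the 'OpenGL renderer string' line, or None if absent."""
    match = re.search(r"^OpenGL renderer string:\s*(.+)$", output, re.MULTILINE)
    return match.group(1).strip() if match else None


def is_software_rendering(renderer):
    """True if the renderer name looks like a CPU-based software rasterizer."""
    lowered = renderer.lower()
    return any(name in lowered for name in SOFTWARE_RENDERERS)


if __name__ == "__main__":
    try:
        # `glxinfo -B` prints a brief summary that includes the renderer string.
        result = subprocess.run(["glxinfo", "-B"], capture_output=True, text=True)
    except FileNotFoundError:
        raise SystemExit("glxinfo not found; install mesa-utils first")
    renderer = renderer_from_glxinfo(result.stdout)
    if renderer is None:
        print("No renderer string found (is DISPLAY set?)")
    elif is_software_rendering(renderer):
        print(f"Software rendering: {renderer}")
    else:
        print(f"Hardware rendering: {renderer}")
```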

EzraBrooks commented 1 month ago

Hardware-specific drivers shouldn't be installed in base Space ROS images because of licensing and dependency conflicts (e.g. NVIDIA drivers often don't work properly with real-time kernels).

martincerven commented 1 month ago

By base image I meant FROM ubuntu:jammy. Let's say I want to launch a CubeSat with machine learning applications: how else should I accomplish this but by using off-the-shelf (proprietary) devices? I found that they support real-time kernels; is this what you meant?

Also, how would you use Gazebo containers without GPU acceleration?

EzraBrooks commented 1 month ago

Gazebo uses GLX (OpenGL rendered via X11). Generally, you would forward your X11 display into the container by setting $DISPLAY and $XAUTHORITY correctly, which allows Gazebo to make its GLX draw calls to the X server running on your host, thereby bypassing any need to do the OpenGL draw calls inside the container.

These arguments here represent (most of) the common solution to that particular problem.

https://github.com/space-ros/docker/blob/ad020b80406974bb494f04a6a3f8216273a5489e/moveit2/run.sh#L15-L16

Another example, using docker-compose (which is usually what I do):

https://github.com/aws-deepracer-community/deepracer-for-cloud/blob/d236c1a9466dea6ac6065e594b86644652468779/docker/docker-compose-local-xorg.yml#L5-L13

In my experience, having Mesa (device- and GL-agnostic drivers) inside the container and using your host's X11 display works with any manufacturer's hardware.
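The X11 forwarding described above can be sketched roughly as follows. This is a minimal illustration, not a copy of the linked scripts: it assumes the host X server exposes its socket under `/tmp/.X11-unix`, and the image name `space_robots:latest` and the helper `x11_docker_args` are hypothetical.

```python
import os
import shlex


def x11_docker_args(env=None):
    """Build `docker run` arguments that forward the host X11 display.

    Passes DISPLAY (and XAUTHORITY, if set) into the container and mounts
    the X11 socket directory plus the Xauthority file.
    """
    env = os.environ if env is None else env
    display = env.get("DISPLAY")
    if not display:
        raise RuntimeError("DISPLAY is not set; no X11 display to forward")
    args = [
        "-e", f"DISPLAY={display}",
        # Assumed socket location; this is the usual default on Linux hosts.
        "-v", "/tmp/.X11-unix:/tmp/.X11-unix",
    ]
    xauth = env.get("XAUTHORITY")
    if xauth:
        # Mount the cookie file read-only at the same path inside the container.
        args += ["-e", f"XAUTHORITY={xauth}", "-v", f"{xauth}:{xauth}:ro"]
    return args


if __name__ == "__main__":
    image = "space_robots:latest"  # hypothetical image name
    cmd = ["docker", "run", "--rm", "-it", *x11_docker_args(), image]
    print(shlex.join(cmd))
```

Running the script prints the full `docker run` command instead of executing it, so the arguments can be inspected or pasted into an existing run script.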

martincerven commented 1 month ago

That's true, but if you want to use any GPU inference inside Docker, then you need to use CUDA-enabled containers.

And you get that for free by using a base image other than ubuntu:jammy.

Of course you can only use it if you own such a device, and then you probably don't care that it has proprietary Nvidia stuff in it. AMD has a similar runtime, and Intel seems to have one too.

Although I haven't used these, so maybe they are not for edge devices but for ML in datacenters.

I just wanted to say that if you use Nvidia, AMD, Intel, or Raspberry Pi 5 drivers (OpenGL, Vulkan, CUDA, ROCm), then it might be better to install them in the base image (otherwise the container might not use the full capabilities of your hardware).

chancecardona commented 1 month ago

Regarding GPU inference and the other comments, isn't the best practice rather to use the NVIDIA Container Toolkit and just have the appropriate drivers installed locally? By the way, I have found the micromamba Docker images (https://github.com/mamba-org/micromamba-docker) to work better for CUDA-based inference in cases where it's called for (which imo we would just use as a final image base layer that gets the assets copied to it, instead of modifying the base image that should work for everything).
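For illustration, with the NVIDIA Container Toolkit installed on the host, GPU access is requested per container through Docker's `--gpus` flag rather than baked into the image. The sketch below only assembles such a command; the image name is hypothetical and `gpu_run_command` is an illustrative helper.

```python
import shlex


def gpu_run_command(image, command, gpus="all"):
    """Build a `docker run` command that requests GPUs via the `--gpus` flag."""
    if not command:
        raise ValueError("a command to run inside the container is required")
    return ["docker", "run", "--rm", "--gpus", gpus, image, *command]


if __name__ == "__main__":
    # Hypothetical image name; `nvidia-smi` is a common first check of GPU visibility.
    cmd = gpu_run_command("my_inference_image:latest", ["nvidia-smi"])
    print(shlex.join(cmd))
```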

EzraBrooks commented 1 month ago

Maybe it would be worth opening a Discussion about GPU acceleration in Space ROS. Fixing Mesa (the original topic of this issue) is not really related to the problem of vendor-specific hardware APIs.

space-ros/docker#100