Ruby using all GPU memory

osrf-migration commented 4 years ago

Original report (archived issue) by Steven Gray (Bitbucket: stgray).

Running just the urban circuit practice world (installed in a catkin workspace) leads to Ruby allocating almost all GPU memory. I do see that the requirements page was recently updated with a 4GB VRAM minimum. Is this expected to grow?

Ran with ign launch -v 4 urban_circuit.ign worldName:=urban_circuit_practice_01 robotName1:=X1 robotConfig1:=X1_SENSOR_CONFIG_3

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 435.21       Driver Version: 435.21       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro P1000        Off  | a527ae8c41762a101acbf2474382f82acd977df2:01:00.0 Off |                  N/A |
| N/A   66C    P0    N/A /  N/A |   3839MiB /  4042MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      2934      G   /usr/lib/xorg/Xorg                           381MiB |
|    0      8774      G   ...uest-channel-token=14576248302439056819    72MiB |
|    0     24967      G   /usr/bin/ruby                               1473MiB |
|    0     24972      G   /usr/bin/ruby                               1762MiB |
|    0     26106      G   gnome-shell                                  145MiB |
+-----------------------------------------------------------------------------+

Running urban_qual instead led to lower usage:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 435.21       Driver Version: 435.21       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro P1000        Off  | a527ae8c41762a101acbf2474382f82acd977df2:01:00.0 Off |                  N/A |
| N/A   56C    P0    N/A /  N/A |   1910MiB /  4042MiB |     96%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      2934      G   /usr/lib/xorg/Xorg                           360MiB |
|    0      8774      G   ...uest-channel-token=14576248302439056819    71MiB |
|    0     26106      G   gnome-shell                                  149MiB |
|    0     26704      G   /usr/bin/ruby                                473MiB |
|    0     26709      G   /usr/bin/ruby                                854MiB |
+-----------------------------------------------------------------------------+
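To compare the two runs, the rows of the two process tables above can be tallied per process. This is a minimal Python sketch, assuming the plain-text `nvidia-smi` process-table layout shown in this thread (one row per process, memory as a trailing `NNNMiB` field). The names `ROW` and `memory_by_process` are hypothetical and not part of any SubT or NVIDIA tool.

```python
import re
from collections import defaultdict

# One row of the "Processes:" section of plain-text nvidia-smi output, e.g.
# "|    0     24967      G   /usr/bin/ruby      1473MiB |".
# Groups: GPU index, PID, type (G, C or C+G), process name, memory in MiB.
ROW = re.compile(r"^\|\s+(\d+)\s+(\d+)\s+([CG+]+)\s+(.+?)\s+(\d+)MiB\s*\|$")


def memory_by_process(table: str) -> dict[str, int]:
    """Sum GPU memory (MiB) per process name; non-matching lines are ignored."""
    totals: dict[str, int] = defaultdict(int)
    for line in table.splitlines():
        match = ROW.match(line.strip())
        if match:
            _gpu, _pid, _type, name, mib = match.groups()
            totals[name] += int(mib)
    return dict(totals)


# Process rows reported above for urban_circuit_practice_01 with the GUI.
practice_01 = """\
|    0      2934      G   /usr/lib/xorg/Xorg                           381MiB |
|    0      8774      G   ...uest-channel-token=14576248302439056819    72MiB |
|    0     24967      G   /usr/bin/ruby                               1473MiB |
|    0     24972      G   /usr/bin/ruby                               1762MiB |
|    0     26106      G   gnome-shell                                  145MiB |
"""

usage = memory_by_process(practice_01)
print(usage["/usr/bin/ruby"])  # 3235
print(sum(usage.values()))     # 3833
```

Applied to the urban_qual rows, the same tally gives 1327 MiB for the two Ruby processes, versus 3235 MiB for urban_circuit_practice_01. The 3833 MiB row total is also close to the 3839 MiB shown in the header.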


osrf-migration commented 4 years ago

Original comment by Alfredo Bencomo (Bitbucket: bencomo).

What do you mean by "Is this expected to grow?"


osrf-migration commented 4 years ago

Original comment by Steven Gray (Bitbucket: stgray).

I suppose I mean: will the requirements be higher for other competition worlds? Also, is there any chance of reducing this usage?

osrf-migration commented 4 years ago

Original comment by Alfredo Bencomo (Bitbucket: bencomo).

The answer to the first question is no: the requirements will not be higher for other competition worlds. However, note that you are referring to competition worlds, which are run in Cloudsim on machines with more than 4GB of GPU memory.

The answer to the second question (if I understand correctly) is also no. One workaround when testing your solution locally is to run the SubT Simulation tool in headless mode by setting headless:=true.
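For local testing, the launch command from the original report can be rerun with the extra `headless:=true` argument. This is a minimal Python sketch that only assembles that argument list, assuming the `name:=value` argument style used in the report. The helper `launch_args` is hypothetical, and it does not start anything by itself, so it runs without Ignition installed.

```python
import shlex


def launch_args(world: str, robot: str, config: str, headless: bool = False) -> list[str]:
    """Assemble the ign launch command line used in this thread (hypothetical helper)."""
    args = [
        "ign", "launch", "-v", "4", "urban_circuit.ign",
        f"worldName:={world}",
        f"robotName1:={robot}",
        f"robotConfig1:={config}",
    ]
    if headless:
        # Run without the GUI, as suggested above for local testing.
        args.append("headless:=true")
    return args


cmd = launch_args("urban_circuit_practice_01", "X1", "X1_SENSOR_CONFIG_3", headless=True)
print(shlex.join(cmd))
# To actually launch, pass cmd to subprocess.run(cmd, check=True) in a shell
# where the catkin workspace has been sourced (assumed setup).
```

Printing the joined command gives the report's original command line with `headless:=true` appended.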

Are you using the Docker images or the catkin workspace to run the SubT Simulation tool?

osrf-migration commented 4 years ago

Original comment by Steven Gray (Bitbucket: stgray).

I’m using the catkin workspace. Headless mode helps a little, and I see now there’s only one Ruby instance. Out of curiosity, what is each one doing? Is one processing the collisions and the other the visual rendering?

Interestingly, running urban_circuit_practice_01 headless, I see Ruby using 1379MB. As soon as I subscribe to a camera topic, that jumps to 2333MB… Is that because the same code now has to process both the collision and visual geometry?

Same with the ign processes: without headless, I have two of them, using ~100% and ~150% CPU respectively. Running headless, I now have a single ign process using ~300% CPU when viewing a camera feed from the sim. Is that expected?


osrf-migration commented 4 years ago

Original comment by Steven Gray (Bitbucket: stgray).

Anyway, I switched from a 6-core, 12-thread laptop with a 4GB Quadro to a 4-core, 8-thread desktop with a 6GB 1060, and the difference is huge. On the laptop, I would also see high CPU utilization from Xorg and gnome-shell; I assume that was due to running out of VRAM, as I don’t see it on the desktop at all. Might you want to recommend 6GB as the minimum requirement instead?

osrf-migration commented 4 years ago

Original comment by Zbyněk Winkler (Bitbucket: Zbyněk Winkler (robotika)).

At the beginning of the year I bought a computer with an NVIDIA GPU with 4GB of memory just for the SubT Virtual Track. I believe a lot of people have done something similar. It would be a pity for these investments to become useless.

Please keep it working with 4GB of GPU memory. I would consider changes in system requirements to be even more problematic than plain software changes. I would expect a process for changing system requirements that includes, at a minimum, a poll among participants on whether the suggested change is OK for everyone, along with an explanation of why the upgrade is deemed necessary.

Just editing a page in the wiki without telling anyone is not a good way to change system requirements.

osrf-migration commented 4 years ago

Original comment by Alfredo Bencomo (Bitbucket: bencomo).

It has not changed. It still indicates 4GB.

osrf-migration commented 4 years ago

Original comment by Zbyněk Winkler (Bitbucket: Zbyněk Winkler (robotika)).

It has not changed yet. There is, however, a change from a 650 with no minimum memory to a 1050 with 4GB of memory:

https://osrf-migration.github.io/subt-gh-pages/#!/osrf/subt/wiki/commits/d01779fa302fbe432caaeea085319032cb4b3920

There is a further change from the 1050 to the 1050 Ti:

https://osrf-migration.github.io/subt-gh-pages/#!/osrf/subt/wiki/commits/e56010437395d853d3752b322bb0553a0acc76be

And now it has been suggested here that the actual requirements currently are not 4GB but 6GB. What I am saying is that this is not a good way to let competitors know that there is a change in system requirements.

osrf-migration commented 4 years ago

Original comment by Alfredo Bencomo (Bitbucket: bencomo).

The 1050 is just an example/suggestion, since it’s now about the same price as the old 650. Would you feel better if I changed it back to the 650?

osrf-migration commented 4 years ago

Original comment by Zbyněk Winkler (Bitbucket: Zbyněk Winkler (robotika)).

I’d expect the page to describe the minimum system requirements for running the simulation (including the minimum system and GPU memory required). Will it run OK on a 650? If someone comes in and says it does not work on a 650, will it be fixed? If so, then I’d indeed feel much better if the 650 were there. But I don’t think this is about my feelings.

osrf-migration commented 4 years ago

Original comment by Michael Carroll (Bitbucket: Michael Carroll).

To clarify a bit what you are seeing: the ign process is a Ruby wrapper around the core of the simulation environment. In general, you should have one instance when running headless and two instances when running with the GUI. One instance is the simulation server, where all physics, sensors, and rendering are simulated, while the other (the frontend) only does the rendering required for the GUI display.
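Following that description, the number of Ruby processes holding GPU memory hints at which mode is running. This is a rough heuristic sketch in Python, assuming each ign instance shows up as a `/usr/bin/ruby` row in `nvidia-smi`, as in the tables earlier in this thread. The function `describe_mode` is hypothetical.

```python
def describe_mode(process_names: list[str]) -> str:
    """Guess the launch mode from GPU process names (hypothetical helper).

    Based on the explanation above: a headless run has one Ruby-wrapped
    process (the simulation server); a GUI run adds a second one (the frontend).
    """
    ruby_count = sum(1 for name in process_names if name.endswith("/ruby"))
    if ruby_count == 1:
        return "headless: simulation server only"
    if ruby_count == 2:
        return "GUI: simulation server + frontend"
    return f"unexpected: {ruby_count} ruby processes"


# Process names from the urban_circuit_practice_01 nvidia-smi table.
names = ["/usr/lib/xorg/Xorg", "/usr/bin/ruby", "/usr/bin/ruby", "gnome-shell"]
print(describe_mode(names))  # GUI: simulation server + frontend
```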

If you want more information on the plugins being run, they are part of the ign-launch framework: https://bitbucket.org/ignitionrobotics/ign-launch/src/default/plugins/


osrf-migration commented 4 years ago

Original comment by Steven Gray (Bitbucket: stgray).

Thanks for the explanation. I see both processes (sim and GUI) using similar amounts of VRAM (~1.5GB) for practice 01; are they both loading the full (visual + collision) mesh representation? If so, is there a way to use shared memory for that? Headless mode seems to consume ~2.3GB when loading the visuals to simulate camera feeds, so potentially saving 1GB would be huge for people with 4GB cards.

osrf-migration commented 4 years ago

Original comment by Nate Koenig (Bitbucket: Nathan Koenig).

The GUI uses the GPU to render the user camera view. The sim process uses the GPU to render camera data. They need to be separate in order to run headless, and in many cases the GUI should render the scene differently than the cameras do.

osrf-migration commented 4 years ago

Original comment by Nate Koenig (Bitbucket: Nathan Koenig).

changed state from "new" to "resolved"

See last comment.

osrf / subt

Ruby using all GPU memory #287