Support property specifying GPU to bind to.

ghost commented 4 years ago

Hello,

I just installed a second GPU in my PC for the Azure Kinect. Do I need to change any parameters of this ROS package to tell the SDK which GPU to use?

Thank you.

RoseFlunder commented 4 years ago

As far as I know you can only specify which GPU will be used for the Body Tracking SDK. To do this you will need to change the code in this line because the node itself doesn't provide a parameter for this yet: https://github.com/microsoft/Azure_Kinect_ROS_Driver/blob/melodic/src/k4a_ros_device.cpp#L305

The create() method takes an optional configuration. In this configuration you can specify the GPU device ID, which is just the integer shown in front of the GPU's name in your nvidia-smi output.

So the code should look like this if you want to use the RTX 2080 (id = 1), for example:

k4abt_tracker_configuration_t k4abt_config = K4ABT_TRACKER_CONFIG_DEFAULT;
k4abt_config.gpu_device_id = 1;
k4abt_tracker_ = k4abt::tracker::create(calibration_data_.k4a_calibration_, k4abt_config);

ooeygui commented 4 years ago

Thanks @RoseFlunder for answering on this. I would appreciate a pull request on this topic - I'm unable to test multi-gpu scenarios at the moment.

ghost commented 4 years ago

I just tested it with the following code as suggested by RoseFlunder:

k4abt_tracker_configuration_t k4abt_config = K4ABT_TRACKER_CONFIG_DEFAULT;
k4abt_config.gpu_device_id = 0;
k4abt_tracker_ = k4abt::tracker::create(calibration_data_.k4a_calibration_,k4abt_config);
k4abt_tracker_.set_temporal_smoothing(params_.body_tracking_smoothing_factor);

I deliberately used 0 (the default) for testing, based on the discussion here. However, I ran into the following errors:

[2020-02-06 11:38:02.591] [error] [t=8928] [K4ABT] /home/vsts/work/1/s/src/TrackerHost/TrackerHost.cpp (132): Create(). The current depth mode is not supported by the K4ABT SDK!
[2020-02-06 11:38:02.591] [error] [t=8928] [K4ABT] /home/vsts/work/1/s/src/sdk/k4abt.cpp (38): tracker->Create(sensor_calibration, config) returned failure in k4abt_tracker_create()
terminate called after throwing an instance of 'k4a::error'
  what():  Failed to create k4abt tracker!
[azure_kinect_ros_driver-3] process has died [pid 8928, exit code -6, cmd /home/ywen/abc/devel/lib/azure_kinect_ros_driver/node __name:=azure_kinect_ros_driver __log:=/home/ywen/.ros/log/90908548-4904-11ea-867c-c400ad0f67df/azure_kinect_ros_driver-3.log].
log file: /home/ywen/.ros/log/90908548-4904-11ea-867c-c400ad0f67df/azure_kinect_ros_driver-3*.log

I'm curious why selecting a GPU would cause an error, even with the default setting. I used driver.launch with depth enabled and tried all four depth modes (NFOV and WFOV_); they all return the error listed above.

When I run apt list --installed | grep k4a, I get the following results:

k4a-tools/bionic,now 1.3.0 amd64 [installed]
libk4a1.3/bionic,now 1.3.0 amd64 [installed,automatic]
libk4a1.3-dev/bionic,now 1.3.0 amd64 [installed]
libk4abt1.0/bionic,now 1.0.0 amd64 [installed,automatic]
libk4abt1.0-dev/bionic,now 1.0.0 amd64 [installed]

ooeygui commented 4 years ago

I'm asking the Azure Kinect SDK team to take a look at this. (I'm on the Microsoft Edge Robotics team, which maintains the ROS connector but not the SDK itself.)

In the meantime, I suspect that you need to change the ROS parameters for the depth mode in order for Body Tracking to work.

RoseFlunder commented 4 years ago

Did you change it exactly at the line I linked? I'm asking because you may have pasted the code before calibration_data_.k4a_calibration_ is correctly initialized.

I have only a single GPU and can confirm that it works when using device id 0 (my only valid device id).

k4abt_tracker_configuration_t k4abt_config = K4ABT_TRACKER_CONFIG_DEFAULT;
k4abt_config.gpu_device_id = 0;
k4abt_tracker_ = k4abt::tracker::create(calibration_data_.k4a_calibration_,k4abt_config);
k4abt_tracker_.set_temporal_smoothing(params_.body_tracking_smoothing_factor);

If I set an invalid device id such as 1, I get the expected CUDA error: CUDA failure 10: invalid device ordinal

Edit: @ooeygui I could add a body_tracking_gpu_device_id parameter to the node and make a pull request, but I don't have a second GPU to test it. I can only confirm that it works with device id 0 and that an error appears when using an invalid id.
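
A minimal sketch of the validation such a parameter might need, assuming the node should reject IDs outside the range of GPUs visible to CUDA. `TrackerConfigSketch` is a stand-in for `k4abt_tracker_configuration_t` so the example compiles without the SDK, and `apply_gpu_device_id` and `visible_gpu_count` are hypothetical names that are not part of the driver. In the real node the count could come from the CUDA runtime's `cudaGetDeviceCount`.

```cpp
// Stand-in for the SDK's k4abt_tracker_configuration_t (assumption: only the
// gpu_device_id field discussed in this thread is modeled here).
struct TrackerConfigSketch
{
  int gpu_device_id = 0;
};

// Hypothetical helper: copies requested_id into the config only if it names one
// of the visible GPUs, i.e. lies in [0, visible_gpu_count).
// Returns false and leaves the config untouched otherwise.
bool apply_gpu_device_id(int requested_id, int visible_gpu_count, TrackerConfigSketch& config)
{
  if (requested_id < 0 || requested_id >= visible_gpu_count)
  {
    return false;
  }
  config.gpu_device_id = requested_id;
  return true;
}
```

In the node, the validated value would be assigned to k4abt_config.gpu_device_id before calling k4abt::tracker::create, so that an invalid ID is reported as a parameter error rather than surfacing later as "CUDA failure 10: invalid device ordinal".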

ghost commented 4 years ago

I believe I have put the lines in the correct position. However, I could be wrong (my major is mechanical engineering). I have uploaded the k4a_ros_device.cpp file here

Starting from line 301, I have the following:

#if defined(K4A_BODY_TRACKING)
  // When calibration is initialized the body tracker can be created with the device calibration
  if (params_.body_tracking_enabled)
  {
    k4abt_tracker_configuration_t k4abt_config = K4ABT_TRACKER_CONFIG_DEFAULT;
    k4abt_config.gpu_device_id = 0;
    k4abt_tracker_ = k4abt::tracker::create(calibration_data_.k4a_calibration_,k4abt_config);
    k4abt_tracker_.set_temporal_smoothing(params_.body_tracking_smoothing_factor);
  }
#endif

Also, I have tried different ROS parameters for depth_mode and they all give the same error.

RoseFlunder commented 4 years ago

Looks good to me. Sorry, I can't reproduce the error. Does the error still happen if you comment out this line? k4abt_config.gpu_device_id = 0; Then it would just be the default config, which is also what is used if you don't pass a config to the tracker's create method.

ghost commented 4 years ago

I just unplugged the camera from my PC and plugged it back in, and the problem seems to be solved. However, I ran into another problem. Here are my current graphics cards:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 435.21       Driver Version: 435.21       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro P1000        Off  | 00000000:03:00.0  On |                  N/A |
| 61%   63C    P0    N/A /  N/A |    784MiB /  4031MiB |      4%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce RTX 208...  Off  | 00000000:04:00.0 Off |                  N/A |
|  0%   54C    P2    90W / 250W |   1524MiB / 11019MiB |     10%      Default |
+-------------------------------+----------------------+----------------------+

When I set gpu_device_id = 0, body tracking is very responsive and nvidia-smi shows the following:

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1405      G   /usr/lib/xorg/Xorg                            65MiB |
|    0      1515      G   /usr/bin/gnome-shell                          53MiB |
|    0      1859      G   /usr/lib/xorg/Xorg                           432MiB |
|    0      2004      G   /usr/bin/gnome-shell                         197MiB |
|    0      3800      G   /usr/lib/firefox/firefox                       1MiB |
|    0      5167      G   rviz                                           6MiB |
|    0     18563    C+G   .../devel/lib/azure_kinect_ros_driver/node    14MiB |
|    1     18563      C   .../devel/lib/azure_kinect_ros_driver/node  1513MiB |
+-----------------------------------------------------------------------------+

However, when I set gpu_device_id = 1, body tracking is much slower than before and nvidia-smi shows the following:

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1405      G   /usr/lib/xorg/Xorg                            65MiB |
|    0      1515      G   /usr/bin/gnome-shell                          53MiB |
|    0      1859      G   /usr/lib/xorg/Xorg                           432MiB |
|    0      2004      G   /usr/bin/gnome-shell                         197MiB |
|    0      3800      G   /usr/lib/firefox/firefox                       1MiB |
|    0      5167      G   rviz                                           6MiB |
|    0     18258    C+G   .../devel/lib/azure_kinect_ros_driver/node   673MiB |
+-----------------------------------------------------------------------------+

For some reason, gpu_device_id does not select the graphics card I expect. Could you help investigate what's going on? For now I can keep working on my application, but others may run into similar problems.

I have also uploaded my working directory here for your reference.

RoseFlunder commented 4 years ago

Hmm, I can't really test anything because I only have a single GPU, but from the looks of it, GPU device ID 0 in the SDK may be the GPU that nvidia-smi lists as 1, and SDK GPU device ID 1 may be the one nvidia-smi lists as 0. This is my guess because when you set it to 0, nvidia-smi shows that GPU 1 is also used by the node.

EDIT: You can test which GPU the body tracking is running on by watching the nvidia-smi output and checking which of the two GPUs has the higher usage (the GPU-Util column).

EDIT2: You can observe nvidia-smi output every second by calling: nvidia-smi -l 1

EDIT3: To improve general performance of the node compile with: catkin_make -DCMAKE_BUILD_TYPE=Release
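
One possible explanation for the swapped IDs, not confirmed anywhere in this thread: by default the CUDA runtime orders devices fastest first, while nvidia-smi lists them by PCI bus ID, so the RTX 2080 can be CUDA device 0 even though nvidia-smi shows it as GPU 1. Setting the CUDA_DEVICE_ORDER environment variable to PCI_BUS_ID before CUDA initializes makes the two orderings agree. The sketch below does that from C++ on Linux. The helper names are hypothetical, and whether the Body Tracking SDK honors the variable is an assumption that would need testing on a multi-GPU machine.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical helper: asks the CUDA runtime to enumerate GPUs in PCI bus order,
// the same order nvidia-smi prints. It must run before anything in the process
// initializes CUDA. setenv is POSIX; the final 1 overwrites any existing value.
bool request_pci_bus_device_order()
{
  return setenv("CUDA_DEVICE_ORDER", "PCI_BUS_ID", 1) == 0;
}

// Hypothetical helper: reports whether the variable currently requests PCI bus order.
bool device_order_is_pci_bus_id()
{
  const char* value = std::getenv("CUDA_DEVICE_ORDER");
  return value != nullptr && std::strcmp(value, "PCI_BUS_ID") == 0;
}
```

The same setting can be tried without touching the code by exporting CUDA_DEVICE_ORDER=PCI_BUS_ID in the shell before launching the node.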

microsoft / Azure_Kinect_ROS_Driver

Support property specifying GPU to bind to. #115