osrf / rocker

A tool to run docker containers with overlays and convenient options for things like GUIs etc.
Apache License 2.0

Fix nvidia runtime fail on arm64 & add --group-add plugin #211

Closed · amadeuszsz closed this 1 year ago

amadeuszsz commented 1 year ago

It is not possible to run a container on an NVIDIA Jetson Xavier NX (JetPack 5.0.2, Docker 20.10.21):

Executing command:

    docker run --rm -it --gpus all -v /home/apm/autoware:/home/apm/autoware -e DISPLAY -e TERM -e QT_X11_NO_MITSHM=1 -e XAUTHORITY=/tmp/.dockerzb1vil5u.xauth -v /tmp/.dockerzb1vil5u.xauth:/tmp/.dockerzb1vil5u.xauth -v /tmp/.X11-unix:/tmp/.X11-unix -v /etc/localtime:/etc/localtime:ro 7812f082195f

    docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'csv'
    invoking the NVIDIA Container Runtime Hook directly (e.g. specifying the docker --gpus flag) is not supported. Please use the NVIDIA Container Runtime instead.: unknown.

A tiny change in nvidia_extension.py fixes the problem. However, executing any windowed app (e.g. rviz2) then causes a segmentation fault; adding the user to the video group fixes that.
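
For context, the failure above is Docker's --gpus flag invoking the NVIDIA Container Runtime Hook directly, which the CSV-mode auto-detection on Jetson boards rejects; the error message itself asks for the NVIDIA Container Runtime instead. A minimal sketch of the kind of change involved, using rocker's get_docker_args extension hook (the architecture check here is illustrative, not necessarily the exact merged logic):

    import platform

    def get_docker_args(self, cliargs):
        # Jetson boards run the NVIDIA container runtime in CSV mode, where
        # invoking the hook via --gpus fails; select --runtime nvidia there.
        if platform.machine() == 'aarch64':
            return ' --runtime nvidia'
        return ' --gpus all'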

With these changes I ran the CUDA deviceQuery test inside the container under different rocker settings:

./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 35
-> CUDA driver version is insufficient for CUDA runtime version
Result = FAIL
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

NvRmMemInitNvmap failed with Permission denied
549: Memory Manager Not supported

****NvRmMemInit failed**** error type: 196626

*** NvRmMemInit failed NvRmMemConstructor
cudaGetDeviceCount returned 801
-> operation not supported
Result = FAIL
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "Xavier"
  CUDA Driver Version / Runtime Version          11.4 / 11.4
  CUDA Capability Major/Minor version number:    7.2
  Total amount of global memory:                 6847 MBytes (7179161600 bytes)
  (006) Multiprocessors, (064) CUDA Cores/MP:    384 CUDA Cores
  GPU Max Clock rate:                            1109 MHz (1.11 GHz)
  Memory Clock rate:                             1109 Mhz
  Memory Bus Width:                              256-bit
  L2 Cache Size:                                 524288 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total shared memory per multiprocessor:        98304 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            Yes
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Managed Memory:                Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 0 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 11.4, CUDA Runtime Version = 11.4, NumDevs = 1
Result = PASS

I'm not sure if the GPU fix is the best approach, but what about the --group-add plugin in general? Is there another way to reach that functionality? As I remember, my sensors have needed group permissions several times (e.g. dialout), so I had to add the group manually in a terminal and attach to the container from a new bash session to refresh the groups.
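
A plugin along those lines could simply forward the requested groups to docker run. A rough sketch against rocker's third-party extension API (the class and argument handling here are illustrative, not the merged implementation):

    from rocker.extensions import RockerExtension

    class GroupAdd(RockerExtension):
        """Forward --group-add entries to `docker run --group-add`."""

        @classmethod
        def get_name(cls):
            return 'group_add'

        def get_docker_args(self, cliargs):
            groups = cliargs.get('group_add') or []
            return ''.join(' --group-add %s' % g for g in groups)

        @staticmethod
        def register_arguments(parser, defaults=None):
            parser.add_argument('--group-add', action='append', default=[],
                                help='add the container user to these host groups '
                                     '(e.g. video, dialout)')

With something like that, `rocker --group-add dialout <image>` would grant the group at container start instead of requiring a manual groupadd and a fresh shell inside the container.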

tfoote commented 1 year ago

Can you see if #212 resolves your issue? It's probably separate from the GPU option, but it seems to provide a superset of what you need.

amadeuszsz commented 1 year ago

I noticed the same issue on the Jetson AGX Orin, but this PR fixes the problem.

tfoote commented 1 year ago

Great, I'll take that one as it's more generic. Could you then rebase this and turn it into a PR to specifically address the ability to force the version of the nvidia flags?

amadeuszsz commented 1 year ago

> Great, I'll take that one as it's more generic. Could you then rebase this and turn it into a PR to specifically address the ability to force the version of the nvidia flags?

Done. I removed the old, unnecessary commit to keep the history clean and rebased. Is that what you meant, or did you want two separate PRs, one for the --group-add functionality and one for forcing the nvidia flag?

tfoote commented 1 year ago

I wanted to replace the --group-add functionality with --user-preserve-groups from #212, and then this would just be forcing the nvidia flag.

amadeuszsz commented 1 year ago

> I wanted to replace the --group-add functionality with --user-preserve-groups from https://github.com/osrf/rocker/pull/212, and then this would just be forcing the nvidia flag.

Sorry for the misunderstanding, but by

> I noticed the same issue on the Jetson AGX Orin, but this PR fixes the problem.

I meant the changes in #211. --user-preserve-groups doesn't fix my problem: it adds a few groups from the host, but not the desired one (video). Another example: a connected IMU with the vendor's udev rules required the dialout group. --group-add dialout works perfectly, but --user-preserve-groups didn't pass dialout through, so I had to add the group inside the container by hand, enter the container from a new terminal window (otherwise the groups don't refresh), and only then run the sensor nodes. Would you approve of these two PRs?

tfoote commented 1 year ago

Oh, sorry, I thought you'd said that #212 fixed your issue too. So this is needed in addition to preserving the groups: if you mount a device into the container, it may get mounted with certain access that's required, or provide permissions that you don't have on the host (à la sudo).

We should figure out how to make sure that this and #212 can work together, so they don't conflict depending on ordering. Secondly, there's actually already a forcing of sudo in the users dir, which should be evaluated for consistency.

amadeuszsz commented 1 year ago

--user-preserve-groups adds only dynamically allocated groups (GID > 99), so the documentation seems inaccurate. I've merged #211 and #212 together and exercised them in many ways, with different orderings and with groups overridden from both solutions, and I didn't see any conflict.
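
That GID cutoff would also explain the video and dialout cases above: on typical Debian/Ubuntu systems those are static system groups (video is usually GID 44, dialout GID 20), so a GID > 99 filter drops exactly the groups the sensors need. A standalone sketch of that filtering, to check which host groups would survive it:

    import getpass
    import grp

    # Show which of the current user's supplementary host groups a
    # "dynamically allocated only" (GID > 99) filter would keep.
    user = getpass.getuser()
    for g in grp.getgrall():
        if user in g.gr_mem:
            verdict = 'preserved' if g.gr_gid > 99 else 'dropped (static system group)'
            print('%s (gid=%d): %s' % (g.gr_name, g.gr_gid, verdict))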

tfoote commented 1 year ago

I've gone ahead and merged the group elements in #222. Could you rebase and validate the nvidia changes here?

amadeuszsz commented 1 year ago

> I've gone ahead and merged the group elements in #222. Could you rebase and validate the nvidia changes here?

Changed and tested; everything works. @tfoote, FYI, I don't know why, but the current --user-preserve-groups doesn't work as expected. I took the main branch without any of my changes and there is a bug:

  1. Without --user-preserve-groups & without --user -> OK.
  2. Without --user-preserve-groups & with --user -> ERROR.

     Traceback (most recent call last):
       File "/home/amadeusz/.local/bin/rocker", line 8, in <module>
         sys.exit(main())
       File "/home/amadeusz/.local/lib/python3.8/site-packages/rocker/cli.py", line 64, in main
         dig = DockerImageGenerator(active_extensions, args_dict, base_image)
       File "/home/amadeusz/.local/lib/python3.8/site-packages/rocker/core.py", line 209, in __init__
         self.dockerfile = generate_dockerfile(active_extensions, self.cliargs, base_image)
       File "/home/amadeusz/.local/lib/python3.8/site-packages/rocker/core.py", line 348, in generate_dockerfile
         dockerfile_str += el.get_snippet(args_dict) + '\n'
       File "/home/amadeusz/.local/lib/python3.8/site-packages/rocker/extensions.py", line 279, in get_snippet
         matched_groups = [g for g in all_groups if g.gr_name in cliargs['user_preserve_groups']]
       File "/home/amadeusz/.local/lib/python3.8/site-packages/rocker/extensions.py", line 279, in <listcomp>
         matched_groups = [g for g in all_groups if g.gr_name in cliargs['user_preserve_groups']]
     TypeError: argument of type 'bool' is not iterable

  3. With --user-preserve-groups & without --user -> OK, but groups are not shared; is that expected?
  4. With --user-preserve-groups & with --user -> OK.

This is not connected to my changes; just flagging it.
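
The traceback points at a membership test against cliargs['user_preserve_groups'], which is a bool rather than a list when the flag isn't passed, so the `in` test raises. A guard along these lines would avoid the crash (a sketch only, not the project's actual fix):

    # Around the line shown in the traceback: when --user-preserve-groups is
    # absent, cliargs['user_preserve_groups'] is a bool, not a list of names.
    preserve = cliargs.get('user_preserve_groups')
    if not isinstance(preserve, list):
        preserve = []
    matched_groups = [g for g in all_groups if g.gr_name in preserve]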

tfoote commented 1 year ago

Thanks for flagging that issue with the merged one. This looks good to go on top, but it got caught by the dependency issues in #225. Could you rebase once more? Then I think we should be good to merge.

amadeuszsz commented 1 year ago

> Could you rebase once more? Then I think we should be good to merge.

Done.

amadeuszsz commented 1 year ago

> It should be pretty straightforward to add a few more cases with extra mocked cliargs like here

Done.